WO2022227563A1 - Hardware circuit, data moving method, chip and electronic device - Google Patents

Hardware circuit, data moving method, chip and electronic device

Info

Publication number
WO2022227563A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
information
storage
data
instruction
Application number
PCT/CN2021/134610
Inventor
李越
朱志岐
王文强
徐宁仪
Original Assignee
上海阵量智能科技有限公司
Application filed by 上海阵量智能科技有限公司
Publication of WO2022227563A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/3013 Organisation of register space, e.g. banked or distributed register file according to data content, e.g. floating-point registers, address registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution

Definitions

  • the present application relates to computer technology, and in particular to a hardware circuit, a data transfer method, a chip and an electronic device.
  • on the one hand, thread register resources are occupied, and the thread Crossbar used to move data between the threads and the thread registers (a many-to-many selector of threads based on a crossbar switch matrix) is frequently occupied, which degrades the performance of the thread registers and the thread Crossbar; on the other hand, the instruction issuing unit needs to issue two instructions, a read instruction and a store instruction, which degrades the performance of the instruction issuing unit.
  • a first aspect of the present application discloses a hardware circuit.
  • the hardware circuit includes: a memory access controller; a source memory and a destination memory connected to the memory access controller; and a storage controller connected to the source memory, the destination memory, and the memory access controller; the source memory indicates the memory for sending data; the destination memory indicates the memory for receiving data; the memory access controller is used to, in response to a received data moving instruction, obtain the first information for reading data and the second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory;
  • the source memory is used to read the target data according to the first information included in the read instruction, and send the read target data to the storage controller;
  • the storage controller is configured to acquire the second information, and generate a store instruction according to the acquired second information and the read target data to store the target data to the destination memory.
  • the hardware circuit further includes: an information memory; the information memory is connected to the memory access controller and the storage controller; the memory access controller is configured to store the second information in the information memory and obtain a storage address at which the second information is stored; and to generate a read instruction according to the storage address and the first information, and send the read instruction to the source memory;
  • the source memory is configured to read target data according to the first information included in the read instruction, and send the read target data and the storage address included in the read instruction to the storage controller;
  • the storage controller is configured to acquire the second information from the information memory according to the storage address, and generate a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.
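  • the flow through the information memory described above can be illustrated with a minimal Python sketch. It is a software analogy under stated assumptions, not the patented circuit: the names `MoveInstruction`, `InfoMemory`, `store`, `take` and `move` are hypothetical, memories are modelled as dictionaries, and the first and second information are reduced to a single source address and a single destination address.

```python
from dataclasses import dataclass


@dataclass
class MoveInstruction:
    src_addr: int  # first information: source address to read from
    dst_addr: int  # second information: destination address to write to


class InfoMemory:
    """Hypothetical information memory: parks second information and
    hands back the storage address under which it was parked."""

    def __init__(self):
        self._cells = {}
        self._next = 0

    def store(self, second_info):
        addr = self._next
        self._cells[addr] = second_info
        self._next += 1
        return addr

    def take(self, addr):
        # Return the parked second information and free the cell.
        return self._cells.pop(addr)


def move(instr, source, dest, info_mem):
    # Memory access controller: park the second information and build
    # a read instruction from the storage address and first information.
    storage_addr = info_mem.store(instr.dst_addr)
    read_instr = {"src_addr": instr.src_addr, "storage_addr": storage_addr}

    # Source memory: read the target data and forward it together with
    # the storage address carried by the read instruction.
    data = source[read_instr["src_addr"]]
    forwarded = (data, read_instr["storage_addr"])

    # Storage controller: recover the second information through the
    # storage address, then store the data to the destination memory.
    data, storage_addr = forwarded
    dst_addr = info_mem.take(storage_addr)
    dest[dst_addr] = data
```

  • in this sketch the destination address never travels with the read instruction; only the short storage address does, which mirrors the point-to-point movement between source and destination memory that the text describes.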
  • a second aspect of the present application also proposes a hardware circuit, including: a memory access controller; a shared memory and an external memory connected to the memory access controller; a first controller connected to the shared memory, the memory access controller, and the external memory; and a second controller connected to the external memory, the memory access controller and the shared memory;
  • the memory access controller is configured to, in response to a received data moving instruction, acquire first information for reading data and second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory;
  • the source memory is used to read target data according to the first information included in the read instruction, and send the read target data to the storage controller;
  • the storage controller is used to obtain the second information, and generate a storage instruction according to the acquired second information and the read target data to store the target data in a destination memory; wherein the source memory is the shared memory, the destination memory is the external memory, and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, and the storage controller is the second controller.
  • the hardware circuit further includes: a first memory and a second memory; the first memory is connected to the memory access controller and the first controller; the second memory is connected to the memory access controller and the second controller; the memory access controller is configured to store the second information in an information memory to obtain a storage address at which the second information is stored; and to generate a read instruction according to the storage address and the first information, and send the read instruction to the source memory; the source memory is used to read target data according to the first information included in the read instruction, and send the read target data and the storage address included in the read instruction to a storage controller; the storage controller is configured to obtain the second information from the information memory according to the storage address, and generate a storage instruction according to the acquired second information and the read target data to store the target data in a destination memory; wherein the source memory is the shared memory, the destination memory is the external memory, the information memory is the first memory, and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, the information memory is the second memory, and the storage controller is the second controller.
  • the data moving instruction includes an instruction initiated for each thread in a thread group; the information memory includes several storage units; the memory access controller is configured to obtain, according to the data moving instruction, the first information and the second information corresponding to each thread, and send the second information corresponding to each thread to the information memory; the information memory is used to determine, among the several storage units, a target storage unit that does not store data, store the second information corresponding to each thread in the target storage unit, and send the storage address of the target storage unit to the memory access controller, so that the memory access controller obtains the storage address.
  • the target storage unit includes a storage space corresponding to each thread; the information memory is configured to store the second information corresponding to each thread in the storage space of the target storage unit that corresponds to that thread.
  • the memory access controller is configured to determine, according to the source addresses represented by the first information corresponding to each thread, several first threads among the threads whose addresses conflict; and to combine the first information corresponding to each of the several first threads with the storage address to obtain first read instructions respectively corresponding to the several first threads.
  • the memory access controller is further configured to combine the first information corresponding to several second threads among the threads that have no address conflict with the storage address, to obtain one second read instruction corresponding to the several second threads.
  • the source memory is configured to: in response to the first read instruction, acquire the first information included in the first read instruction, read, according to the source address represented by the acquired first information, the first data corresponding to the first thread to which the first read instruction corresponds, and send the read first data, the storage address included in the first read instruction, and the thread ID corresponding to the first data to the storage controller; and, in response to the second read instruction, acquire the first information included in the second read instruction and corresponding to each of the second threads, read, according to the source addresses represented by that first information, the second data corresponding to each second thread, and send the read second data, the storage address included in the second read instruction, and the thread ID corresponding to each second data to the storage controller.
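  • the split into first threads (with address conflicts) and second threads (without) can be sketched in Python as follows. The text here does not define what counts as an address conflict, so the sketch assumes a banked source memory in which two threads conflict when their source addresses differ but fall into the same bank; `NUM_BANKS`, `split_threads` and `build_read_instructions` are hypothetical names, and a read instruction is modelled as a plain dictionary.

```python
from collections import defaultdict

NUM_BANKS = 4  # assumption: the source memory is split into 4 banks


def split_threads(src_addrs):
    """src_addrs maps thread ID -> source address (first information).

    Returns (first_threads, second_threads), where first threads are
    those that conflict under the assumed bank model.
    """
    addrs_per_bank = defaultdict(set)
    for addr in src_addrs.values():
        addrs_per_bank[addr % NUM_BANKS].add(addr)

    first, second = [], []
    for tid, addr in src_addrs.items():
        # More than one distinct address in a bank means a conflict.
        if len(addrs_per_bank[addr % NUM_BANKS]) > 1:
            first.append(tid)
        else:
            second.append(tid)
    return first, second


def build_read_instructions(src_addrs, storage_addr):
    first, second = split_threads(src_addrs)
    # One first read instruction per conflicting thread.
    instrs = [
        {"storage_addr": storage_addr, "reads": {tid: src_addrs[tid]}}
        for tid in first
    ]
    # One second read instruction covering all non-conflicting threads.
    if second:
        instrs.append(
            {"storage_addr": storage_addr,
             "reads": {tid: src_addrs[tid] for tid in second}}
        )
    return instrs
```

  • every generated instruction carries the same storage address, so the storage controller can later look up the second information of whichever thread ID arrives with the data.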
  • the target storage unit is further configured with an access identifier corresponding to each thread; the access identifier indicates whether the data in the storage space corresponding to the thread has been accessed; the storage controller is configured to obtain the second information corresponding to a target thread from the information memory according to the storage address; the information memory is configured to, in response to the storage controller obtaining the second information from the information memory, set the access identifier corresponding to the target thread to a first identifier; wherein the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
  • the information memory is used to determine whether the access identifiers corresponding to the threads included in the target storage unit are all the first identifier, and if so, release the data stored in the target storage unit.
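  • the per-thread storage spaces and access identifiers can be modelled with a small Python sketch. The names `StorageUnit`, `ThreadInfoMemory`, `allocate` and `fetch` are hypothetical, the access identifier is modelled as a boolean (True standing for the first identifier), and raising an error when no unit is free is an assumption of the sketch; the text does not say what the hardware does in that case.

```python
class StorageUnit:
    def __init__(self, num_threads):
        self.second_info = [None] * num_threads  # one storage space per thread
        self.accessed = [False] * num_threads    # one access identifier per thread
        self.in_use = False


class ThreadInfoMemory:
    def __init__(self, num_units, num_threads):
        self.units = [StorageUnit(num_threads) for _ in range(num_units)]

    def allocate(self, per_thread_info):
        """Store each thread's second information in a unit that holds
        no data and return that unit's storage address."""
        for addr, unit in enumerate(self.units):
            if not unit.in_use:
                unit.second_info = list(per_thread_info)
                unit.accessed = [False] * len(per_thread_info)
                unit.in_use = True
                return addr
        raise RuntimeError("no free storage unit")  # assumption of this sketch

    def fetch(self, addr, thread_id):
        """Return one thread's second information and mark it accessed;
        release the unit once every thread has been accessed."""
        unit = self.units[addr]
        info = unit.second_info[thread_id]
        unit.accessed[thread_id] = True
        if all(unit.accessed):
            unit.second_info = [None] * len(unit.accessed)
            unit.in_use = False
        return info
```

  • releasing a unit only after all of its access identifiers are set lets one storage address serve an entire thread group, even when the data for different threads arrives at the storage controller at different times.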
  • the memory access controller includes a configuration register, an address register, and an operation unit; the address register is used to, in response to the data moving instruction indicating that the target data to be moved includes data at multiple source addresses, store a preset source address among the multiple source addresses; the configuration register is used to store the operation relationship by which the other source addresses among the multiple source addresses are obtained from the preset source address; the operation unit is used to generate, according to the operation relationship and the preset source address, a plurality of instructions for moving the data at the multiple source addresses, to complete the move of the target data.
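  • as an illustration of how an operation unit might expand one preset source address into several, the Python sketch below assumes the simplest possible operation relationship, a fixed stride repeated a fixed number of times. The text does not specify the relationship, so `ConfigRegister`, `stride`, `count` and `expand_source_addresses` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConfigRegister:
    stride: int  # assumed operation relationship: constant address step
    count: int   # assumed number of source addresses to generate


def expand_source_addresses(address_register, cfg):
    """Derive every source address from the preset source address held
    in the address register and the relationship in the config register."""
    return [address_register + i * cfg.stride for i in range(cfg.count)]


def generate_move_instructions(address_register, cfg):
    # One move instruction per generated source address.
    return [{"src_addr": a} for a in expand_source_addresses(address_register, cfg)]
```

  • with a preset source address of 0x100, a stride of 0x40 and a count of 4, the sketch yields the addresses 0x100, 0x140, 0x180 and 0x1C0, so a single data moving instruction can describe a multi-address transfer.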
  • the hardware circuit includes a plurality of shared memories and a plurality of external memories, and a crossbar switch matrix is connected between the plurality of shared memories and the second controller and between the plurality of external memories and the first controller; the data moving instruction is used to move data between the plurality of shared memories and the plurality of external memories; the storage controller is used to generate a storage instruction according to the acquired second information and the read target data, and send the storage instruction to the destination memory through the crossbar switch matrix to complete the data move.
  • the memory access controller further includes a state feedback unit; the state feedback unit is configured to: in response to receiving a data moving instruction sent by a processor, send first state information to the processor to indicate that the hardware circuit is currently in a busy state; and, in response to the target data being stored to the destination memory according to the storage instruction, send second state information to the processor to indicate that the hardware circuit is currently in an idle state.
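  • the busy/idle handshake of the state feedback unit can be sketched in Python as a two-state notifier. `State`, `StateFeedbackUnit`, `on_move_received` and `on_store_done` are hypothetical names, and the processor is modelled as a callback that receives the state information.

```python
from enum import Enum


class State(Enum):
    IDLE = 0  # second state information
    BUSY = 1  # first state information


class StateFeedbackUnit:
    def __init__(self, notify_processor):
        self._notify = notify_processor  # callback standing in for the processor
        self.state = State.IDLE

    def on_move_received(self):
        # A data moving instruction arrived: report busy.
        self.state = State.BUSY
        self._notify(self.state)

    def on_store_done(self):
        # The target data reached the destination memory: report idle.
        self.state = State.IDLE
        self._notify(self.state)
```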
  • the first memory and/or the second memory includes a read-write-supported buffer.
  • a third aspect of the present application further provides a data moving method, which is applied to the hardware circuit shown in any of the foregoing embodiments.
  • the method includes: in response to a received data moving instruction, the memory access controller obtains the first information for reading data and the second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory; the source memory reads the target data according to the first information included in the read instruction, and sends the read target data to the storage controller; the storage controller acquires the second information, and generates a store instruction according to the acquired second information and the read target data to store the target data to the destination memory.
  • the hardware circuit further includes: an information memory; the information memory is connected to the memory access controller and the storage controller.
  • the memory access controller obtaining the first information for reading data and the second information for writing data included in the data moving instruction, generating a read instruction according to the first information, and sending the read instruction to the source memory includes: the memory access controller, in response to the received data moving instruction, obtains the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in the information memory to obtain a storage address at which the second information is stored; and generates a read instruction according to the storage address and the first information and sends it to the source memory.
  • the source memory reading data according to the first information included in the read instruction and sending the read data to the storage controller includes: the source memory reads the target data according to the first information included in the read instruction, and sends the read target data and the storage address included in the read instruction to the storage controller.
  • the storage controller acquiring the second information and generating a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.
  • a fourth aspect of the present application further provides a data moving method, which is applied to the hardware circuit shown in any of the foregoing embodiments.
  • the method includes: the memory access controller, in response to a received data moving instruction, obtains first information for reading data and second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory; the source memory reads target data according to the first information included in the read instruction, and sends the read target data to a storage controller; the storage controller acquires the second information, and generates a store instruction according to the acquired second information and the read target data to store the target data to the destination memory.
  • the hardware circuit further includes: a first memory and a second memory; the first memory is connected to the memory access controller and the first controller; the second memory is connected to the memory access controller and the second controller.
  • the memory access controller obtaining the first information for reading data and the second information for writing data included in the data moving instruction, generating a read instruction according to the first information, and sending the read instruction to the source memory includes: the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in an information memory to obtain a storage address at which the second information is stored; and generates a read instruction according to the storage address and the first information, and sends the read instruction to the source memory.
  • the source memory reading data according to the first information included in the read instruction and sending the read data to the storage controller includes: the source memory reads target data according to the first information included in the read instruction, and sends the read target data and the storage address included in the read instruction to the storage controller.
  • the storage controller generating a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.
  • wherein the source memory is the shared memory, the destination memory is the external memory, the information memory is the first memory, and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, the information memory is the second memory, and the storage controller is the second controller.
  • the data movement instruction includes an instruction initiated for each thread in a thread group; the information memory includes a number of storage units.
  • the memory access controller acquiring the first information for reading data and the second information for writing data included in the data moving instruction, storing the second information in the information memory, and obtaining a storage address at which the second information is stored includes: the memory access controller acquires, according to the data moving instruction, the first information and the second information corresponding to each thread, and sends the second information corresponding to each thread to the information memory.
  • the method may further include: the information memory determines, among the several storage units, a target storage unit that does not store data, stores the second information corresponding to each thread in the target storage unit, and sends the storage address of the target storage unit to the memory access controller so that the memory access controller obtains the storage address.
  • the target storage unit includes storage spaces corresponding to the respective threads.
  • the information memory determining, among the several storage units, a target storage unit that does not store data, and storing the second information corresponding to each thread in the target storage unit includes: the information memory stores the second information corresponding to each thread in the storage space of the target storage unit that corresponds to that thread.
  • the memory access controller generating a read instruction according to the storage address and the first information includes: the memory access controller determines, according to the source addresses represented by the first information corresponding to each thread, several first threads among the threads whose addresses conflict, and combines the first information corresponding to each of the several first threads with the storage address to obtain first read instructions respectively corresponding to the several first threads.
  • the memory access controller generating a read instruction according to the storage address and the first information includes: the memory access controller combines the first information respectively corresponding to several second threads among the threads that have no address conflict with the storage address, to obtain a second read instruction corresponding to the several second threads.
  • the source memory reading target data according to the first information included in the read instruction, and sending the read target data and the storage address included in the read instruction to the storage controller, includes: the source memory, in response to the first read instruction, acquires the first information included in the first read instruction, reads, according to the source address represented by the acquired first information, the first data corresponding to the first thread to which the first read instruction corresponds, and sends the read first data, the storage address included in the first read instruction, and the corresponding thread ID to the storage controller; and, in response to the second read instruction, acquires the first information included in the second read instruction and corresponding to each second thread, reads, according to the source addresses represented by that first information, the second data corresponding to each second thread, and sends the read second data, the storage address included in the second read instruction, and the thread ID corresponding to each second data to the storage controller.
  • the target storage unit further includes an access identifier corresponding to each thread; the access identifier indicates whether the data in the storage space corresponding to the thread is accessed.
  • the storage controller acquiring the second information from the information memory according to the storage address includes: the storage controller acquires, according to the storage address, the second information corresponding to the target thread from the information memory.
  • the method may further include: in response to the storage controller acquiring the second information from the information memory according to the storage address, the information memory sets the access identifier corresponding to the target thread to a first identifier; wherein the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
  • the method further includes: the information memory determines whether the access identifiers included in the target storage unit and corresponding to the respective threads are all the first identifier, and if so, releases the data stored in the target storage unit.
  • the memory access controller includes a configuration register, an address register, and an operation unit.
  • the method may further include: in response to the data moving instruction indicating that the target data to be moved includes data at multiple source addresses, the address register stores a preset source address among the multiple source addresses; the configuration register stores the operation relationship by which the other source addresses among the multiple source addresses are obtained from the preset source address; the operation unit generates, according to the operation relationship and the preset source address, multiple instructions for moving the data at the multiple source addresses, to complete the moving of the target data.
  • the hardware circuit includes a plurality of shared memories and a plurality of external memories, and a crossbar switch matrix is connected between the plurality of shared memories and the second controller and between the plurality of external memories and the first controller; the data moving instruction is used for moving data between the plurality of shared memories and the plurality of external memories.
  • the storage controller generating a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller generates a storage instruction according to the acquired second information and the read target data, and sends the storage instruction to the destination memory through the crossbar switch matrix to complete the data move.
  • the memory access controller further includes a state feedback unit.
  • the method may further include: the state feedback unit, in response to receiving the data moving instruction sent by the processor, sends first state information to the processor to indicate that the hardware circuit is currently in a busy state; and, in response to the target data being stored in the destination memory according to the storage instruction, sends second state information to the processor to indicate that the hardware circuit is currently in an idle state.
  • the first memory and/or the second memory includes a read-write-supported buffer.
  • a fifth aspect of the present application further provides a chip, including the hardware circuit shown in any of the foregoing embodiments.
  • a sixth aspect of the present application further provides an electronic device, including the hardware circuit shown in any of the foregoing embodiments or the chip shown in any of the foregoing embodiments.
  • a seventh aspect of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a controller, implements the data moving method shown in any of the foregoing embodiments.
  • point-to-point data movement between the source memory and the destination memory can be formed, thereby releasing the thread register resources and the thread Crossbar between the threads and the thread registers (a many-to-many selector of threads based on a crossbar switch matrix), thereby reducing thread power consumption and improving thread Crossbar performance; this improves the efficiency of data transfer.
  • point-to-point data transfer between the source memory (shared memory) and the destination memory (external memory) can be formed, thereby releasing thread register resources as well as the thread Crossbar between the threads and the thread registers, thereby reducing thread power consumption and improving thread Crossbar performance;
  • third, for different types of data moving instructions, different devices in the hardware circuit are used to complete the data moving solutions disclosed in the foregoing embodiments, so that the two types of data moving instructions can be processed asynchronously, thereby improving the data moving efficiency.
  • FIG. 2 is a schematic structural diagram of a hardware circuit shown in the application.
  • FIG. 3 is a schematic structural diagram of a hardware circuit shown in the application.
  • FIG. 5 is a schematic structural diagram of a hardware circuit shown in the application.
  • FIG. 6 is a method flowchart of a data moving method shown in this application.
  • the present application proposes a hardware circuit (hereinafter referred to as a circuit).
  • the devices in the circuit can cooperate with each other to execute the data transfer method.
  • the hardware circuit may be part of a chip or an integrated circuit board.
  • FIG. 1 is a schematic structural diagram of a hardware circuit shown in this application.
  • the method disclosed in the present application may be used for data transfer between the memories included in the hardware circuit.
  • the hardware circuit structure shown in FIG. 1 is only a schematic illustration, and other structures may exist in actual situations.
  • the present application does not specifically limit the structure of the hardware circuit.
  • the hardware circuit may include a memory access controller 101; a source memory 102 and a destination memory 103 connected to the memory access controller 101; and a storage controller 104 connected to the memory access controller 101, the source memory 102, and the destination memory 103.
  • the source memory 102 indicates a memory for sending data; the destination memory 103 indicates a memory for receiving data.
  • the source memory 102 and the destination memory 103 may refer to different memories.
  • in response to data being moved from the shared memory to the external memory, the source memory 102 may be the shared memory and the destination memory 103 may be the external memory.
  • in response to data being moved from the external memory to the shared memory, the source memory 102 may be the external memory, and the destination memory 103 may be the shared memory.
  • the memory access controller 101 may expose an interface for receiving data moving instructions and, in response to a received data moving instruction, obtain the first information for reading data and the second information for writing data included in the data moving instruction.
  • the data moving instruction may be an instruction initiated by the instruction issuing unit for the thread in response to the data moving requirement.
  • the data moving instruction may include first information for reading data and second information for writing data.
  • the first information can be used to read data from the source memory 102 .
  • the first information may include information such as a source address.
  • the target data can be read from the source memory 102 through the source address.
  • the second information can be used to store the read target data in the destination memory 103.
  • the second information may include information such as a destination address.
  • the read target data can be stored in the destination memory 103 through the destination address.
  • the memory access controller 101 may parse the received data movement instruction to obtain the first information and the second information.
  • the memory access controller 101 may be configured to generate a read instruction according to the first information, and send the read instruction to the source memory 102 .
  • the memory access controller 101 can generate a read instruction according to the first information through a built-in instruction editing unit, and send the instruction to the source memory 102 .
  • the source memory 102 may be configured to read target data according to the first information included in the read instruction, and send the read target data to the storage controller 104 .
  • the source memory 102 may address according to the source address represented by the first information included in the read instruction, and then read out the data stored in the source address.
  • the read data may be sent to the storage controller 104 according to the storage controller address (the access address of the storage controller) included in the read instruction.
  • the storage controller 104 may be configured to acquire the second information, and generate a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory 103 .
  • the memory access controller 101 may send the second information to the storage controller 104 for storage in advance.
  • the storage controller 104 can obtain, from the at least one piece of stored second information, the second information corresponding to the target data, that is, the second information used to write the target data. It can then use an instruction editing unit to generate a storage instruction according to the obtained second information, and send the storage instruction to the destination memory 103.
  • the destination memory 103 may, in response to the destination address represented by the second information included in the storage instruction, store the target data in the storage space corresponding to the destination address.
  • the memory access controller may, in response to the received data moving instruction, obtain the first information for reading data and the second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory.
  • the source memory reads data according to the first information included in the read instruction, and sends the read target data to the storage controller.
  • the storage controller may acquire the second information, and generate a storage instruction according to the acquired second information and target data to store the target data in the destination memory.
  • in this way, on the one hand, point-to-point data movement can be established between the source memory and the destination memory, which frees thread register resources and offloads the thread Crossbar between the threads and the thread registers (a many-to-many selector based on a crossbar switch matrix), thereby reducing thread power consumption and improving thread Crossbar performance. On the other hand, the instruction issuing unit only needs to issue one data moving instruction to complete the data movement, which improves the performance of the instruction issuing unit.
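  • the flow described for FIG. 1 can be sketched as a small software model. This is only an illustration under stated assumptions, not the circuit itself: memories are modeled as Python dicts, and the `tag` used to match target data with its second information is hypothetical, since the text does not say how the storage controller associates the two.

```python
from dataclasses import dataclass


@dataclass
class MoveInstruction:
    src_addr: int  # first information: where to read in the source memory
    dst_addr: int  # second information: where to write in the destination memory


class StorageController:
    """Keeps second information sent in advance and performs the store step."""

    def __init__(self, destination):
        self.destination = destination  # destination memory modeled as a dict
        self.pending = {}               # tag -> destination address (tag is hypothetical)

    def hold(self, tag, dst_addr):
        self.pending[tag] = dst_addr

    def on_data(self, tag, data):
        dst_addr = self.pending.pop(tag)   # second information for this target data
        self.destination[dst_addr] = data  # stands in for the storage instruction


class SourceMemory:
    def __init__(self, cells, storage_controller):
        self.cells = cells
        self.storage_controller = storage_controller

    def handle_read(self, src_addr, tag):
        # Read the target data and send it straight to the storage controller.
        self.storage_controller.on_data(tag, self.cells[src_addr])


class MemoryAccessController:
    def __init__(self, source, storage_controller):
        self.source = source
        self.storage_controller = storage_controller
        self.next_tag = 0

    def execute(self, instr):
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        self.storage_controller.hold(tag, instr.dst_addr)  # second information, in advance
        self.source.handle_read(instr.src_addr, tag)       # read instruction


destination = {}
storage_controller = StorageController(destination)
source = SourceMemory({0x10: "payload"}, storage_controller)
controller = MemoryAccessController(source, storage_controller)
controller.execute(MoveInstruction(src_addr=0x10, dst_addr=0x80))
```

  • in this model a single `execute` call moves the data, and no intermediate register holds it between the read and the store.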
  • the hardware circuit may include an information memory for storing the second information, thereby freeing the storage space of the storage controller and improving the performance of the storage controller.
  • FIG. 2 is a schematic structural diagram of a hardware circuit shown in this application.
  • the method disclosed in the present application may be used for data transfer between the memories included in the hardware circuit.
  • the hardware circuit structure shown in FIG. 2 is only a schematic illustration, and other structures may exist in an actual situation.
  • the present application does not specifically limit the structure of the hardware circuit.
  • the hardware circuit may include a memory access controller 201; a source memory 202, a destination memory 203, and an information memory 204 connected to the memory access controller 201; and a storage controller 205 connected to the source memory 202, the destination memory 203, and the information memory 204.
  • the storage controller 205 is connected to the memory access controller 201 through the information memory 204.
  • the memory access controller 201 may store the second information in the information memory 204 to obtain a storage address for storing the second information.
  • the information storage 204 may include several storage units.
  • the information storage 204 may store the received second information in an empty storage unit, and then return the storage address corresponding to the storage unit to the memory access controller 201 .
  • the memory access controller 201 may generate a read command according to the storage address and the first information, and send the read command to the source memory 202 .
  • the memory access controller 201 may, through a built-in instruction editing unit, generate the read instruction from the first information, the storage address obtained from the information memory 204, and the access address of the storage controller 205, and send it to the source memory 202, so that the source memory 202 can send the read target data to the storage controller 205.
  • the source memory 202 is configured to read target data according to the first information included in the read instruction, and send the read target data and the storage address included in the read instruction to the storage controller 205.
  • the source memory 202 may perform addressing according to the source address represented by the first information included in the read instruction, and then read out the data stored in the source address as the target data. Afterwards, the read target data and the storage address included in the read instruction may be sent to the storage controller 205 .
  • the storage controller 205 is configured to acquire the second information from the information memory 204 according to the storage address, and generate a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory 203.
  • the storage controller 205 may generate a second information read instruction according to the storage address and send it to the information storage 204 .
  • the information storage 204 may perform addressing according to the storage address in response to the read instruction, and return the second information stored in the storage address to the storage controller 205 .
  • the storage controller 205 can generate a storage instruction through the instruction editing unit according to the acquired second information and the read target data, and send the storage instruction to the destination memory 203 .
  • the destination memory 203 may store the read target data in a storage space corresponding to the destination address in response to the destination address represented by the second information included in the storage instruction.
  • the second information can be stored by using the information memory as an intermediate storage device, thereby releasing the storage space of the storage controller and improving the performance of the storage controller.
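  • the role of the information memory as an intermediate store can be sketched as follows. This is a minimal model under assumptions: a storage unit is a list slot, the storage address is the slot index, and freeing a unit when it is read is a simplification chosen here rather than something the text for FIG. 2 specifies. The names `InformationMemory` and `move` are hypothetical.

```python
class InformationMemory:
    def __init__(self, num_units):
        self.units = [None] * num_units  # None marks an empty storage unit

    def store(self, second_info):
        """Place second_info in an empty unit and return its storage address."""
        for addr, unit in enumerate(self.units):
            if unit is None:
                self.units[addr] = second_info
                return addr
        raise RuntimeError("no empty storage unit")

    def load(self, addr):
        """Return the second information at addr and free the unit (assumption)."""
        info, self.units[addr] = self.units[addr], None
        return info


def move(info_mem, source, destination, src_addr, dst_addr):
    # Memory access controller: park the second information, get a storage address.
    storage_addr = info_mem.store({"dst_addr": dst_addr})
    # Read instruction: first information plus the storage address.
    read_instr = {"src_addr": src_addr, "storage_addr": storage_addr}
    # Source memory: read the target data, forward it with the storage address.
    data = source[read_instr["src_addr"]]
    # Storage controller: fetch the second information, then store the data.
    second_info = info_mem.load(read_instr["storage_addr"])
    destination[second_info["dst_addr"]] = data


info_mem = InformationMemory(num_units=2)
source = {0x20: 123}
destination = {}
move(info_mem, source, destination, src_addr=0x20, dst_addr=0x200)
```

  • the storage controller itself keeps no table of second information here; it only needs the storage address that travels with the data.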
  • the information memory may include at least one of a buffer that supports one read and one write and a buffer that supports out-of-order reads and writes, so that the process in which the memory access controller stores the parsed second information and the process in which the storage controller reads the second information from the information memory can run in parallel, improving data transfer efficiency.
  • FIG. 3 is a schematic structural diagram of a hardware circuit shown in this application.
  • the hardware circuit may include: a memory access controller 301; a shared memory 302 and an external memory 303 connected to the memory access controller 301; a first controller 304 connected to the shared memory 302, the memory access controller 301, and the external memory 303; and a second controller 305 connected to the external memory 303, the memory access controller 301, and the shared memory 302.
  • the shared memory 302 is used to provide shared data services for threads, thread groups, thread blocks or processes. This application does not limit the type of shared memory.
  • the shared memory 302 may be SRAM (Static Random-Access Memory).
  • the shared memory 302 can store data that the computing system needs to use to perform the current task.
  • the external memory 303 is a concept relative to the shared memory 302, and is used to store task result data. Data movement often occurs between the external memory 303 and the shared memory 302 .
  • the present application does not limit the type of the external memory 303 .
  • the external memory 303 may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random-Access Memory).
  • data can be moved from the shared memory 302 to the external memory 303. In this case, the shared memory 302 can be regarded as the source memory and the external memory 303 as the destination memory.
  • data may be moved from the external memory 303 to the shared memory 302. In this case, the external memory 303 can be regarded as the source memory and the shared memory 302 as the destination memory.
  • the functions of the first controller 304 and the second controller 305 may be equivalent to the aforementioned storage controllers.
  • the first controller 304 can be used to obtain the second information and, according to the second information and the target data read from the shared memory 302, generate a storage instruction to store the target data in the external memory 303.
  • the second controller 305 can be used to obtain the second information and, according to the second information and the target data read from the external memory 303, generate a storage instruction to store the target data in the shared memory 302.
  • the data moving instruction may be an instruction initiated by the instruction issuing unit for a thread or a thread group (Warp) in response to a data moving requirement. It can be understood that an instruction initiated for a thread group is actually an instruction initiated by each thread in the thread group.
  • an instruction issued for a thread group is referred to as a thread group instruction, and an instruction issued for a thread is referred to as a thread instruction.
  • the information shared by the thread instructions can be extracted as shared information, so the instruction format is simpler than when an instruction is sent for each thread individually. On the one hand, this improves the performance of the instruction issuing unit; on the other hand, it improves the efficiency of instruction processing and thereby the efficiency of data movement.
  • the types of the data moving instructions may include first type instructions and second type instructions.
  • the following description will be given by taking the data moving instruction as the first type instruction as an example. It can be understood that, for the data moving method for the second type of instruction, reference may be made to the data moving process for the first type of instruction, which will not be described in detail here.
  • the shared memory 302 may be regarded as the source memory, the external memory 303 as the destination memory, and the first controller 304 as the storage controller.
  • the memory access controller 301 is configured to, in response to the received data moving instruction, acquire the first information for reading data and the second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory.
  • the source memory is configured to read target data according to the first information included in the read instruction, and send the read target data to the storage controller.
  • the storage controller is configured to acquire the second information, and generate a storage instruction according to the acquired second information and the read target data to store the target data in a destination memory.
  • for the second type of instruction, the source memory is the external memory 303, the destination memory is the shared memory 302, and the storage controller is the second controller 305.
  • the hardware circuit may include an information memory for storing the second information, whereby the storage space of the storage controllers (the first controller and the second controller) may be released and the performance of the storage controllers improved.
  • FIG. 4 is a schematic structural diagram of a hardware circuit shown in this application.
  • the hardware circuit may include: a memory access controller 401; a shared memory 402, an external memory 403, a first memory 404 and a second memory 405 connected to the memory access controller 401; a first controller 406 connected to the shared memory 402, the first memory 404 and the external memory 403; and a second controller 407 connected to the external memory 403, the second memory 405 and the shared memory 402.
  • the functions of the first memory 404 and the second memory 405 may be equivalent to the aforementioned information memory.
  • the first memory 404 may be used to store the second information.
  • the second memory 405 may be used to store the second information.
  • the shared memory 402 can be regarded as the source memory, the external memory 403 as the destination memory, the first memory 404 as the information memory, and the first controller 406 as the storage controller.
  • the memory access controller 401 can receive the data moving instruction through a provided interface and, in response to the received data moving instruction, obtain the first information for reading data and the second information for writing data included in the instruction, store the second information in the information memory (the first memory 404), and obtain the storage address at which the second information is stored.
  • the data moving instruction includes an instruction initiated for each thread in the thread group, and the memory access controller 401 may be configured to obtain, according to the data moving instruction, the first information and the second information corresponding to each thread.
  • the data moving instruction may be split to obtain a thread instruction for each thread in the thread group, and each thread instruction may then be parsed separately to obtain the first information and the second information corresponding to each thread. In this way, the thread instructions included in the thread group instruction can be processed asynchronously, which improves data transfer efficiency.
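  • splitting a thread group instruction into thread instructions can be illustrated as below. The field names (`group_id`, `direction`, `threads`) and the dict layout are hypothetical; the text only states that shared information is carried once and that each thread has its own first and second information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadInstruction:
    group_id: int   # shared information, copied from the group instruction
    direction: str  # shared information, copied from the group instruction
    thread_id: int
    src_addr: int   # first information of this thread
    dst_addr: int   # second information of this thread


def split_group_instruction(group_instr):
    """Expand one thread group instruction into one instruction per thread."""
    return [
        ThreadInstruction(group_instr["group_id"], group_instr["direction"], tid, src, dst)
        for tid, (src, dst) in sorted(group_instr["threads"].items())
    ]


group_instr = {
    "group_id": 7,                      # carried once for the whole group
    "direction": "shared_to_external",  # carried once for the whole group
    "threads": {0: (0x00, 0x100), 1: (0x04, 0x104)},
}
thread_instrs = split_group_instruction(group_instr)
```

  • each resulting `ThreadInstruction` is self-contained, so the per-thread instructions can be handled independently of one another.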
  • the memory access controller 401 may be configured to send the second information corresponding to each thread to the information memory (the first memory 404 ). In some embodiments, the memory access controller 401 may send the second information of each thread to the information storage (the first storage 404 ) individually or collectively.
  • the information storage may include several storage units; each storage unit may have a one-to-one correspondence with each storage address, for example, each storage unit has a corresponding storage address.
  • the information memory (the first memory 404) can be used to determine, among the plurality of storage units, a target storage unit that holds no data, store the second information corresponding to each thread in the target storage unit, and send the storage address of the target storage unit to the memory access controller 401, so that the memory access controller 401 obtains the storage address at which the second information is stored.
  • the second information corresponding to each thread in the same thread group can be stored in the same storage unit, so that all the threads share the same storage address. This makes it easier to generate read instructions and storage instructions when the thread instructions are processed asynchronously, and improves the efficiency of data movement.
  • the target storage unit includes storage spaces corresponding to the threads respectively.
  • the information memory (the first memory 404) may be configured to store the second information corresponding to each thread in the storage space of the target storage unit that corresponds to that thread.
  • each thread in the thread group has a unique thread ID (Identifier), and the information memory (the first memory 404) can maintain the correspondence between thread IDs and storage spaces.
  • after receiving the second information corresponding to a thread, the information memory may store it in the corresponding storage space according to the thread ID of the thread. This facilitates the subsequent reading of the second information corresponding to each thread.
  • the target storage unit further includes an access identifier corresponding to each thread.
  • the access identifier may indicate whether data in the storage space corresponding to the thread is accessed or whether data is stored.
  • a first identifier may indicate that the data in the storage space has been accessed or that the storage space holds no data; a second identifier may indicate that the storage space holds data that has not yet been accessed.
  • after storing second information in the storage space corresponding to a thread, the information memory (the first memory 404) can set the access identifier corresponding to that thread to the second identifier, indicating that the storage space holds data that has not been accessed. This makes it easy to track whether the second information corresponding to each thread has been read (accessed). If the access identifiers corresponding to all threads in the same storage unit are the first identifier, the storage unit can be released, which increases the storage unit reuse rate.
  • after the second information in a storage space has been read, the information memory (the first memory 404) can set the access identifier of the thread corresponding to that storage space to the first identifier; or, after a storage unit is released, the information memory (the first memory 404) may set the access identifier corresponding to each thread to the first identifier. The specific processing is described in detail later.
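  • the per-thread storage spaces and access identifiers of one storage unit can be modeled as follows. This is a sketch under assumptions: the encoding of the first and second identifiers as 0 and 1 and the class name `StorageUnit` are hypothetical, and a real circuit would keep these as bits next to each storage space.

```python
ACCESSED = 0      # "first identifier": space is empty or its data was already read
NOT_ACCESSED = 1  # "second identifier": space holds data that has not been read


class StorageUnit:
    """One storage unit with a storage space and an access identifier per thread."""

    def __init__(self, thread_ids):
        self.spaces = {tid: None for tid in thread_ids}
        self.flags = {tid: ACCESSED for tid in thread_ids}

    def write(self, tid, second_info):
        self.spaces[tid] = second_info
        self.flags[tid] = NOT_ACCESSED

    def read(self, tid):
        self.flags[tid] = ACCESSED
        return self.spaces[tid]

    def releasable(self):
        # The unit can be reused once every thread's identifier is ACCESSED.
        return all(flag == ACCESSED for flag in self.flags.values())

    def release(self):
        for tid in self.spaces:
            self.spaces[tid] = None
            self.flags[tid] = ACCESSED


unit = StorageUnit(thread_ids=range(4))
for tid in range(4):
    unit.write(tid, {"dst_addr": 0x100 + tid})
releasable_before_reads = unit.releasable()
infos = [unit.read(tid) for tid in range(4)]
releasable_after_reads = unit.releasable()
unit.release()
```

  • the unit becomes releasable only after the second information of every thread has been read, which matches the release condition described above.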
  • the information storage (the first storage 404 ) returns the storage address of the target storage unit to the memory access controller 401 after completing the second information storage.
  • the memory access controller 401 may be configured to generate a read command according to the storage address and the first information, and send the read command to the source memory.
  • the memory access controller 401 may be configured to determine, according to the source addresses represented by the first information corresponding to the threads, whether there are several first threads with address conflicts among the threads. Then, when there are several first threads, the first information corresponding to the several first threads can be combined with the storage address to obtain the first read instructions corresponding to the first threads respectively. Afterwards, the obtained plurality of first read commands may be sequentially sent to the source memory (shared memory 402 ) to complete the data read. This can help to solve the problem of abnormal data reading caused by address conflicts, thereby improving the accuracy of data reading.
  • the memory access controller 401 may be further configured to combine the first information corresponding to several second threads that do not have address conflicts among the threads with the storage address, to obtain a second read instruction corresponding to the several second threads.
  • for example, threads 0-9, which belong to the same thread group and have no address access conflict, each need to read data from the source memory (shared memory 402).
  • the memory access controller 401 may generate a second read instruction for threads 0-9 according to data read operations corresponding to threads 0-9 respectively.
  • the second read instruction may carry only one copy of the basic thread group information (such as the thread group number), which simplifies the instruction format. On the one hand, this improves the efficiency with which the memory access controller generates instructions and thereby its performance; on the other hand, it improves the efficiency with which the source memory processes the instructions and thereby the data transfer efficiency.
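  • the partition into first threads (with address conflicts) and second threads (without) can be sketched as follows. The text does not define what counts as an address conflict, so this example assumes a hypothetical banked memory in which two threads conflict when their source addresses fall in the same bank; `NUM_BANKS`, `bank_of`, and the dict layout of a read instruction are all illustrative assumptions.

```python
from collections import defaultdict

NUM_BANKS = 4  # hypothetical bank count, not taken from the text


def bank_of(addr):
    return addr % NUM_BANKS  # hypothetical address-to-bank mapping


def build_read_instructions(src_addrs, storage_addr):
    """src_addrs maps thread ID -> source address (the first information)."""
    by_bank = defaultdict(list)
    for tid, addr in src_addrs.items():
        by_bank[bank_of(addr)].append(tid)

    first = sorted(t for tids in by_bank.values() if len(tids) > 1 for t in tids)
    second = sorted(t for tids in by_bank.values() if len(tids) == 1 for t in tids)

    # One first read instruction per conflicting thread, to be sent one after another.
    first_reads = [
        {"storage_addr": storage_addr, "threads": {t: src_addrs[t]}} for t in first
    ]
    # One merged second read instruction covering all non-conflicting threads.
    second_read = (
        {"storage_addr": storage_addr, "threads": {t: src_addrs[t] for t in second}}
        if second
        else None
    )
    return first_reads, second_read


# Threads 0 and 1 both map to bank 0; threads 2 and 3 map to banks 1 and 2.
first_reads, second_read = build_read_instructions({0: 0, 1: 4, 2: 1, 3: 2}, storage_addr=5)
```

  • every read instruction carries the same storage address, because the second information of the whole thread group sits in one storage unit.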
  • the source memory (shared memory 402) can be used to read data according to the first information included in the read instruction, and send the read data and the storage address included in the read instruction to a storage controller (the first controller 406 or the second controller 407).
  • the source memory (shared memory 402) may be configured to, in response to the first read instruction, acquire the first information included in the first read instruction, and read the first data corresponding to the first thread of that instruction according to the source address represented by the acquired first information. Then, the read first data, the storage address included in the first read instruction, and the thread ID corresponding to the first data may be sent to the storage controller (the first controller 406).
  • the first read instruction may include the thread ID of the first thread, the first information corresponding to the first thread, the storage address of the second information corresponding to the first thread, and the address of the storage controller (first controller 406).
  • the source memory (shared memory 402) can perform addressing according to the source address represented by the first information and read the stored first data. The first data, the thread ID, and the storage address of the second information can then be sent to the storage controller (first controller 406) according to the address of the storage controller, so that the first data is stored in the destination memory (external memory 403). Thereby, the data read is completed for the first read instruction.
  • the source memory (shared memory 402 ) may be configured to acquire, in response to the second read instruction, the first information included in the second read instruction and corresponding to each of the second threads, respectively, And according to the source address represented by the first information corresponding to each second thread, the second data corresponding to each second thread is read. Then, each second data read, the storage address included in the second read instruction, and the thread ID corresponding to each second data may be sent to the storage controller (the first controller 406 ).
  • the second read instruction may include the thread ID of each second thread, the first information corresponding to each second thread, the storage address of the second information corresponding to each second thread, and the address of the storage controller (first controller 406).
  • the source memory (shared memory 402) may take each second thread in turn as the current thread and execute the following:
  • the first information corresponding to the current thread is acquired from the second read instruction through the thread ID of the current thread. Then, addressing is performed through the source address represented by the first information and the second data corresponding to the current thread is read. Then, according to the address of the storage controller (first controller 406), the read second data, the thread ID of the current thread, and the storage address of the second information corresponding to the current thread can be sent to the storage controller ( The first controller 406) to store the second data corresponding to the current thread to the destination memory (external memory 403). Thereby, data reading can be completed for the second read instruction.
  • after the storage controller (first controller 406) receives, from the source memory (shared memory 402), the second data corresponding to each second thread together with the storage address of the second information and the thread ID, it may, for each second thread, acquire the second information corresponding to that thread from the information memory (first memory 404) according to the storage address and the thread ID, and generate a storage instruction according to the acquired second information and the second data corresponding to that thread, so as to store the second data in the destination memory.
  • the storage controller may acquire the second information in various ways. In some embodiments, the storage controller (the first controller 406) may obtain the second information from the information memory (the first memory 404) according to the storage address alone.
  • when the storage unit includes storage spaces corresponding to multiple threads and a correspondence between thread IDs and the storage spaces has been established, the storage controller (the first controller 406) may construct a second information read instruction from the thread ID and the storage address and send it to the information memory (first memory 404).
  • the information memory (the first memory 404) can respond to the second information read instruction, perform addressing according to the storage address, read the second information corresponding to the thread ID, and return the second information to the storage controller (first controller 406).
  • the storage controller (the first controller 406) may be configured to generate a storage instruction according to the acquired second information and the read data, and send the storage instruction to the destination memory (external memory 403).
  • the destination memory (external memory 403 ) can be addressed according to the destination address represented by the second information, and store the data read from the source memory (shared memory 402 ) to the corresponding address.
  • in this way, for the data moving instruction (the first type of instruction), the data transfer from the shared memory 402 to the external memory 403 is completed.
  • the information memory (the first memory, the second memory) can be used as an intermediate storage device to store the second information, thereby releasing the storage space of the memory controller (the first controller, the second controller) , to improve the performance of the storage controller.
  • the information memory (the first memory and/or the second memory) may be a buffer that supports one read and one write, so that the process in which the memory access controller stores the parsed second information and the process in which the storage controller (the first controller and/or the second controller) reads the second information from the information memory run in parallel, which improves data transfer efficiency.
  • the information memory (first memory 404) may be used to set the access identifier of the target thread corresponding to the second information that has been read to the first identifier, where the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
  • the information memory (the first memory 404) may periodically determine whether the access identifiers corresponding to the threads included in the target storage unit are all the first identifier and, if so, release the data stored in the target storage unit. Thereby, the storage unit reuse rate can be increased.
  • the memory access controller may include a configuration register, an address register, and an operation unit.
  • the address register may be configured to store a preset source address among the multiple source addresses in response to the data move instruction indicating that the moved target data includes data of multiple source addresses.
  • the configuration register may be used to store an operation relationship for obtaining other source addresses other than the preset source address among the plurality of source addresses based on the preset source address.
  • the operation unit may generate a plurality of instructions for moving data of the multiple source addresses according to the operation relationship and the preset source address, so as to complete the moving of the target data.
  • the plurality of source addresses can be viewed as a cube.
  • the data of the multiple source addresses may form a data block.
  • the preset source address may be any address among the multiple source addresses. In some embodiments, the smallest address among multiple source addresses may be used as the preset source address. Other source addresses are obtained through the preset source address and the operation relationship.
  • the preset source address may be stored in the address register and the operation relationship in the configuration register; the memory access controller may then, through the operation unit, generate multiple instructions for moving the data of the multiple source addresses.
  • the source addresses included in the multiple instructions are calculated through the operation relationship and the preset source address.
  • the memory access controller may respectively execute the data moving process shown in any of the foregoing embodiments for the multiple instructions, so as to complete the moving of the target data (data block).
  • in this way, when one data moving instruction covers data at multiple source addresses, the memory access controller can split that instruction into multiple instructions using the address register, the configuration register, and the operation unit, and thereby complete the data transfer for the multiple source addresses, which improves the working performance of the instruction issuing unit.
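  • the address generation performed by the operation unit can be illustrated with a short sketch. The text does not specify the form of the operation relationship, so this example assumes it is a list of (count, stride) pairs, one per dimension of the data block, applied to the smallest source address; the function name and parameters are hypothetical.

```python
from itertools import product


def expand_source_addresses(base_addr, dims):
    """Derive every source address of a block from one preset address.

    base_addr stands in for the address register contents; dims stands in
    for the configuration register and is a list of (count, stride) pairs,
    outermost dimension first (an assumed encoding).
    """
    ranges = [range(count) for count, _ in dims]
    strides = [stride for _, stride in dims]
    return [
        base_addr + sum(i * s for i, s in zip(index, strides))
        for index in product(*ranges)
    ]


# Two rows 0x100 bytes apart, three 4-byte elements per row.
addresses = expand_source_addresses(0x1000, [(2, 0x100), (3, 4)])
# One generated instruction per source address.
instructions = [{"src_addr": addr} for addr in addresses]
```

  • with this encoding a single instruction plus two small registers describes the whole block, and the per-address instructions are produced inside the memory access controller.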
  • the hardware circuit includes a plurality of shared memories, a plurality of external memories, and a Crossbar connected between the plurality of shared memories and the second controller, and between the plurality of external memories and the first controller.
  • the data move instruction is used to move data between the plurality of shared memories and the plurality of external memories.
  • the Crossbar connects each shared memory with each external memory through a crossbar switch matrix structure, so that each shared memory can perform data transfer with each external memory.
  • the storage controller (first controller) may be configured to generate a storage instruction according to the acquired second information and the read target data, and send the storage instruction to the destination memory (external memory) through the crossbar switch matrix to complete the data transfer. The data transfer method shown in this application can therefore extend point-to-point data transfer to multipoint-to-multipoint data transfer, which broadens the application scope of the data transfer method.
  • the memory access controller further includes a state feedback unit.
  • the state feedback unit is used for feeding back the current working state of the memory access controller.
  • the state feedback unit may be configured to send first state information to the processor to indicate that the hardware circuit is currently in a busy state in response to receiving a data moving instruction sent by the processor.
  • the state feedback unit may be configured to, in response to storing the target data to the destination memory (external memory) according to the storage instruction, send second state information to the processor to indicate that the hardware circuit is currently in an idle state.
  • the processor can grasp the working state of the memory access controller and the hardware circuit in real time, which is convenient for the processor to perform task scheduling efficiently.
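The busy/idle feedback described above can be sketched as follows. This is only an illustrative software model, not the circuit itself: the names `State`, `StateFeedbackUnit` and the `notify` callback are hypothetical, and the callback simply stands in for whatever link carries state information to the processor.

```python
from enum import Enum


class State(Enum):
    BUSY = "busy"  # stands in for the "first state information"
    IDLE = "idle"  # stands in for the "second state information"


class StateFeedbackUnit:
    """Toy model: report BUSY when a move instruction arrives, IDLE after the store."""

    def __init__(self, notify):
        # notify is a hypothetical callback standing in for the link to the processor.
        self.notify = notify

    def on_instruction_received(self):
        self.notify(State.BUSY)

    def on_store_completed(self):
        self.notify(State.IDLE)


log = []
unit = StateFeedbackUnit(log.append)
unit.on_instruction_received()
# ... the data movement itself would happen here ...
unit.on_store_completed()
```

In this model the processor side would only need to look at the most recent entry in `log` to decide whether another data moving instruction can be scheduled.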
  • FIG. 5 is a schematic structural diagram of a hardware circuit shown in this application.
  • the hardware circuit includes: a memory access controller; a plurality of shared memories, a plurality of external memories, a first memory and a second memory connected to the memory access controller; a first controller connected to the plurality of shared memories, the first memory and the plurality of external memories; a second controller connected to the plurality of external memories, the second memory and the plurality of shared memories; and Crossbars connected between the plurality of shared memories and the second controller and between the plurality of external memories and the first controller.
  • FIG. 5 only shows the connection relationship between the memory access controller and shared memory 0 and external memory 0, and the connection relationship between the memory access controller and other shared memory and other external memory is similar.
  • the memory access controller includes a state feedback unit, a configuration register, an address register and an operation unit.
  • the processor can, through the instruction issuing unit, generate a first type instruction with addresses 0-3 as the first information (source addresses) and address 5 as the second information (destination address), and send the instruction to the memory access controller.
  • the memory access controller may feed back first state information indicating that the current state of the hardware circuit is busy to the processor through the state feedback unit.
  • address 0 can then be stored in the address register as the preset source address, and the step size 1 can be stored in the configuration register as the operation relationship. The memory access controller can then generate, through the operation unit, 4 instructions whose source addresses are address 0 to address 3 and whose destination address is address 5.
  • the memory access controller may execute the foregoing data moving method for the four instructions respectively, until all data in addresses 0-3 are moved to address 5. Thereby, the number of instructions generated by the instruction issuing unit can be reduced, and the working performance thereof can be improved.
  • the state feedback unit may further send second state information to the processor to indicate that the hardware circuit is currently in an idle state. The processor can therefore keep track of the working state of the hardware circuit in real time, which is convenient for task scheduling.
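The instruction splitting in the FIG. 5 example can be sketched as below, assuming the operation relationship is a constant step size as in that example. `MoveInstruction`, `split_move` and the `count` parameter are hypothetical names introduced for illustration; the text does not say where the number of source addresses is held.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoveInstruction:
    src: int  # source address (first information)
    dst: int  # destination address (second information)


def split_move(base_src, stride, count, dst):
    """Expand one multi-address move into one move per source address.

    base_src plays the role of the preset source address (address register),
    stride plays the role of the operation relationship (configuration register),
    and count is an assumed parameter giving the number of source addresses.
    """
    return [MoveInstruction(base_src + i * stride, dst) for i in range(count)]


# The worked example from the text: addresses 0-3 moved to address 5.
instructions = split_move(base_src=0, stride=1, count=4, dst=5)
for instruction in instructions:
    print(instruction)
```

Only the single original instruction has to come from the instruction issuing unit; the four per-address instructions are produced inside the memory access controller.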
  • the present application also proposes a data moving method.
  • the method can be applied to the hardware circuit shown in any of the foregoing embodiments.
  • FIG. 6 is a method flowchart of a data moving method shown in this application. As shown in FIG. 6, the method may include S602-S606.
  • the memory access controller, in response to the received data moving instruction, acquires first information for reading data and second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory.
  • the source memory reads target data according to the first information included in the read instruction, and sends the target data to the storage controller.
  • the storage controller acquires the second information, and generates a storage instruction according to the acquired second information and the target data to store the target data in the destination memory.
  • the method can acquire first information for reading data and second information for writing data included in a data moving instruction, generate a read instruction by using the first information, read target data from a source memory, then generate a storage instruction by using the second information and the read target data, and store the read target data to the destination memory to complete the data transfer.
  • the source memory indicates a memory for sending data, for example, shared memory.
  • the destination memory indicates a memory for receiving data, for example, external memory.
  • on the one hand, this method can form point-to-point data transfer between the source memory and the destination memory, thereby releasing the thread register resources and the thread Crossbar between the threads and the thread registers, which reduces thread power consumption and improves thread Crossbar performance;
  • on the other hand, the data transfer can be completed with the instruction issuing unit sending only one data moving instruction, thereby improving the data transfer efficiency.
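The three steps of the method can be modelled in software as follows. This is a minimal sketch under stated assumptions: the memories are plain dictionaries, the first and second information are reduced to a single source and destination address, and the function name `move_data` and the dictionary keys are hypothetical.

```python
def move_data(instruction, source_memory, destination_memory):
    # Memory access controller: extract the first (read) and second (write)
    # information and build the read instruction.
    first_info = instruction["src"]
    second_info = instruction["dst"]
    read_instruction = {"src": first_info}

    # Source memory: read the target data named by the read instruction.
    target_data = source_memory[read_instruction["src"]]

    # Storage controller: build the storage instruction from the second
    # information and the read data, then write to the destination memory.
    store_instruction = {"dst": second_info, "data": target_data}
    destination_memory[store_instruction["dst"]] = store_instruction["data"]


shared = {0: "a", 1: "b"}   # stands in for shared memory
external = {}               # stands in for external memory
move_data({"src": 1, "dst": 7}, shared, external)
print(external)
```

Note that the data goes from the source memory to the storage controller directly; no thread register appears anywhere in the path, which is the point of the scheme.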
  • the hardware circuit further includes an information memory; the information memory is connected to the memory access controller and the storage controller.
  • the memory access controller acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in the information memory to obtain a storage address at which the second information is stored; it then generates a read instruction according to the storage address and the first information, and sends the read instruction to the source memory.
  • the source memory reads target data according to the first information included in the read instruction, and sends the read target data and the storage address included in the read instruction to the storage controller.
  • the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data to the destination memory.
  • the second information can be stored in the information memory, thereby releasing the storage space of the storage controller and improving the performance of the storage controller.
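The variant with an information memory can be sketched in the same style. The class `InformationMemory`, the sequential address allocation and the `info_addr` key are assumptions made for illustration; the text only requires that storing the second information yields a storage address that travels with the read instruction.

```python
class InformationMemory:
    """Toy information memory: stores second information, returns a storage address."""

    def __init__(self):
        self._slots = {}
        self._next = 0  # assumed allocation policy: next unused address

    def store(self, second_info):
        address = self._next
        self._slots[address] = second_info
        self._next += 1
        return address

    def load(self, address):
        return self._slots[address]


def move_with_info_memory(instruction, source_memory, destination_memory, info_memory):
    # Memory access controller: park the second information and keep only
    # the storage address, which is carried by the read instruction.
    storage_address = info_memory.store(instruction["dst"])
    read_instruction = {"src": instruction["src"], "info_addr": storage_address}

    # Source memory: return the data together with the storage address.
    reply = {
        "data": source_memory[read_instruction["src"]],
        "info_addr": read_instruction["info_addr"],
    }

    # Storage controller: recover the second information via the storage
    # address, then store the data to the destination memory.
    dst = info_memory.load(reply["info_addr"])
    destination_memory[dst] = reply["data"]


shared = {3: "payload"}
external = {}
move_with_info_memory({"src": 3, "dst": 9}, shared, external, InformationMemory())
print(external)
```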
  • the present application also proposes a data moving method.
  • the method can be applied to the hardware circuit shown in any of the foregoing embodiments.
  • the method may include S702-S706.
  • the memory access controller, in response to the received data moving instruction, acquires first information for reading data and second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory.
  • the source memory reads target data according to the first information included in the read instruction, and sends the read target data to the storage controller.
  • the storage controller acquires the second information, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.
  • the data move instruction may be a first type of data move instruction or a second type of data move instruction.
  • in response to the first type of data moving instruction, the source memory is the shared memory, the destination memory is the external memory, and the storage controller is the first controller.
  • in response to the second type of data moving instruction, the source memory is the external memory, the destination memory is the shared memory, and the storage controller is the second controller.
  • for the first type and the second type of data moving instruction, the method carries out the data moving scheme disclosed in the foregoing embodiments through different devices in the hardware circuit, so that the two types of data moving instructions can be processed asynchronously, which improves data transfer efficiency.
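The mapping from instruction type to the devices involved can be written down as a small lookup. The function name `route` and the string labels are hypothetical; the sketch only restates the two cases given above.

```python
def route(instruction_type):
    """Return (source memory, destination memory, storage controller) for a type."""
    table = {
        "first": ("shared memory", "external memory", "first controller"),
        "second": ("external memory", "shared memory", "second controller"),
    }
    return table[instruction_type]


for kind in ("first", "second"):
    print(kind, route(kind))
```

Because the two types use different storage controllers, a first type and a second type instruction do not have to wait on the same device.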
  • the hardware circuit further includes a first memory and a second memory; the first memory is connected to the memory access controller and the first controller; the second memory is connected to the memory access controller and the second controller.
  • the step in which the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory includes: the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in an information memory to obtain a storage address at which the second information is stored; and generates a read instruction according to the storage address and the first information, and sends the read instruction to the source memory.
  • the source memory reads data according to the first information included in the read instruction, and sends the read data to the storage controller, including: the source memory reads data according to the first information included in the read instruction target data, and send the read target data and the storage address included in the read instruction to the storage controller.
  • the step in which the storage controller acquires the second information and generates a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.
  • here, either the source memory is the shared memory, the destination memory is the external memory, the information memory is the first memory, and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, the information memory is the second memory, and the storage controller is the second controller.
  • the data movement instruction includes an instruction initiated for each thread in a thread group; the information memory includes a number of storage units.
  • the step in which the memory access controller acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in the information memory to obtain a storage address at which the second information is stored, includes: the memory access controller acquires, according to the data moving instruction, the first information and the second information respectively corresponding to the threads; and sends the second information respectively corresponding to the threads to the information memory.
  • the method may further include: the information memory determines, among the plurality of storage units, a target storage unit that stores no data, stores the second information respectively corresponding to the threads in the target storage unit, and sends the storage address of the target storage unit to the memory access controller so that the memory access controller obtains the storage address.
  • the target storage unit includes storage spaces corresponding to the respective threads.
  • the step in which the information memory determines, among the plurality of storage units, a target storage unit that stores no data and stores the second information respectively corresponding to the threads in the target storage unit includes: the information memory stores the second information respectively corresponding to the threads in the storage spaces of the target storage unit that respectively correspond to the threads.
  • the generating, by the memory access controller, of a read instruction according to the storage address and the first information includes: the memory access controller determines, according to the source addresses represented by the first information respectively corresponding to the threads, a number of first threads among the threads in which address conflicts occur; and combines the first information respectively corresponding to the first threads with the storage address to obtain first read instructions respectively corresponding to the first threads.
  • the generating, by the memory access controller, of a read instruction according to the storage address and the first information includes: the memory access controller combines the first information respectively corresponding to a number of second threads, among the threads, in which no address conflict occurs with the storage address to obtain a second read instruction corresponding to the second threads.
  • the step in which the source memory reads target data according to the first information included in the read instruction and sends the read target data and the storage address included in the read instruction to the storage controller includes: the source memory, in response to the first read instruction, acquires the first information included in the first read instruction and, according to the source address represented by the acquired first information, reads the first data corresponding to the first thread to which the first read instruction corresponds; sends the read first data, the storage address included in the first read instruction, and the thread ID corresponding to the first data to the storage controller; in response to the second read instruction, acquires the first information included in the second read instruction and respectively corresponding to each second thread and, according to the source addresses represented by that first information, reads the second data respectively corresponding to each second thread; and sends each read second data, the storage address included in the second read instruction, and the thread ID corresponding to each second data to the storage controller.
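The grouping of threads into first threads (with address conflicts) and second threads (without) can be sketched as follows. This section does not define what counts as an address conflict, so the sketch assumes a bank-style rule: two threads conflict when their source addresses differ but map to the same bank, with bank = address % `num_banks`. That rule, the bank count, and the names `partition_threads` and `build_read_instructions` are all assumptions for illustration.

```python
from collections import defaultdict


def partition_threads(src_by_thread, num_banks=4):
    """Split thread IDs into conflicting ("first") and non-conflicting ("second").

    Assumed rule: a conflict exists when different addresses fall in one bank.
    """
    addresses_per_bank = defaultdict(set)
    for address in src_by_thread.values():
        addresses_per_bank[address % num_banks].add(address)

    first, second = [], []
    for tid, address in src_by_thread.items():
        if len(addresses_per_bank[address % num_banks]) > 1:
            first.append(tid)
        else:
            second.append(tid)
    return first, second


def build_read_instructions(src_by_thread, storage_address, num_banks=4):
    first, second = partition_threads(src_by_thread, num_banks)
    # One first read instruction per conflicting thread.
    reads = [
        {"threads": {tid: src_by_thread[tid]}, "info_addr": storage_address}
        for tid in first
    ]
    # A single second read instruction covering all non-conflicting threads.
    if second:
        reads.append(
            {"threads": {tid: src_by_thread[tid] for tid in second},
             "info_addr": storage_address}
        )
    return reads


# Threads 0 and 1 read addresses 0 and 4, which share bank 0 under the assumed rule.
reads = build_read_instructions({0: 0, 1: 4, 2: 1, 3: 2}, storage_address=6)
for read in reads:
    print(read)
```

Every read instruction carries the same storage address, so the storage controller can later find the second information of each thread by combining that address with the thread ID returned by the source memory.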
  • the target storage unit further includes an access identifier corresponding to each thread; the access identifier indicates whether the data in the storage space corresponding to the thread is accessed.
  • the acquiring, by the storage controller, of the second information from the information memory according to the storage address includes: the storage controller acquires, according to the storage address, the second information corresponding to a target thread from the information memory.
  • the method may further include: in response to the storage controller acquiring the second information from the information memory according to the storage address, the information memory sets the access identifier corresponding to the target thread to a first identifier, where the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
  • the method further includes: the information memory determines whether the access identifiers included in the target storage unit and respectively corresponding to the threads are all the first identifier and, if so, releases the data stored in the target storage unit.
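The allocation of a target storage unit, the per-thread access identifiers and the release rule can be modelled together as below. The class name `InfoMemoryWithFlags`, the use of `None` to mark a unit that stores no data, and the boolean `accessed` field (True standing in for the first identifier) are assumptions made for this sketch.

```python
class InfoMemoryWithFlags:
    """Toy information memory with per-thread storage spaces and access identifiers."""

    def __init__(self, num_units):
        self.units = [None] * num_units  # None marks a unit that stores no data

    def store(self, second_info_by_thread):
        # Pick a target storage unit that currently stores no data.
        # (list.index raises ValueError if every unit is occupied.)
        address = self.units.index(None)
        self.units[address] = {
            tid: {"info": info, "accessed": False}
            for tid, info in second_info_by_thread.items()
        }
        return address

    def load(self, address, tid):
        unit = self.units[address]
        slot = unit[tid]
        slot["accessed"] = True  # set the access identifier to the "first identifier"
        info = slot["info"]
        # Release the unit once every thread's second information has been fetched.
        if all(s["accessed"] for s in unit.values()):
            self.units[address] = None
        return info


memory = InfoMemoryWithFlags(num_units=2)
address = memory.store({0: 100, 1: 101})  # destination addresses for threads 0 and 1
print(memory.load(address, 0), memory.units[address] is None)
print(memory.load(address, 1), memory.units[address] is None)
```

Releasing a unit only after all of its threads have been served keeps the storage address valid for every read instruction that carries it, including the separate first read instructions of conflicting threads.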
  • the memory access controller includes a configuration register, an address register, and an operation unit.
  • the method may further include: the address register, in response to the data moving instruction indicating that the target data to be moved includes data of multiple source addresses, stores a preset source address among the multiple source addresses; the configuration register stores the operation relationship used to derive, from the preset source address, the source addresses other than the preset source address among the multiple source addresses; and the operation unit generates, according to the operation relationship and the preset source address, multiple instructions for moving the data of the multiple source addresses, so as to complete the moving of the target data.
  • the hardware circuit includes a plurality of shared memories and a plurality of external memories, and a crossbar switch matrix is connected between the plurality of shared memories and the second controller and between the plurality of external memories and the first controller; the data moving instruction is used for data transfer between the plurality of shared memories and the plurality of external memories.
  • the step in which the storage controller generates a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller generates a storage instruction according to the acquired second information and the read target data, and sends the storage instruction to the destination memory through the crossbar switch matrix to complete the data movement.
  • the memory access controller further includes a state feedback unit.
  • the method may further include: the state feedback unit, in response to receiving the data moving instruction sent by the processor, sends first state information to the processor to indicate that the hardware circuit is currently in a busy state; and, in response to the target data being stored in the destination memory according to the storage instruction, sends second state information to the processor to indicate that the hardware circuit is currently in an idle state.
  • the first memory and/or the second memory includes a buffer that supports one read and one write.
  • the present application also proposes a chip.
  • the chip may include the hardware circuits shown in any of the foregoing embodiments.
  • the data moving method shown in any of the foregoing embodiments can be implemented by using the hardware circuit inside the chip. On the one hand, point-to-point data movement between the source memory and the destination memory can be formed, thereby releasing thread register resources and the thread Crossbar between threads and thread registers, which reduces thread power consumption and improves thread Crossbar performance; on the other hand, the data movement can be completed with the instruction issuing unit sending only one data moving instruction, which improves data moving efficiency and thereby chip performance.
  • the present application further provides an electronic device, including the hardware circuit shown in any of the embodiments or the aforementioned chip.
  • the electronic device may be a smart terminal such as a mobile phone, or may be other devices having a camera and capable of image processing.
  • the data moving method shown in any of the foregoing embodiments can be implemented through a chip or hardware circuit inside the device. On the one hand, point-to-point data movement between the source memory and the destination memory can be formed, thereby releasing thread register resources and the thread Crossbar between threads and thread registers, which reduces thread power consumption and improves thread Crossbar performance; on the other hand, the data movement can be completed with the instruction issuing unit sending only one data moving instruction, which improves data moving efficiency and thereby the performance of the device.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a controller, it implements the data moving method shown in any of the foregoing embodiments.
  • one or more embodiments of the present application may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media having computer-usable program code embodied therein, including but not limited to disk storage, optical storage, and the like.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuits, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this application and their structural equivalents, or in combinations of one or more of them.
  • Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of these.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing unit.
  • a central processing unit will receive instructions and data from read-only memory and/or random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic, magneto-optical or optical disks, to receive data from them, to transmit data to them, or both.
  • the computer does not have to have such a device.
  • the computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storage of computer program instructions and data include all forms of non-volatile memory, media, and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory may be supplemented by or incorporated in special purpose logic circuitry.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

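A hardware circuit, a data moving method, a chip and an electronic device. The hardware circuit may include: a memory access controller (101); a source memory (102) and a destination memory (103) connected to the memory access controller (101); and a storage controller (104) connected to the source memory (102), the destination memory (103) and the memory access controller (101). The memory access controller (101) is configured to, in response to a received data moving instruction, acquire first information for reading data and second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory (102). The source memory (102) is configured to read target data according to the first information included in the read instruction and send the target data to the storage controller (104). The storage controller (104) is configured to acquire the second information and generate a storage instruction according to the acquired second information and the target data, so as to store the target data in the destination memory (103).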

Description

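A hardware circuit, data moving method, chip and electronic device
Cross-reference to related disclosures
The present disclosure claims priority to Chinese patent application No. 202110476825.0, filed on April 29, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to computer technology, and in particular to a hardware circuit, a data moving method, a chip and an electronic device.
Background
With the continuing development of artificial intelligence and high-performance computing, the amount of data that computing systems need to process keeps growing. During computation, limited by the size of the computing system's memory space, large amounts of raw data and intermediate results have to be moved frequently between shared memory and external memory. When moving data from shared memory to external memory, the instruction issuing unit of the computing system can send a read instruction to a thread so that the data to be moved is read from shared memory into the thread registers, and then send a storage instruction so that the data in the thread registers is stored to external memory. The process of moving data from external memory to shared memory is analogous.
It is easy to see that this data moving process has two drawbacks. On the one hand, it occupies thread register resources and frequently occupies the thread Crossbar (a many-to-many thread selector based on a crossbar switch matrix) used to move data between threads and thread registers, which degrades the performance of the thread registers and the thread Crossbar. On the other hand, the instruction issuing unit has to send two instructions, a read instruction and a storage instruction, which degrades the performance of the instruction issuing unit.
Summary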
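In view of this, a first aspect of the present application discloses a hardware circuit. The hardware circuit includes: a memory access controller; a source memory and a destination memory connected to the memory access controller; and a storage controller connected to the source memory, the destination memory and the memory access controller. The source memory indicates a memory that sends data; the destination memory indicates a memory that receives data. The memory access controller is configured to, in response to a received data moving instruction, acquire first information for reading data and second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to the source memory. The source memory is configured to read target data according to the first information included in the read instruction and send the read target data to the storage controller. The storage controller is configured to acquire the second information and generate a storage instruction according to the acquired second information and the read target data, so as to store the target data in the destination memory.
In some illustrated embodiments, the hardware circuit further includes an information memory connected to the memory access controller and the storage controller. The memory access controller is configured to store the second information in the information memory to obtain a storage address at which the second information is stored, generate a read instruction according to the storage address and the first information, and send the read instruction to the source memory. The source memory is configured to read target data according to the first information included in the read instruction and send the read target data, together with the storage address included in the read instruction, to the storage controller. The storage controller is configured to acquire the second information from the information memory according to the storage address and generate a storage instruction according to the acquired second information and the read target data, so as to store the target data in the destination memory.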
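A second aspect of the present application further proposes a hardware circuit, including: a memory access controller; a shared memory and an external memory connected to the memory access controller; a first controller connected to the shared memory, the memory access controller and the external memory; and a second controller connected to the external memory, the memory access controller and the shared memory. The memory access controller is configured to, in response to a received data moving instruction, acquire first information for reading data and second information for writing data included in the data moving instruction, generate a read instruction according to the first information, and send the read instruction to a source memory. The source memory is configured to read target data according to the first information included in the read instruction and send the read target data to a storage controller. The storage controller is configured to acquire the second information and generate a storage instruction according to the acquired second information and the read target data, so as to store the target data in a destination memory. Here, either the source memory is the shared memory, the destination memory is the external memory and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory and the storage controller is the second controller.
In some illustrated embodiments, the hardware circuit further includes a first memory and a second memory; the first memory is connected to the memory access controller and the first controller; the second memory is connected to the memory access controller and the second controller. The memory access controller is configured to store the second information in an information memory to obtain a storage address at which the second information is stored, generate a read instruction according to the storage address and the first information, and send the read instruction to the source memory. The source memory is configured to read target data according to the first information included in the read instruction and send the read target data, together with the storage address included in the read instruction, to the storage controller. The storage controller is configured to acquire the second information from the information memory according to the storage address and generate a storage instruction according to the acquired second information and the read target data, so as to store the target data in the destination memory. Here, either the source memory is the shared memory, the destination memory is the external memory, the information memory is the first memory and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, the information memory is the second memory and the storage controller is the second controller.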
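In some illustrated embodiments, the data moving instruction includes instructions initiated for the threads in a thread group, and the information memory includes a number of storage units. The memory access controller is configured to acquire, according to the data moving instruction, the first information and the second information respectively corresponding to the threads, and to send the second information respectively corresponding to the threads to the information memory. The information memory is configured to determine, among the storage units, a target storage unit that stores no data, store the second information respectively corresponding to the threads in the target storage unit, and send the storage address of the target storage unit to the memory access controller so that the memory access controller obtains the storage address.
In some illustrated embodiments, the target storage unit includes storage spaces respectively corresponding to the threads; the information memory is configured to store the second information respectively corresponding to the threads in the storage spaces of the target storage unit that respectively correspond to the threads.
In some illustrated embodiments, the memory access controller is configured to determine, according to the source addresses represented by the first information respectively corresponding to the threads, a number of first threads among the threads in which address conflicts occur, and to combine the first information respectively corresponding to the first threads with the storage address to obtain first read instructions respectively corresponding to the first threads.
In some illustrated embodiments, the memory access controller is further configured to combine the first information respectively corresponding to a number of second threads, among the threads, in which no address conflict occurs with the storage address to obtain a second read instruction corresponding to the second threads.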
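In some illustrated embodiments, the source memory is configured to: in response to the first read instruction, acquire the first information included in the first read instruction and, according to the source address represented by the acquired first information, read the first data corresponding to the first thread to which the first read instruction corresponds; send the read first data, the storage address included in the first read instruction, and the thread ID corresponding to the first data to the storage controller; in response to the second read instruction, acquire the first information included in the second read instruction and respectively corresponding to each second thread and, according to the source addresses represented by that first information, read the second data respectively corresponding to each second thread; and send each read second data, the storage address included in the second read instruction, and the thread ID corresponding to each second data to the storage controller.
In some illustrated embodiments, the target storage unit is further configured with access identifiers respectively corresponding to the threads; an access identifier indicates whether the data in the storage space corresponding to the thread has been accessed. The storage controller is configured to acquire, according to the storage address, the second information corresponding to a target thread from the information memory. The information memory is configured to, in response to the storage controller acquiring the second information from the information memory according to the storage address, set the access identifier corresponding to the target thread to a first identifier, where the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
In some illustrated embodiments, the information memory is configured to determine whether the access identifiers included in the target storage unit and respectively corresponding to the threads are all the first identifier and, if so, release the data stored in the target storage unit.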
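In some illustrated embodiments, the memory access controller includes a configuration register, an address register and an operation unit. The address register is configured to, in response to the data moving instruction indicating that the target data to be moved includes data of multiple source addresses, store a preset source address among the multiple source addresses. The configuration register is configured to store the operation relationship used to derive, from the preset source address, the source addresses other than the preset source address among the multiple source addresses. The operation unit is configured to generate, according to the operation relationship and the preset source address, multiple instructions for moving the data of the multiple source addresses, so as to complete the moving of the target data.
In some illustrated embodiments, the hardware circuit includes a plurality of shared memories and a plurality of external memories, and a crossbar switch matrix is connected between the plurality of shared memories and the second controller and between the plurality of external memories and the first controller; the data moving instruction is used to move data between the plurality of shared memories and the plurality of external memories. The storage controller is configured to generate a storage instruction according to the acquired second information and the read target data, and send the storage instruction to the destination memory through the crossbar switch matrix to complete the data movement.
In some illustrated embodiments, the memory access controller further includes a state feedback unit. The state feedback unit is configured to: in response to receiving a data moving instruction sent by a processor, send first state information to the processor to indicate that the hardware circuit is currently in a busy state; and, in response to the target data being stored in the destination memory according to the storage instruction, send second state information to the processor to indicate that the hardware circuit is currently in an idle state.
In some illustrated embodiments, the first memory and/or the second memory includes a buffer that supports one read and one write.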
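A third aspect of the present application further proposes a data moving method, applied to the hardware circuit shown in any of the foregoing embodiments. The method includes: the memory access controller, in response to a received data moving instruction, acquires first information for reading data and second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory; the source memory reads target data according to the first information included in the read instruction and sends the read target data to the storage controller; the storage controller acquires the second information and generates a storage instruction according to the acquired second information and the read target data, so as to store the target data in the destination memory.
In some illustrated embodiments, the hardware circuit further includes an information memory connected to the memory access controller and the storage controller. Correspondingly, the step in which the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory includes: the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in the information memory to obtain a storage address at which the second information is stored; and generates a read instruction according to the storage address and the first information, and sends the read instruction to the source memory. The step in which the source memory reads data according to the first information included in the read instruction and sends the read data to the storage controller includes: the source memory reads target data according to the first information included in the read instruction, and sends the read target data and the storage address included in the read instruction to the storage controller. The step in which the storage controller acquires the second information and generates a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory.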
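A fourth aspect of the present application further proposes a data moving method, applied to the hardware circuit shown in any of the foregoing embodiments. The method includes: the memory access controller, in response to a received data moving instruction, acquires first information for reading data and second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to a source memory; the source memory reads target data according to the first information included in the read instruction and sends the read target data to a storage controller; the storage controller acquires the second information and generates a storage instruction according to the acquired second information and the read target data, so as to store the target data in a destination memory.
In some illustrated embodiments, the hardware circuit further includes a first memory and a second memory; the first memory is connected to the memory access controller and the first controller; the second memory is connected to the memory access controller and the second controller. Correspondingly, the step in which the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, generates a read instruction according to the first information, and sends the read instruction to the source memory includes: the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in an information memory to obtain a storage address at which the second information is stored; and generates a read instruction according to the storage address and the first information, and sends the read instruction to the source memory. The step in which the source memory reads data according to the first information included in the read instruction and sends the read data to the storage controller includes: the source memory reads target data according to the first information included in the read instruction, and sends the read target data and the storage address included in the read instruction to the storage controller. The step in which the storage controller acquires the second information and generates a storage instruction according to the acquired second information and the read data to store the data in the destination memory includes: the storage controller acquires the second information from the information memory according to the storage address, and generates a storage instruction according to the acquired second information and the read target data to store the target data in the destination memory. Here, either the source memory is the shared memory, the destination memory is the external memory, the information memory is the first memory and the storage controller is the first controller; or the source memory is the external memory, the destination memory is the shared memory, the information memory is the second memory and the storage controller is the second controller.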
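In some illustrated embodiments, the data moving instruction includes instructions initiated for the threads in a thread group, and the information memory includes a number of storage units. Correspondingly, the step in which the memory access controller, in response to the received data moving instruction, acquires the first information for reading data and the second information for writing data included in the data moving instruction, and stores the second information in the information memory to obtain a storage address at which the second information is stored, includes: the memory access controller acquires, according to the data moving instruction, the first information and the second information respectively corresponding to the threads; and sends the second information respectively corresponding to the threads to the information memory. The method may further include: the information memory determines, among the storage units, a target storage unit that stores no data, stores the second information respectively corresponding to the threads in the target storage unit, and sends the storage address of the target storage unit to the memory access controller so that the memory access controller obtains the storage address.
In some illustrated embodiments, the target storage unit includes storage spaces respectively corresponding to the threads. Correspondingly, the step in which the information memory determines, among the storage units, a target storage unit that stores no data and stores the second information respectively corresponding to the threads in the target storage unit includes: the information memory stores the second information respectively corresponding to the threads in the storage spaces of the target storage unit that respectively correspond to the threads.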
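In some illustrated embodiments, the generating, by the memory access controller, of a read instruction according to the storage address and the first information includes: the memory access controller determines, according to the source addresses represented by the first information respectively corresponding to the threads, a number of first threads among the threads in which address conflicts occur; and combines the first information respectively corresponding to the first threads with the storage address to obtain first read instructions respectively corresponding to the first threads.
In some illustrated embodiments, the generating, by the memory access controller, of a read instruction according to the storage address and the first information includes: the memory access controller combines the first information respectively corresponding to a number of second threads, among the threads, in which no address conflict occurs with the storage address to obtain a second read instruction corresponding to the second threads.
In some illustrated embodiments, the step in which the source memory reads target data according to the first information included in the read instruction and sends the read target data and the storage address included in the read instruction to the storage controller includes: the source memory, in response to the first read instruction, acquires the first information included in the first read instruction and, according to the source address represented by the acquired first information, reads the first data corresponding to the first thread to which the first read instruction corresponds; sends the read first data, the storage address included in the first read instruction, and the thread ID corresponding to the first data to the storage controller; in response to the second read instruction, acquires the first information included in the second read instruction and respectively corresponding to each second thread and, according to the source addresses represented by that first information, reads the second data respectively corresponding to each second thread; and sends each read second data, the storage address included in the second read instruction, and the thread ID corresponding to each second data to the storage controller.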
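In some illustrated embodiments, the target storage unit further includes access identifiers respectively corresponding to the threads; an access identifier indicates whether the data in the storage space corresponding to the thread has been accessed. Correspondingly, the acquiring, by the storage controller, of the second information from the information memory according to the storage address includes: the storage controller acquires, according to the storage address, the second information corresponding to a target thread from the information memory. In addition, the method may further include: the information memory, in response to the storage controller acquiring the second information from the information memory according to the storage address, sets the access identifier corresponding to the target thread to a first identifier, where the first identifier indicates that the data in the storage space corresponding to the target thread has been accessed.
In some illustrated embodiments, the method further includes: the information memory determines whether the access identifiers included in the target storage unit and respectively corresponding to the threads are all the first identifier and, if so, releases the data stored in the target storage unit.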
在示出的一些实施例中,所述访存控制器包括配置寄存器,地址寄存器以及运算单元。相应地,所述方法还可包括:所述地址寄存器响应于所述数据搬移指令指示搬移的目标数据包括多个源地址的数据,存储所述多个源地址中的预设源地址;所述配置寄存器存储用以基于所述预设源地址得到所述多个源地址中,除去所述预设源地址以外的其它源地址的运算关系;所述运算单元根据所述运算关系与所述预设源地址,生成针对所述多个源地址的数据进行搬移的多个指令以完成所述目标数据的搬移。
在示出的一些实施例中,所述硬件电路包括多个共享内存,与多个外存,以及在所述多个共享内存与第二控制器之间和所述多个外存与第一控制器之间连接了交叉开关矩阵;所述数据搬移指令用于在所述多个共享内存与所述多个外存之间进行数据搬移。相应地,所述存储控制器根据获取的所述第二信息与读取的数据生成存储指令以将所述数据存储至目的存储器,包括:所述存储控制器根据获取的所述第二信息与读取的目标数据生成存储指令,并通过所述交叉开关矩阵将所述存储指令发送至目的存储器以完成数据搬移。
在示出的一些实施例中,所述访存控制器还包括状态反馈单元。相应地,所述方法还可包括:所述状态反馈单元:响应于接收到处理器发送的数据搬移指令,向所述处理器发送第一状态信息以指示所述硬件电路当前为繁忙状态;响应于根据存储指令将所述目标数据存储至所述目的存储器,向所述处理器发送第二状态信息以指示所述硬件电路当前为空闲状态。
在示出的一些实施例中,所述第一存储器和/或所述第二存储器包括支持一读一写的缓存器。
本申请第五方面还提出一种芯片,包括如前述任一实施例示出的硬件电路。
本申请第六方面还提出一种电子设备,包括如前述任一实施例示出的硬件电路或前述任一实施例示出的芯片。
本申请第七方面还提出一种计算机可读存储介质,其上存储有计算机程序,所述程序被控制器执行时实现如前述任一实施例示出的数据搬移方法。
所述发明内容至少具有以下效果:
在本申请第一方面、第三方面、第五方面与第七方面记载的方案中,一方面可以形成源存储器与目的存储器之间点对点的数据搬移,从而释放线程寄存器资源,以及线程与线程寄存器之间的线程Crossbar(基于交叉开关矩阵的线程多对多选择器),进而降低线程功耗,提升线程Crossbar性能;另一方面,仅需要指令发射单元发送一条数据搬移指令即可完成数据搬移,从而提升数据搬移效率。
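In the solutions of the second and fourth aspects of this application: first, data can be moved point to point between the source memory (shared memory) and the destination memory (external memory), which frees thread register resources and the thread crossbar between threads and thread registers, thereby reducing thread power consumption and improving thread crossbar performance; second, the instruction issue unit needs to send only one data movement instruction to complete the move, which improves data movement efficiency; third, different types of data movement instruction are handled by different components of the hardware circuit to carry out the data movement scheme disclosed in the foregoing embodiments, so the two types of instruction can be processed asynchronously, improving data movement efficiency.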
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图说明
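To explain the technical solutions in one or more embodiments of this application or in the related art more clearly, the drawings needed to describe the embodiments or the related art are briefly introduced below. Evidently, the drawings described below are only some of the embodiments recorded in one or more embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.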
图1为本申请示出的一种硬件电路的结构示意图;
图2为本申请示出的一种硬件电路的结构示意图;
图3为本申请示出的一种硬件电路的结构示意图;
图4为本申请示出的一种硬件电路的结构示意图;
图5为本申请示出的一种硬件电路结构示意图;
图6为本申请示出的一种数据搬移方法的方法流程图。
具体实施方式
下面将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的设备和方法的例子。
在本申请使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。还应当理解,本文中所使用的词语“如果”,取决于语境,可以被解释成为“在……时”或“当……时”或“响应于确定”。
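In view of the technical problems in the related art, this application proposes a hardware circuit (hereinafter, "the circuit"). The components of the circuit can cooperate to carry out a data movement method. The hardware circuit may be part of a chip or of an integrated circuit board.
Refer to Figure 1, a schematic structural diagram of a hardware circuit according to this application. Data can be moved between the memories of the hardware circuit using the method disclosed herein. Note that the structure shown in Figure 1 is only illustrative; other structures may exist in practice, and this application does not specifically limit the structure of the hardware circuit.
As shown in Figure 1, the hardware circuit may include a memory access controller 101; a source memory 102 and a destination memory 103 connected to the memory access controller 101; and a storage controller 104 connected to the memory access controller 101, the source memory 102, and the destination memory 103.
The source memory 102 denotes the memory that sends data; the destination memory 103 denotes the memory that receives data. In different scenarios they may refer to different memories. For example, when data is moved from shared memory to external memory, the source memory 102 may be the shared memory and the destination memory 103 the external memory. Conversely, when data is moved from external memory to shared memory, the source memory 102 may be the external memory and the destination memory 103 the shared memory.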
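The memory access controller 101 may receive a data movement instruction through an interface provided for that purpose and, in response to the received instruction, obtain the first information for reading data and the second information for writing data that the instruction contains.
The data movement instruction may be an instruction issued for a thread by the instruction issue unit in response to a data movement request. It may include first information for reading data and second information for writing data.
The first information may be used to read data from the source memory 102. In some embodiments, it may include information such as a source address, through which the target data can be read from the source memory 102.
The second information may be used to store the read target data into the destination memory 103. In some embodiments, it may include information such as a destination address, through which the read target data can be stored into the destination memory 103.
In some embodiments, the memory access controller 101 may parse the received data movement instruction to obtain the first information and the second information.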
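The memory access controller 101 may then generate a read instruction from the first information and send it to the source memory 102. In some embodiments, the memory access controller 101 may use a built-in instruction editing unit to generate the read instruction from the first information and send it to the source memory 102.
The source memory 102 may then read the target data according to the first information in the read instruction and send the target data to the storage controller 104. In some embodiments, the source memory 102 may address itself using the source address represented by the first information in the read instruction and read out the data stored at that address. It may then send the data to the storage controller 104 according to the storage controller address (the access address of the storage controller) contained in the read instruction.
The storage controller 104 may then obtain the second information and, from the obtained second information and the read target data, generate a store instruction that stores the target data into the destination memory 103. In some embodiments, the memory access controller 101 may send the second information to the storage controller 104 in advance for storage. From the one or more pieces of second information it holds, the storage controller 104 may obtain the one corresponding to the target data, that is, the second information used to write the target data; it may then use an instruction editing unit to generate a store instruction from that second information and the target data and send the store instruction to the destination memory 103. The destination memory 103 may, according to the destination address represented by the second information in the store instruction, store the target data into the storage space corresponding to that address.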
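In this hardware circuit, the memory access controller may, in response to a received data movement instruction, obtain the first information for reading data and the second information for writing data contained in the instruction, generate a read instruction from the first information, and send the read instruction to the source memory. The source memory reads the target data according to the first information in the read instruction and sends it to the storage controller. The storage controller may obtain the second information and, from it and the target data, generate a store instruction that stores the target data into the destination memory.
Thus, in this hardware circuit, on the one hand, data can be moved point to point between the source memory and the destination memory, which frees thread register resources and the thread crossbar between threads and thread registers (a many-to-many thread selector based on a crossbar switch matrix), thereby reducing thread power consumption and improving thread crossbar performance; on the other hand, the instruction issue unit needs to send only one data movement instruction to complete the move, which improves the performance of the instruction issue unit.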
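The Figure 1 flow can be sketched as a small software model. This is only an illustrative sketch under stated assumptions, not the circuit itself: the class names, the `tag` used to pair returned data with its pending second information, and the use of Python lists as memories are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class MoveInstruction:
    src_addr: int  # first information: where to read in the source memory
    dst_addr: int  # second information: where to write in the destination memory


class StorageController:
    """Keeps pending second information and issues the store (hypothetical model)."""

    def __init__(self, destination):
        self.destination = destination
        self.pending = {}  # tag -> destination address (assumed pairing scheme)

    def hold_second_info(self, tag, dst_addr):
        self.pending[tag] = dst_addr

    def on_read_data(self, tag, data):
        dst_addr = self.pending.pop(tag)   # second information for this data
        self.destination[dst_addr] = data  # stands in for the store instruction


class AccessController:
    """Parses a move instruction and issues the read (hypothetical model)."""

    def __init__(self, source, storage_controller):
        self.source = source
        self.storage_controller = storage_controller
        self.next_tag = 0

    def handle(self, instr):
        tag, self.next_tag = self.next_tag, self.next_tag + 1
        # The second information is handed to the storage controller in advance.
        self.storage_controller.hold_second_info(tag, instr.dst_addr)
        # The read uses the first information; the source memory forwards the
        # data straight to the storage controller, bypassing thread registers.
        data = self.source[instr.src_addr]
        self.storage_controller.on_read_data(tag, data)


shared_memory = [10, 11, 12, 13]
external_memory = [0, 0, 0, 0]
controller = AccessController(shared_memory, StorageController(external_memory))
controller.handle(MoveInstruction(src_addr=2, dst_addr=0))
print(external_memory)  # [12, 0, 0, 0]
```

A single `MoveInstruction` drives both the read and the write, which mirrors the point that one issued instruction suffices for the whole move.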
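In some embodiments, the hardware circuit may include an information memory for storing the second information, which frees storage space in the storage controller and improves its performance.
Refer to Figure 2, a schematic structural diagram of a hardware circuit according to this application. Data can be moved between the memories of the hardware circuit using the method disclosed herein. Note that the structure shown in Figure 2 is only illustrative; other structures may exist in practice, and this application does not specifically limit the structure of the hardware circuit.
As shown in Figure 2, the hardware circuit may include a memory access controller 201; a source memory 202, a destination memory 203, and an information memory 204 connected to the memory access controller 201; and a storage controller 205 connected to the source memory 202, the destination memory 203, and the information memory 204. The storage controller 205 is connected to the memory access controller 201 through the information memory 204.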
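After obtaining the first and second information from the data movement instruction, the memory access controller 201 may store the second information in the information memory 204 and obtain the storage address at which the second information is stored.
The information memory 204 may include a number of storage units. It may store the received second information in an empty storage unit and then return the storage address of that unit to the memory access controller 201.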
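The memory access controller 201 may then generate a read instruction from the storage address and the first information and send it to the source memory 202. In some embodiments, the memory access controller 201 may use a built-in instruction editing unit to generate the read instruction from the first information, the storage address obtained from the information memory 204, and the access address of the storage controller 205, and send the instruction to the source memory 202, so that the source memory 202 sends the target data it reads to the storage controller 205.
The source memory 202 reads the target data according to the first information in the read instruction and sends the target data, together with the storage address contained in the read instruction, to the storage controller 205. In some embodiments, the source memory 202 may address itself using the source address represented by the first information in the read instruction and read out the data stored at that address as the target data. It may then send the target data and the storage address contained in the read instruction to the storage controller 205.
The storage controller 205 obtains the second information from the information memory 204 according to the storage address and, from the obtained second information and the read target data, generates a store instruction that stores the target data into the destination memory 203. In some embodiments, the storage controller 205 may generate a second-information read instruction from the storage address and send it to the information memory 204. In response, the information memory 204 may address itself using the storage address and return the second information stored there to the storage controller 205. The storage controller 205 may then use an instruction editing unit to generate a store instruction from the obtained second information and the read target data and send the store instruction to the destination memory 203. The destination memory 203 may, according to the destination address represented by the second information in the store instruction, store the target data into the storage space corresponding to that address.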
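In this way, the information memory serves as an intermediate storage component for the second information, which frees storage space in the storage controller and improves its performance.
In some embodiments, the information memory may include at least one of a one-read-one-write buffer and a buffer that supports out-of-order reads and writes, so that storing the second information parsed by the memory access controller and reading the second information from the information memory by the storage controller can proceed in parallel, improving data movement efficiency.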
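The Figure 2 variant, in which the second information is parked in an information memory and later recovered by its storage address, might be modeled as follows. This is a minimal sketch: `InfoMemory`, `move`, the dictionary-shaped read instruction, and freeing a unit as soon as it is fetched are assumptions made for illustration only.

```python
class InfoMemory:
    """Toy information memory: a fixed set of storage units (assumed layout)."""

    def __init__(self, num_units):
        self.units = [None] * num_units  # None marks an empty storage unit

    def store(self, second_info):
        """Store second information in an empty unit and return its storage address."""
        for storage_addr, unit in enumerate(self.units):
            if unit is None:
                self.units[storage_addr] = second_info
                return storage_addr
        raise RuntimeError("no empty storage unit")

    def fetch(self, storage_addr):
        """Return the second information; freeing the unit here is an assumption."""
        second_info = self.units[storage_addr]
        self.units[storage_addr] = None
        return second_info


def move(source, destination, info_memory, src_addr, dst_addr):
    # Memory access controller: park the second information, build the read instruction.
    storage_addr = info_memory.store({"dst_addr": dst_addr})
    read_instr = {"src_addr": src_addr, "storage_addr": storage_addr}
    # Source memory: read the target data and forward it with the storage address.
    data = source[read_instr["src_addr"]]
    forwarded_addr = read_instr["storage_addr"]
    # Storage controller: recover the second information and perform the store.
    second_info = info_memory.fetch(forwarded_addr)
    destination[second_info["dst_addr"]] = data
```

Because the read instruction carries only a small storage address rather than the full second information, the storage controller in this model keeps no per-instruction state of its own.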
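This application further proposes a hardware circuit. Refer to Figure 3, a schematic structural diagram of a hardware circuit according to this application.
As shown in Figure 3, the hardware circuit may include: a memory access controller 301; a shared memory 302 and an external memory 303 connected to the memory access controller 301; a first controller 304 connected to the shared memory 302, the memory access controller 301, and the external memory 303; and a second controller 305 connected to the external memory 303, the memory access controller 301, and the shared memory 302.
The shared memory 302 provides shared data services for threads, thread groups, thread blocks, or processes. This application does not limit the type of shared memory. In some embodiments, the shared memory 302 may be SRAM (Static Random-Access Memory). The shared memory 302 may store the data that the computing system needs for its current task.
The external memory 303 is defined relative to the shared memory 302 and is used to store task result data. Data is frequently moved between the external memory 303 and the shared memory 302. This application does not limit the type of the external memory 303. In some embodiments, it may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random-Access Memory).
When the data movement instruction is a first-type instruction, data may be moved from the shared memory 302 to the external memory 303. In this case the shared memory 302 may be regarded as the source memory and the external memory 303 as the destination memory.
When the data movement instruction is a second-type instruction, data may be moved from the external memory 303 to the shared memory 302. In this case the external memory 303 may be regarded as the source memory and the shared memory 302 as the destination memory.
The first controller 304 and the second controller 305 may each play the role of the storage controller described above. When the data movement instruction is a first-type instruction, the first controller 304 may obtain the second information and, from it and the target data read from the shared memory 302, generate a store instruction that stores the target data into the external memory 303. When the data movement instruction is a second-type instruction, the second controller 305 may obtain the second information and, from it and the target data read from the external memory 303, generate a store instruction that stores the target data into the shared memory 302.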
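The data movement instruction may be an instruction issued by the instruction issue unit for a thread or a thread group (warp) in response to a data movement request. An instruction issued for a thread group is in effect an instruction issued for each thread in the group. Below, an instruction issued for a thread group is called a thread-group instruction, and one issued for a single thread is called a thread instruction. Issuing a thread-group instruction allows the information common to the individual thread instructions to be factored out as shared information; compared with sending a separate instruction for each thread, this simplifies the instruction format, which improves the performance of the instruction issue unit and also improves instruction processing efficiency, and hence data movement efficiency.
The following description takes as an example a data movement instruction issued for the threads of a thread group.
The types of data movement instruction may include first-type and second-type instructions. The following description takes a first-type instruction as an example. The data movement method for a second-type instruction can follow the process described for a first-type instruction and is not detailed here.
When the data movement instruction is a first-type instruction, the shared memory 302 may be regarded as the source memory, the external memory 303 as the destination memory, and the first controller 304 as the storage controller.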
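The memory access controller 301 is configured to, in response to a received data movement instruction, obtain the first information for reading data and the second information for writing data contained in the instruction, generate a read instruction from the first information, and send the read instruction to the source memory.
The source memory is configured to read the target data according to the first information in the read instruction and send the target data to the storage controller.
The storage controller is configured to obtain the second information and, from the obtained second information and the read target data, generate a store instruction that stores the target data into the destination memory.
When the data movement instruction is a second-type instruction, the source memory is the external memory 303, the destination memory is the shared memory 302, and the storage controller is the second controller 305.
In this hardware circuit: first, data can be moved point to point between the source memory (shared memory) and the destination memory (external memory), which frees thread register resources and the thread crossbar between threads and thread registers, thereby reducing thread power consumption and improving thread crossbar performance; second, the instruction issue unit needs to send only one data movement instruction to complete the move, which improves the performance of the instruction issue unit; third, first-type and second-type data movement instructions are handled by different components of the hardware circuit to carry out the data movement process disclosed in the foregoing embodiments, so the two types of instruction can be processed asynchronously, improving data movement efficiency.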
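In some embodiments, the hardware circuit may include an information memory for storing the second information, which frees storage space in the storage controller (the first controller and the second controller) and improves its performance.
Refer to Figure 4, a schematic structural diagram of a hardware circuit according to this application.
As shown in Figure 4, the hardware circuit may include: a memory access controller 401; a shared memory 402, an external memory 403, a first memory 404, and a second memory 405 connected to the memory access controller 401; a first controller 406 connected to the shared memory 402, the first memory 404, and the external memory 403; and a second controller 407 connected to the external memory 403, the second memory 405, and the shared memory 402.
The first memory 404 and the second memory 405 may each play the role of the information memory described above. When the data movement instruction is a first-type instruction, the first memory 404 may store the second information; when it is a second-type instruction, the second memory 405 may store the second information.
Taking a first-type data movement instruction as an example, the shared memory 402 may be regarded as the source memory, the external memory 403 as the destination memory, the first memory 404 as the information memory, and the first controller 406 as the storage controller.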
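The role assignment for the two instruction types can be summarized as a lookup. The sketch below is illustrative only; `MoveType`, `select_roles`, and the string labels are hypothetical names, while the mapping itself follows the description of Figure 4.

```python
from enum import Enum


class MoveType(Enum):
    FIRST = "shared_to_external"   # first-type instruction
    SECOND = "external_to_shared"  # second-type instruction


def select_roles(move_type):
    """Map an instruction type to the components that play each role."""
    if move_type is MoveType.FIRST:
        return {
            "source": "shared_memory",
            "destination": "external_memory",
            "information_memory": "first_memory",
            "storage_controller": "first_controller",
        }
    if move_type is MoveType.SECOND:
        return {
            "source": "external_memory",
            "destination": "shared_memory",
            "information_memory": "second_memory",
            "storage_controller": "second_controller",
        }
    raise ValueError(f"unknown move type: {move_type!r}")
```

Since the two types use disjoint information memories and storage controllers, a first-type and a second-type move never compete for the same component in this model, which is what allows them to be processed asynchronously.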
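The memory access controller 401 may receive the data movement instruction through an interface provided for that purpose and, in response, obtain the first information for reading data and the second information for writing data contained in the instruction, and store the second information in the information memory (the first memory 404), obtaining the storage address at which the second information is stored.
In some embodiments, the data movement instruction includes instructions issued for the threads of a thread group, and the memory access controller 401 may obtain, from the data movement instruction, the first information and the second information corresponding to each thread.
In some embodiments, the data movement instruction may be split into thread instructions for the individual threads of the group, and each thread instruction may then be parsed to obtain the first and second information for that thread. The thread instructions contained in a thread-group instruction can thus be processed asynchronously, improving data movement efficiency.
The memory access controller 401 may then send the second information of each thread to the information memory (the first memory 404). In some embodiments, it may send the second information of the threads either individually or as a whole.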
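In some embodiments, the information memory (the first memory 404) may include a number of storage units; storage units and storage addresses may correspond one to one, for example each storage unit having one corresponding storage address.
After receiving the second information, the information memory (the first memory 404) may determine, among the storage units, a target storage unit that holds no data, store the second information of each thread in that target storage unit, and send the storage address of the target storage unit to the memory access controller 401, so that the memory access controller 401 obtains the storage address at which the second information is stored.
In some embodiments, the second information of all threads in the same thread group may be stored in the same storage unit, so that the threads share one storage address. This makes it easier to generate read instructions and store instructions when the thread instructions are processed asynchronously, and improves data movement efficiency.
In some embodiments, the target storage unit includes a storage space for each thread. The information memory (the first memory 404) may store the second information of each thread in the storage space of the target storage unit that corresponds to that thread.
In some embodiments, each thread in the thread group has a unique thread ID (identifier), and the information memory (the first memory 404) may maintain the correspondence between thread IDs and storage spaces. On receiving the second information of a thread, it may store that information in the storage space corresponding to the thread's ID, which makes it easy to read each thread's second information later.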
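In some embodiments, the target storage unit further includes an access flag for each thread. The access flag may indicate whether the data in the storage space corresponding to the thread has been accessed, or whether that space holds data. In some embodiments, a first flag may indicate that the data in the storage space has been accessed or that the space holds no data, and a second flag may indicate that the space holds data that has not yet been accessed.
After storing the second information of a thread in its storage space, the information memory (the first memory 404) may set the access flag of that thread to the second flag, indicating that the space holds data that has not yet been accessed. This makes it easy to track whether each thread's second information has been read (accessed): if the access flags of all threads in a storage unit are the first flag, the unit can be released, which increases the reuse rate of storage units. Later, once the data in a storage space has been accessed, the information memory (the first memory 404) may set the access flag of the corresponding thread to the first flag; alternatively, after a storage unit is released, the information memory (the first memory 404) may set the access flags of all its threads to the first flag. The detailed process is described below.
In some embodiments, after storing the second information, the information memory (the first memory 404) returns the storage address of the target storage unit to the memory access controller 401. The memory access controller 401 may generate a read instruction from the storage address and the first information and send it to the source memory.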
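In some embodiments, the memory access controller 401 may determine, from the source addresses represented by the first information of the threads, whether the threads include a number of first threads between which an address conflict occurs. If such first threads exist, the first information of each first thread may be combined with the storage address to obtain a first read instruction for each first thread. The resulting first read instructions may then be sent to the source memory (the shared memory 402) one after another to complete the reads. This helps resolve read anomalies caused by address conflicts and thus improves read correctness.
In some embodiments, the memory access controller 401 may further combine the first information of the second threads, that is, the threads without an address conflict, with the storage address to obtain a single second read instruction covering those second threads. The second read instruction may then be sent to the source memory (the shared memory 402) to complete the reads for the second threads. Because the second read instruction is a thread-group instruction, the information common to the individual thread instructions can be factored out as shared information; compared with sending a separate read instruction for each thread, this simplifies the instruction format, which improves the performance of the memory access controller and also improves instruction processing efficiency, and hence data movement efficiency.
For example, suppose threads 0-9, which have no address access conflict, each need to read data from the source memory (the shared memory 402), and threads 0-9 belong to the same thread group.
The memory access controller 401 may generate one second read instruction for threads 0-9 from their respective read operations. The second read instruction may contain only one copy of the basic thread-group information (such as the thread group number). Compared with generating a separate read instruction for each of threads 0-9, this simplifies the instruction format, which makes the memory access controller more efficient at generating instructions and makes the source memory (the shared memory 402) more efficient at processing them, improving data movement efficiency.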
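The split into per-thread first read instructions and one merged second read instruction can be sketched as below. The application does not define what counts as an address conflict; this sketch assumes, purely for illustration, a banked source memory in which threads conflict when their source addresses fall in the same bank (`addr % num_banks`). The function name, the `num_banks` parameter, and the dictionary layout of the instructions are likewise hypothetical.

```python
from collections import defaultdict


def build_read_instructions(src_addr_by_thread, storage_addr, num_banks=4):
    """Return (first_reads, second_read) for one thread group.

    first_reads: one read instruction per conflicting ("first") thread.
    second_read: a single instruction covering all non-conflicting
                 ("second") threads, or None if there are none.
    """
    # Group threads by the bank their source address maps to (assumed criterion).
    threads_by_bank = defaultdict(list)
    for thread_id, src_addr in src_addr_by_thread.items():
        threads_by_bank[src_addr % num_banks].append(thread_id)

    conflicting = sorted(
        t for tids in threads_by_bank.values() if len(tids) > 1 for t in tids
    )
    non_conflicting = sorted(
        t for tids in threads_by_bank.values() if len(tids) == 1 for t in tids
    )

    # Conflicting threads each get their own instruction, to be sent one by one.
    first_reads = [
        {
            "thread_id": t,
            "src_addr": src_addr_by_thread[t],
            "storage_addr": storage_addr,
        }
        for t in conflicting
    ]

    # Non-conflicting threads share one instruction; the storage address
    # (common to the whole group) appears only once.
    second_read = None
    if non_conflicting:
        second_read = {
            "storage_addr": storage_addr,
            "threads": {t: src_addr_by_thread[t] for t in non_conflicting},
        }
    return first_reads, second_read
```

With four banks and source addresses `{0: 0, 1: 4, 2: 1, 3: 2}`, threads 0 and 1 both map to bank 0 and are serialized, while threads 2 and 3 are merged into one second read instruction.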
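On receiving a read instruction (a first or a second read instruction), the source memory (the shared memory 402) may read data according to the first information in the read instruction and send the data, together with the storage address contained in the read instruction, to the storage controller (the first controller 406 in this example; for a second-type instruction the second controller 407 plays this role).
In some embodiments, in response to a first read instruction, the source memory (the shared memory 402) may obtain the first information contained in the first read instruction and, according to the source address represented by that first information, read the first data of the first thread to which the instruction corresponds. It may then send the first data, the storage address contained in the first read instruction, and the thread ID corresponding to the first data to the storage controller (the first controller 406).
In some embodiments, the first read instruction may include the thread ID of the first thread, the first information of the first thread, the storage address of the second information of the first thread, and the address of the storage controller (the first controller 406). The source memory (the shared memory 402) may address itself using the source address represented by the first information and read the stored first data. It may then, according to the storage controller address, send the thread ID, the first data, and the storage address of the second information to the storage controller (the first controller 406) so that the first data can be stored into the destination memory (the external memory 403). This completes the read for the first read instruction.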
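In some embodiments, in response to a second read instruction, the source memory (the shared memory 402) may obtain the first information of each second thread contained in the second read instruction and, according to the source addresses represented by that first information, read the second data of each second thread. It may then send the second data, the storage address contained in the second read instruction, and the thread IDs corresponding to the second data to the storage controller (the first controller 406).
In some embodiments, the second read instruction may include the thread IDs of the second threads, the first information of each second thread, the storage address of the second information of the second threads, and the address of the storage controller (the first controller 406). The source memory (the shared memory 402) may take each second thread in turn as the current thread and perform the following:
Using the thread ID of the current thread, obtain the first information of the current thread from the second read instruction. Then address the memory using the source address represented by that first information and read the second data of the current thread. Next, according to the address of the storage controller (the first controller 406), send the second data, the thread ID of the current thread, and the storage address of the second information of the current thread to the storage controller (the first controller 406) so that the second data of the current thread can be stored into the destination memory (the external memory 403). This completes the read for the second read instruction.
After receiving, for each second thread, the second data read from the source memory (the shared memory 402), the storage address of the second information, the thread ID, and related information, the storage controller (the first controller 406) may, for each second thread, obtain that thread's second information from the information memory (the first memory 404) according to the storage address and the thread ID, and generate, from the obtained second information and the thread's second data, a store instruction that stores the second data into the destination memory.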
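The storage controller (the first controller 406) may obtain the second information in various ways. In some embodiments, it may obtain the second information from the information memory (the first memory 404) using the storage address alone.
In some embodiments, where a storage unit includes storage spaces for multiple threads and a correspondence between thread IDs and storage spaces has been established, the storage controller (the first controller 406) may build a second-information read instruction from the thread ID and the storage address and send it to the information memory (the first memory 404). In response, the information memory (the first memory 404) may address itself using the storage address, read the second information corresponding to the thread ID, and return it to the storage controller (the first controller 406).
After receiving the second information of a thread, the storage controller (the first controller 406) may generate a store instruction from that second information and the read data and send the store instruction to the destination memory (the external memory 403). The destination memory (the external memory 403) may address itself using the destination address represented by the second information and store the data read from the source memory (the shared memory 402) at that address. This completes the move from the shared memory 402 to the external memory 403 for the data movement instruction (a first-type instruction).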
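In this hardware circuit, the information memory (the first memory or the second memory) can serve as an intermediate storage component for the second information, which frees storage space in the storage controller (the first controller or the second controller) and improves its performance.
In some embodiments, the information memory (the first memory and/or the second memory) may be a one-read-one-write buffer, so that storing the second information parsed by the memory access controller and reading the second information from the information memory by the storage controller (the first controller and/or the second controller) can proceed in parallel, improving data movement efficiency.
In some embodiments, after the storage controller (the first controller 406) has read second information from the information memory (the first memory 404), the information memory may set the access flag of the target thread to which that second information corresponds to the first flag, where the first flag indicates that the data in the storage space of the target thread has been accessed. The information memory (the first memory 404) may periodically check whether the access flags of all threads in the target storage unit are the first flag and, if so, release the data stored in the target storage unit. This increases the reuse rate of storage units.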
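The per-thread storage spaces and access flags described above can be sketched as a small bookkeeping model. Everything named here (`StorageUnit`, `InfoMemory`, `store_group`, `fetch`, `release_if_done`, the flag constants) is hypothetical, and treating "all flags are the first flag" as the test for a free unit is an assumption consistent with, but not dictated by, the description.

```python
FIRST_FLAG = "accessed_or_empty"     # data already read, or nothing stored
SECOND_FLAG = "stored_not_accessed"  # data stored and not yet read


class StorageUnit:
    """One storage unit with a storage space and an access flag per thread."""

    def __init__(self, num_threads):
        self.slots = [None] * num_threads
        self.flags = [FIRST_FLAG] * num_threads

    def all_first_flag(self):
        return all(flag == FIRST_FLAG for flag in self.flags)


class InfoMemory:
    def __init__(self, num_units, num_threads):
        self.units = [StorageUnit(num_threads) for _ in range(num_units)]

    def store_group(self, second_info_by_thread):
        """Store a thread group's second information in one free unit."""
        for storage_addr, unit in enumerate(self.units):
            if unit.all_first_flag():
                for thread_id, second_info in second_info_by_thread.items():
                    unit.slots[thread_id] = second_info
                    unit.flags[thread_id] = SECOND_FLAG
                return storage_addr  # shared by every thread in the group
        raise RuntimeError("no free storage unit")

    def fetch(self, storage_addr, thread_id):
        """Read one thread's second information and mark it as accessed."""
        unit = self.units[storage_addr]
        unit.flags[thread_id] = FIRST_FLAG
        return unit.slots[thread_id]

    def release_if_done(self, storage_addr):
        """Periodic check: release the unit once every thread has been served."""
        unit = self.units[storage_addr]
        if unit.all_first_flag():
            unit.slots = [None] * len(unit.slots)
            return True
        return False
```

A unit holding second information for threads 0 and 1 stays occupied after thread 0 is fetched and becomes reusable only after thread 1 is fetched as well.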
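In some embodiments, the memory access controller may include a configuration register, an address register, and an arithmetic unit.
When the target data that the data movement instruction indicates is to be moved comprises data at multiple source addresses, the address register may store a preset source address among those source addresses. The configuration register may store the arithmetic relationship used to derive, from the preset source address, the source addresses other than the preset one. The arithmetic unit may, from the arithmetic relationship and the preset source address, generate multiple instructions that move the data at the multiple source addresses, thereby completing the move of the target data.
In some embodiments, the source addresses have a certain spatial relationship. For example, they may be viewed as forming a cube, and the data at those addresses may form a data block.
The preset source address may be any of the source addresses. In some embodiments, the smallest of the source addresses may be used as the preset source address. The other source addresses are obtained from the preset source address and the arithmetic relationship.
The preset source address may be stored in the address register and the arithmetic relationship in the configuration register; the memory access controller may use the arithmetic unit to generate the multiple instructions that move the data at the multiple source addresses. The source addresses contained in those instructions are computed from the arithmetic relationship and the preset source address. The memory access controller may then carry out the data movement process of any of the foregoing embodiments for each of the instructions, completing the move of the target data (the data block).
In this example, moving the data at multiple source addresses requires only one instruction: the memory access controller may use the address register, the configuration register, and the arithmetic unit to split that one instruction into multiple instructions and thereby move the data at all the source addresses, which improves the performance of the instruction issue unit.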
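How one instruction might be expanded into several can be sketched with a base address and a stride. The application only says that the configuration register holds an "arithmetic relationship"; representing it as a stride plus an element count is an assumption made here, and `AddressGenerator` with its methods is a hypothetical name.

```python
class AddressGenerator:
    """Toy model of the address register, configuration register, and arithmetic unit."""

    def __init__(self):
        self.address_register = None  # preset source address
        self.config_register = None   # arithmetic relationship (assumed: stride, count)

    def configure(self, preset_src_addr, stride, count):
        self.address_register = preset_src_addr
        self.config_register = {"stride": stride, "count": count}

    def expand(self, dst_addr):
        """Arithmetic unit: derive every source address and emit one instruction each."""
        stride = self.config_register["stride"]
        count = self.config_register["count"]
        return [
            {"src_addr": self.address_register + i * stride, "dst_addr": dst_addr}
            for i in range(count)
        ]


generator = AddressGenerator()
generator.configure(preset_src_addr=0, stride=1, count=4)
instructions = generator.expand(dst_addr=5)
print([instr["src_addr"] for instr in instructions])  # [0, 1, 2, 3]
```

A multi-dimensional block (such as the cube mentioned above) could be handled the same way with one stride and count per dimension, though that extension is not shown here.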
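In some embodiments, the hardware circuit includes multiple shared memories and multiple external memories, with a crossbar connected between the shared memories and the second controller and between the external memories and the first controller. The data movement instruction is used to move data between the shared memories and the external memories.
Through its crossbar switch matrix architecture, the crossbar connects each shared memory with each external memory, so that data can be moved between any shared memory and any external memory. The storage controller (the first controller) may generate a store instruction from the obtained second information and the read target data and send it through the crossbar switch matrix to the destination memory (an external memory) to complete the move. The data movement method of this application thus extends from point-to-point moves to multipoint-to-multipoint moves, broadening its range of application.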
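Routing a store instruction through the crossbar to one of several destination memories might look like the sketch below. It assumes, for illustration only, that the second information identifies the destination memory with a `memory_id` field alongside the destination address; that field, the `Crossbar` class, and `build_store_instruction` are hypothetical.

```python
class Crossbar:
    """Toy crossbar: delivers a store instruction to the memory it names."""

    def __init__(self, memories):
        self.memories = memories  # index = memory_id, value = memory (a list here)

    def send(self, store_instr):
        memory = self.memories[store_instr["memory_id"]]
        memory[store_instr["dst_addr"]] = store_instr["data"]


def build_store_instruction(second_info, data):
    """Storage controller: combine the second information with the read data."""
    return {
        "memory_id": second_info["memory_id"],  # assumed field
        "dst_addr": second_info["dst_addr"],
        "data": data,
    }


external_memories = [[0] * 8, [0] * 8]
crossbar = Crossbar(external_memories)
crossbar.send(build_store_instruction({"memory_id": 1, "dst_addr": 5}, data=42))
print(external_memories[1][5])  # 42
```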
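In some embodiments, the memory access controller further includes a status feedback unit.
The status feedback unit reports the current working state of the memory access controller.
In some embodiments, the status feedback unit may, in response to receiving a data movement instruction sent by the processor, send first status information to the processor to indicate that the hardware circuit is currently busy.
The status feedback unit may, in response to the target data having been stored into the destination memory (the external memory) according to the store instruction, send second status information to the processor to indicate that the hardware circuit is currently idle.
The processor can thus track the working state of the memory access controller and the hardware circuit in real time, which helps it schedule tasks efficiently.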
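The busy/idle handshake can be sketched with a callback standing in for the link to the processor. The names `StatusFeedbackUnit`, `run_move`, and the `"busy"`/`"idle"` strings are assumptions for illustration; the application speaks only of first and second status information.

```python
class StatusFeedbackUnit:
    BUSY = "busy"  # first status information
    IDLE = "idle"  # second status information

    def __init__(self, notify_processor):
        # notify_processor is any callable that delivers a status to the processor.
        self.notify_processor = notify_processor

    def on_instruction_received(self):
        self.notify_processor(self.BUSY)

    def on_store_completed(self):
        self.notify_processor(self.IDLE)


def run_move(feedback, source, destination, src_addr, dst_addr):
    feedback.on_instruction_received()        # circuit becomes busy
    destination[dst_addr] = source[src_addr]  # stands in for the read and store steps
    feedback.on_store_completed()             # circuit becomes idle again


status_log = []
run_move(StatusFeedbackUnit(status_log.append), [1, 2, 3], [0, 0, 0], 2, 0)
print(status_log)  # ['busy', 'idle']
```

A scheduler on the processor side could use the most recent status to decide whether to issue the next data movement instruction.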
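Refer to Figure 5, a schematic structural diagram of a hardware circuit according to this application.
As shown in Figure 5, the hardware circuit includes: a memory access controller; multiple shared memories, multiple external memories, a first memory, and a second memory connected to the memory access controller; a first controller connected to the shared memories, the first memory, and the external memories; a second controller connected to the external memories, the second memory, and the shared memories; and a crossbar connected between the shared memories and the second controller and between the external memories and the first controller. For readability, Figure 5 shows only the connections between the memory access controller and shared memory 0 and external memory 0; its connections to the other shared memories and external memories are similar.
The memory access controller includes a status feedback unit, a configuration register, an address register, and an arithmetic unit.
Suppose the data at address 0 through address 3 of shared memory 0 (hereinafter addresses 0-3), four addresses in all, needs to be stored at address 5 of external memory 1 (hereinafter address 5). The processor may, through the instruction issue unit, generate a first-type instruction with addresses 0-3 as the first information (source addresses) and address 5 as the second information (destination address), and issue the instruction to the memory access controller.
On receiving the first-type instruction, the memory access controller may use the status feedback unit to report to the processor first status information indicating that the hardware circuit is currently busy. It may also store address 0 in the address register as the preset source address and a stride of 1 in the configuration register as the arithmetic relationship. The memory access controller may then use the arithmetic unit to generate four instructions whose source addresses are address 0 through address 3 respectively and whose destination address is address 5.
The memory access controller may then carry out the data movement method described above for each of the four instructions until the data at addresses 0-3 has all been moved to address 5. This reduces the number of instructions the instruction issue unit must generate and improves its performance.
After the move is complete, the memory access controller may also use the status feedback unit to send second status information to the processor, indicating that the hardware circuit is currently idle. The processor can thus track the working state of the hardware circuit in real time, which facilitates task scheduling.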
本申请还提出一种数据搬移方法。该方法可以应用于前述任一实施例示出的硬件电路。
请参见图6,图6为本申请示出的一种数据搬移方法的方法流程图。如图6所示,所述方法可以包括S602~S606。
S602,所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息;以及,根据所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器。
S604,所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将所述目标数据发送至所述存储控制器。
S606,所述存储控制器获取所述第二信息,并根据获取的所述第二信息与所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
该方法可以获取数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并利用第一信息生成读取指令,从源存储器读取目标数据,然后利用第二信息与读取到的目标数据生成存储指令,将读到的目标数据存储至目的存储器以完成数据搬移。其中,所述源存储器指示发送数据的存储器。例如,共享内存。所述目的存储器指示接收数据的存储器。例如,外存。
可见,一方面该方法可以形成源存储器与目的存储器之间点对点的数据搬移,从而释放线程寄存器资源,以及线程与线程寄存器之间的线程Crossbar,进而降低线程功耗,提升线程Crossbar性能;另一方面,仅需要指令发射单元发送一条数据搬移指令即可完成数据搬移,从而提升数据搬移效率。
在示出的一些实施例中,所述硬件电路还包括信息存储器;所述信息存储器与所述访存控制器和所述存储控制器连接。相应地,所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并将所述第二信息存储至所述信息存储器,得到存储所述第二信息的存储地址;以及,根据所述存储地址与所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器。所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的目标数据与所述读取指令包括的所述存储地址发送至所述存储控制器。所述存储控制器根据所述存储地址,从所述信息存储器中获取所述第二信息,并根据获取的所述第二信息与读取的目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
由此可以将第二信息存储至信息存储器,从而释放存储控制器存储空间,提升存储控制器性能。
本申请还提出一种数据搬移方法。该方法可以应用于前述任一实施例示出的硬件电路。所述方法可以包括S702~S706。
S702,所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,以及,根据所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器。
S704,所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的目标数据发送至所述存储控制器。
S706,所述存储控制器获取所述第二信息,并根据获取的所述第二信息与读取的目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
其中,所述数据搬移指令可以是第一类型的数据搬移指令或第二类型的数据搬移指令。其中,响应于第一类型的数据搬移指令,所述源存储器为所述共享内存,所述目的存储器为所述外存,所述存储控制器为所述第一控制器。响应于第二类型的数据搬移指令,所述源存储器为所述外存,所述目的存储器为所述共享内存,所述存储控制器为所述第二控制器。
该方法可以针对第一类型的数据搬移指令与第二类型的数据搬移指令,分别通过硬件电路中不同的器件完成前述实施例公开的数据搬移方案,从而可以异步处理两种类型的数据搬移指令,提升数据搬移效率。
在示出的一些实施例中,所述硬件电路还包括第一存储器和第二存储器;所述第一存储器与所述访存控制器和所述第一控制器连接;所述第二存储器与所述访存控制器和所述第二控制器连接。相应地,所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,以及,根据所述第一信息生成读取指令,并将所述读取指令发送至源存储器,包括:所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并将所述第二信息存储至信息存储器,得到存储所述第二信息的存储地址;以及,根据所述存储地址与所述第一信息生成读取指令,并将所述读取指令发送至源存储器。所述源存储器根据所述读取指令包括的第一信息读取数据,并将读取的数据发送至存储控制器,包括:所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的目标数据与所述读取指令包括的所述存储地址发送至存储控制器。所述存储控制器根据所述第二信息,并根据获取的所述第二信息与读取的数据生成存储指令以将所述数据存储至目的存储器,包括:所述存储控制器根据所述存储地址,从所述信息存储器中获取所述第二信息,并根据获取的所述第二信息与读取的目标数据生成存储指令以将所述目标数据存储至目的存储器。其中,所述源存储器为所述共享内存,所述目的存储器为所述外存,所述信息存储器为所述第一存储器,所述存储控制器为所述第一控制器;或者,所述源存储器为所述外存,所述目的存储器为所述共享内存,所述信息存储器为所述第二存储器,所述存储控制器为所述第二控制器。
在示出的一些实施例中,所述数据搬移指令包括针对线程组内各线程发起的指令;所述信息存储器包括若干存储单元。相应地,所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并将所述第二信息存储至所述信息存储器,得到存储所述第二信息的存储地址,包括:所述访存控制器根据所述数据搬移指令,获取与所述各线程分别对应的所述第一信息与所述第二信息;以及,将所述各线程分别对应的第二信息发送至所述信息存储器。并且,所述方法还可包括:所述信息存储器确定所述若干存储单元中未存储数据的目标存储单元,并将各线程分别对应的第二信息存储至所述目标存储单元,以及将所述目标存储单元的存储地址发送至所述访存控制器以使所述访存控制器得到所述存储地址。
在示出的一些实施例中,所述目标存储单元包括与所述各线程分别对应的存储空间。相应地,所述信息存储器确定所述若干存储单元中未存储数据的目标存储单元,并将各线程分别对应的第二信息存储至所述目标存储单元,包括:所述信息存储器将各线程分别对应的第二信息存储至所述目标存储单元中,与所述各线程分别对应的存储空间。
在示出的一些实施例中,所述访存控制器根据所述存储地址与所述第一信息生成读取指令,包括:所述访存控制器根据各线程分别对应的第一信息表征的源地址,确定所述各线程中发生地址冲突的若干第一线程;将所述若干第一线程分别对应的第一信息,与所述存储地址进行组合,得到与若干第一线程分别对应的第一读取指令。
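In some illustrated embodiments, the memory access controller generating the read instruction from the storage address and the first information includes: the memory access controller combining the first information of the second threads, that is, the threads without an address conflict, with the storage address to obtain a second read instruction corresponding to those second threads.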
在示出的一些实施例中,所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的目标数据与所述读取指令包括的所述存储地址发送至所述存储控制器,包括:所述源存储器响应于所述第一读取指令,获取第一读取指令包括的第一信息,并根据获取的第一信息表征的源地址,读取与所述第一读取指令对应的第一线程所对应的第一数据;将读取的所述第一数据,所述第一读取指令包括的所述存储地址,以及与所述第一数据对应的线程ID发送至所述存储控制器;响应于所述第二读取指令,获取所述第二读取指令包括的与各第二线程分别对应的第一信息,并根据各第二线程分别对应的第一信息所表征的源地址,读取与各第二线程分别对应的第二数据;将读取的各第二数据,所述第二读取指令包括的存储地址,以及与各第二数据对应的线程ID发送至所述存储控制器。
在示出的一些实施例中,所述目标存储单元还包括与所述各线程分别对应的访问标识;所述访问标识指示与所述线程对应的存储空间中的数据是否被访问。相应地,所述存储控制器根据所述存储地址,从所述信息存储器中获取所述第二信息,包括:所述存储控制器根据所述存储地址,从所述信息存储器中获取目标线程所对应的第二信息。此外,所述方法还可包括:所述信息存储器响应于所述存储控制器根据所述存储地址,从所述信息存储器中获取所述第二信息,将目标线程所对应的访问标识置为第一标识;其中,所述第一标识表征与所述目标线程对应的存储空间中的数据已被访问。
在示出的一些实施例中,所述方法还包括:所述信息存储器确定目标存储单元包括的与所述各线程分别对应的访问标识是否均为所述第一标识,如果是,释放目标存储单元存储的数据。
在示出的一些实施例中,所述访存控制器包括配置寄存器,地址寄存器以及运算单元。相应地,所述方法还可包括:所述地址寄存器响应于所述数据搬移指令指示搬移的目标数据包括多个源地址的数据,存储所述多个源地址中的预设源地址;所述配置寄存器存储用以基于所述预设源地址得到所述多个源地址中,除去所述预设源地址以外的其它源地址的运算关系;所述运算单元根据所述运算关系与所述预设源地址,生成针对所述多个源地址的数据进行搬移的多个指令以完成所述目标数据的搬移。
在示出的一些实施例中,所述硬件电路包括多个共享内存,与多个外存,以及在所述多个共享内存与第二控制器之间和所述多个外存与第一控制器之间连接了交叉开关矩阵;所述数据搬移指令用于在所述多个共享内存与所述多个外存之间进行数据搬移。相应地,所述存储控制器根据获取的所述第二信息与读取的数据生成存储指令以将所述数据存储至目的存储器,包括:所述存储控制器根据获取的所述第二信息与读取的目标数据生成存储指令,并通过所述交叉开关矩阵将所述存储指令发送至目的存储器以完成数据搬移。
在示出的一些实施例中,所述访存控制器还包括状态反馈单元。相应地,所述方法还可包括,所述状态反馈单元:响应于接收到处理器发送的数据搬移指令,向所述处理器发送第一状态信息以指示所述硬件电路当前为繁忙状态;响应于根据存储指令将所述目标数据存储至所述目的存储器,向所述处理器发送第二状态信息以指示所述硬件电路当前为空闲状态。
在示出的一些实施例中,所述第一存储器和/或所述第二存储器包括支持一读一写的缓存器。各方法实施例对应的效果可以参照前述对应硬件电路的实施例说明,在此不作详述。
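This application further proposes a chip. The chip may include the hardware circuit of any of the foregoing embodiments. When data is moved inside the chip, the hardware circuit inside the chip can carry out the data movement method of any of the foregoing embodiments, so that, on the one hand, data can be moved point to point between the source memory and the destination memory, which frees thread register resources and the thread crossbar between threads and thread registers, thereby reducing thread power consumption and improving thread crossbar performance; on the other hand, the instruction issue unit needs to send only one data movement instruction to complete the move, which improves data movement efficiency and hence chip performance.
This application further proposes an electronic device that includes the hardware circuit of any of the foregoing embodiments or the foregoing chip.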
例如,该电子设备可以是手机等智能终端,或者也可以是具有摄像头并可以进行图像处理的其他设备。示例性的,当该电子设备需要数据搬移时,可以通过设备内部的芯片或硬件电路实现如前述任一实施例示出的数据搬移方法,从而一方面可以形成源存储器与目的存储器之间点对点的数据搬移,从而释放线程寄存器资源,以及线程与线程寄存器之间的线程Crossbar,进而降低线程功耗,提升线程Crossbar性能;另一方面,仅需要指令发射单元发送一条数据搬移指令即可完成数据搬移,从而提升数据搬移效率,进而提升设备性能。
本申请还提出一种计算机可读存储介质,其上存储有计算机程序,所述程序被控制器执行时实现如前述任一实施例示出的数据搬移方法。
本领域技术人员应明白,本申请一个或多个实施例可提供为方法、系统或计算机程序产品。因此,本申请一个或多个实施例可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请一个或多个实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、光学存储器等)上实施的计算机程序产品的形式。
本申请中记载的“和/或”表示至少具有两者中的其中一个,例如,“A和/或B”包括三种方案:A、B、以及“A和B”。
本申请中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于数据处理设备实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
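The foregoing describes specific embodiments of this application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired result. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired result. In some implementations, multitasking and parallel processing are also possible or may be advantageous.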
本申请中描述的主题及功能操作的实施例可以在以下中实现:数字电子电路、有形体现的计算机软件或固件、包括本申请中公开的结构及其结构性等同物的计算机硬件、或者它们中的一个或多个的组合。本申请中描述的主题的实施例可以实现为一个或多个计算机程序,即编码在有形非暂时性程序载体上以被数据处理装置执行或控制数据处理装置的操作的计算机程序指令中的一个或多个模块。可替代地或附加地,程序指令可以被编码在人工生成的传播信号上,例如机器生成的电、光或电磁信号,该信号被生成以将信息编码并传输到合适的接收机装置以由数据处理装置执行。计算机存储介质可以是机器可读存储设备、机器可读存储基板、随机或串行存取存储器设备、或它们中的一个或多个的组合。
本申请中描述的处理及逻辑流程可以由执行一个或多个计算机程序的一个或多个可编程计算机执行,以通过根据输入数据进行操作并生成输出来执行相应的功能。所述处理及逻辑流程还可以由专用逻辑电路—例如FPGA(现场可编程门阵列)或ASIC(专用集成电路)来执行,并且装置也可以实现为专用逻辑电路。
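Computers suitable for executing a computer program include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing system. Typically, the central processing system receives instructions and data from read-only memory and/or random-access memory. The basic components of a computer include a central processing system for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer also includes one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or is operatively coupled to such mass storage devices to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory may be supplemented by, or incorporated in, special-purpose logic circuitry.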
虽然本申请包含许多具体实施细节,但是这些不应被解释为限制任何公开的范围或所要求保护的范围,而是主要用于描述特定公开的具体实施例的特征。本申请内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面,在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外,虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护,但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除,并且所要求保护的组合可以指向子组合或子组合的变型。
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所述的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,所述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所述的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。
以上所述仅为本申请一个或多个实施例的较佳实施例而已,并不用以限制本申请一个或多个实施例,凡在本申请一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请一个或多个实施例保护的范围之内。

Claims (20)

  1. 一种硬件电路,包括:
    访存控制器,用于响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并根据所述第一信息生成读取指令;
    与所述访存控制器连接的源存储器,作为指示发送数据的存储器,用于根据从所述访存控制器接收的所述读取指令包括的第一信息读取目标数据;
    与所述访存控制器连接的目的存储器,作为指示接收数据的存储器;
    与所述源存储器、所述目的存储器、所述访存控制器连接的存储控制器,用于获取所述第二信息,并根据获取的所述第二信息与从所述源存储器接收的所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
  2. 根据权利要求1所述的硬件电路,其中,所述硬件电路还包括信息存储器;
    所述信息存储器与所述访存控制器和所述存储控制器连接;
    所述访存控制器,用于将所述第二信息存储至所述信息存储器,得到存储所述第二信息的存储地址;以及,根据所述存储地址与所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器;
    所述源存储器,用于根据所述读取指令包括的第一信息读取所述目标数据,并将读取的所述目标数据与所述读取指令包括的所述存储地址发送至所述存储控制器;
    所述存储控制器,用于根据所述存储地址,从所述信息存储器中获取所述第二信息,并根据获取的所述第二信息与所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
  3. 一种硬件电路,包括:
    访存控制器,用于响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,并根据所述第一信息生成读取指令;
    与所述访存控制器连接的源存储器,包括共享内存和/或外存,用于根据从所述访存控制器接收的所述读取指令包括的第一信息读取目标数据;
    与所述访存控制器连接的目的存储器,包括所述共享内存和/或所述外存;
    与所述源存储器、所述目的存储器、所述访存控制器连接的存储控制器,包括与所述共享内存、所述访存控制器和所述外存连接的第一控制器和/或与所述外存、所述访存控制器和所述共享内存连接的第二控制器,用于获取所述第二信息,并根据获取的所述第二信息与从所述源存储器接收的所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器;
    其中,所述源存储器为所述共享内存,所述目的存储器为所述外存,所述存储控制器为所述第一控制器;
    或者,所述源存储器为所述外存,所述目的存储器为所述共享内存,所述存储控制器为所述第二控制器。
  4. 根据权利要求3所述的硬件电路,其中,所述硬件电路还包括信息存储器,所述信息存储器包括第一存储器和第二存储器;
    所述第一存储器与所述访存控制器和所述第一控制器连接;
    所述第二存储器与所述访存控制器和所述第二控制器连接;
    所述访存控制器,用于将所述第二信息存储至所述信息存储器,得到存储所述第二信息的存储地址;以及,根据所述存储地址与所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器;
    所述源存储器,用于根据所述读取指令包括的第一信息读取所述目标数据,并将读取的所述目标数据与所述读取指令包括的所述存储地址发送至所述存储控制器;
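    the storage controller is configured to obtain the second information from the information memory according to the storage address, and to generate, from the obtained second information and the target data, a store instruction that stores the target data into the destination memory;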
    其中,所述源存储器为所述共享内存,所述目的存储器为所述外存,所述信息存储器为所述第一存储器,所述存储控制器为所述第一控制器;
    或者,所述源存储器为所述外存,所述目的存储器为所述共享内存,所述信息存储器为所述第二存储器,所述存储控制器为所述第二控制器。
  5. 根据权利要求4所述的硬件电路,其中,
    所述数据搬移指令包括针对线程组内各线程发起的指令;
    所述信息存储器包括若干存储单元;
    所述访存控制器,用于根据所述数据搬移指令,获取与所述各线程分别对应的所述第一信息与所述第二信息,并将所述各线程分别对应的第二信息发送至所述信息存储器;
    所述信息存储器,用于确定所述若干存储单元中未存储数据的目标存储单元,将各线程分别对应的第二信息存储至所述目标存储单元,并将所述目标存储单元的存储地址发送至所述访存控制器。
  6. 根据权利要求5所述的硬件电路,其中,
    所述目标存储单元包括与所述各线程分别对应的存储空间;
    所述信息存储器,用于将所述各线程分别对应的第二信息,存储至所述目标存储单元中与所述各线程分别对应的存储空间。
  7. 根据权利要求6所述的硬件电路,其中,
    所述访存控制器,用于根据所述各线程分别对应的第一信息表征的源地址,确定所述各线程中发生地址冲突的若干第一线程;
    将所述若干第一线程分别对应的第一信息,与所述存储地址进行组合,得到与所述若干第一线程分别对应的第一读取指令。
  8. 根据权利要求7所述的硬件电路,其中,
    所述访存控制器,还用于将所述各线程中未发生地址冲突的若干第二线程分别对应的第一信息,与所述存储地址组合,得到与所述若干第二线程对应的第二读取指令。
  9. 根据权利要求8所述的硬件电路,其中,所述源存储器,用于:
    响应于所述第一读取指令,
    获取所述第一读取指令包括的第一信息,并
    根据获取的所述第一信息表征的源地址,读取与所述第一读取指令对应的第一线程所对应的第一数据;
    将读取的所述第一数据,所述第一读取指令包括的所述存储地址,以及与所述第一数据对应的线程标识ID发送至所述存储控制器;
    响应于所述第二读取指令,
    获取所述第二读取指令包括的与各所述第二线程分别对应的第一信息,并
    根据各所述第二线程分别对应的第一信息所表征的源地址,读取与各所述第二线程分别对应的第二数据;
    将读取的各所述第二数据,所述第二读取指令包括的所述存储地址,以及与各所述第二数据对应的线程ID发送至所述存储控制器。
  10. 根据权利要求9所述的硬件电路,其中,
    所述目标存储单元还配置有与所述各线程分别对应的访问标识;所述访问标识指示与所述线程对应的存储空间中的数据是否被访问;
    所述存储控制器,用于根据所述存储地址,从所述信息存储器中获取目标线程所对应的第二信息;
    所述信息存储器,用于响应于所述存储控制器根据所述存储地址,从所述信息存储器中获取所述第二信息,将所述目标线程所对应的访问标识置为第一标识;其中,所述第一标识表征与所述目标线程对应的存储空间中的数据已被访问。
  11. 根据权利要求10所述的硬件电路,所述信息存储器,用于确定所述目标存储单元包括的、与所述各线程分别对应的访问标识是否均为所述第一标识,如果是,释放所述目标存储单元存储的数据。
  12. 根据权利要求3至11任一所述的硬件电路,所述访存控制器包括配置寄存器,地址寄存器以及运算单元;
    所述地址寄存器,用于响应于所述数据搬移指令指示搬移的目标数据包括多个源地址的数据,存储所述多个源地址中的预设源地址;
    所述配置寄存器,用于存储用以基于所述预设源地址得到所述多个源地址中除去所述预设源地址以外的其它源地址的运算关系;
    所述运算单元,用于根据所述运算关系与所述预设源地址,生成针对所述多个源地址的数据进行搬移的多个指令以完成所述目标数据的搬移。
  13. 根据权利要求3至12任一所述的硬件电路,所述硬件电路包括多个共享内存,与多个外存,以及在所述多个共享内存与第二控制器之间、所述多个外存与第一控制器之间连接了交叉开关矩阵;所述数据搬移指令用于在所述多个共享内存与所述多个外存之间进行数据搬移;
    所述存储控制器,用于根据获取的所述第二信息与所述目标数据生成存储指令,并通过所述交叉开关矩阵将所述存储指令发送至所述目的存储器以完成数据搬移。
  14. 根据权利要求3至13任一所述的硬件电路,所述访存控制器还包括状态反馈单元;
    所述状态反馈单元,用于响应于接收到处理器发送的数据搬移指令,向所述处理器发送第一状态信息以指示所述硬件电路当前为繁忙状态;
    响应于根据所述存储指令将所述目标数据存储至所述目的存储器,向所述处理器发送第二状态信息以指示所述硬件电路当前为空闲状态。
  15. 根据权利要求4至13任一项所述的硬件电路,所述第一存储器和/或所述第二存储器包括支持一读一写的缓存器。
  16. 一种数据搬移方法,应用于如权利要求1或2所述的硬件电路,所述方法包括:
    所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息;以及,根据所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器;
    所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的所述目标数据发送至所述存储控制器;
    所述存储控制器获取所述第二信息,并根据获取的所述第二信息与所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器。
  17. 一种数据搬移方法,应用于如权利要求3至15任一所述的硬件电路,所述方法包括:
    所述访存控制器响应于接收的数据搬移指令,获取所述数据搬移指令包括的用于读数据的第一信息与用于写数据的第二信息,以及,根据所述第一信息生成读取指令,并将所述读取指令发送至所述源存储器;
    所述源存储器根据所述读取指令包括的第一信息读取目标数据,并将读取的所述目标数据发送至所述存储控制器;
    所述存储控制器获取所述第二信息,并根据获取的所述第二信息与所述目标数据生成存储指令以将所述目标数据存储至所述目的存储器;
    其中,所述源存储器为所述共享内存,所述目的存储器为所述外存,所述存储控制器为所述第一控制器;或所述源存储器为所述外存,所述目的存储器为所述共享内存,所述存储控制器为所述第二控制器。
  18. 一种芯片,包括如权利要求1至15任一所述的硬件电路。
  19. 一种电子设备,包括如权利要求1至15任一所述的硬件电路或权利要求18所述的芯片。
  20. 一种计算机可读存储介质,其上存储有计算机程序,所述程序被控制器执行时实现如权利要求16或17所述的数据搬移方法。
PCT/CN2021/134610 2021-04-29 2021-11-30 一种硬件电路、数据搬移方法、芯片和电子设备 WO2022227563A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110476825 2021-04-29
CN202110476825.0 2021-04-29

Publications (1)

Publication Number Publication Date
WO2022227563A1 true WO2022227563A1 (zh) 2022-11-03

Family

ID=77095209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134610 WO2022227563A1 (zh) 2021-04-29 2021-11-30 一种硬件电路、数据搬移方法、芯片和电子设备

Country Status (2)

Country Link
CN (1) CN113220346B (zh)
WO (1) WO2022227563A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220346B (zh) * 2021-04-29 2024-06-07 上海阵量智能科技有限公司 一种硬件电路、数据搬移方法、芯片和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016127552A1 (zh) * 2015-02-12 2016-08-18 深圳市中兴微电子技术有限公司 一种直接内存存取dma控制器及数据传输的方法
CN107783918A (zh) * 2016-08-31 2018-03-09 北京信威通信技术股份有限公司 一种数据传输的方法及装置
CN111190842A (zh) * 2019-12-30 2020-05-22 Oppo广东移动通信有限公司 直接存储器访问、处理器、电子设备和数据搬移方法
CN111782154A (zh) * 2020-07-13 2020-10-16 北京四季豆信息技术有限公司 数据搬移方法、装置及系统
CN112506437A (zh) * 2020-12-10 2021-03-16 上海阵量智能科技有限公司 芯片、数据搬移方法和电子设备
CN113220346A (zh) * 2021-04-29 2021-08-06 上海阵量智能科技有限公司 一种硬件电路、数据搬移方法、芯片和电子设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109791522B (zh) * 2017-09-05 2021-01-15 华为技术有限公司 数据迁移的方法、系统及智能网卡
CN111984395B (zh) * 2019-05-22 2022-12-13 中移(苏州)软件技术有限公司 一种数据迁移方法、系统及计算机可读存储介质
CN112578997B (zh) * 2019-09-30 2022-07-22 华为云计算技术有限公司 一种数据迁移方法、系统及相关设备

Also Published As

Publication number Publication date
CN113220346A (zh) 2021-08-06
CN113220346B (zh) 2024-06-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939022

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21939022

Country of ref document: EP

Kind code of ref document: A1