WO2021223639A1 - Data processing device and related products - Google Patents

Data processing device and related products

Info

Publication number
WO2021223639A1
Authority
WO
WIPO (PCT)
Prior art keywords
data, address, module, storage, discrete
Application number
PCT/CN2021/090623
Other languages
English (en)
French (fr)
Inventor
马旭研
吴健华
刘少礼
葛祥轩
刘瀚博
张磊
Original Assignee
安徽寒武纪信息科技有限公司
Application filed by 安徽寒武纪信息科技有限公司 (Anhui Cambricon Information Technology Co., Ltd.)
Priority to US 17/619,760, published as US20230214327A1
Publication of WO2021223639A1

Classifications

    • G06F12/0877 — Cache access modes
    • G06F12/0886 — Variable-length word access
    • G06F9/30 — Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30036 — Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06F9/30043 — LOAD or STORE instructions; Clear instruction
    • G06F9/30047 — Prefetch instructions; cache control instructions
    • G06N3/063 — Physical realisation of neural networks, neurons or parts of neurons using electronic means
    • G06F2212/1016 — Performance improvement

Definitions

  • the present disclosure relates to the field of computer technology, in particular to a data processing device and related products.
  • a data processing device is provided, which includes: a decoding module, a discrete address determination module, a continuous data cache module, a data read/write module, and a storage module, wherein
  • the decoding module is used to decode the received processing instruction to obtain the decoded processing instruction; to determine the multiple data corresponding to the processing instruction, as well as the source data base address, the destination data base address, the data offset address of the discrete data, and the data size of the continuous data of the multiple data, where the source data of the multiple data includes discrete data or continuous data; the decoding module is also used to determine the first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data;
  • the discrete address determining module is connected to the decoding module and the data reading and writing module, and is configured to determine the second storage address of the discrete data according to the base address of the discrete data and the data offset address of the discrete data; Sending the second storage address to the data reading and writing module;
  • the continuous data buffer module is connected to the decoding module and the data read/write module, and is used to establish a buffer space for continuous data; to buffer the continuous data at the first storage address in the buffer space and send it to the data read/write module; or to buffer the continuous data received from the data read/write module in the buffer space and send it to the first storage address;
  • the data read/write module is connected to the storage module, and is configured to read discrete data from the storage module according to the second storage address of the discrete data and send the read discrete data to the continuous data buffer module; or to receive the continuous data from the continuous data buffer module and write the received continuous data into the storage module according to the second storage address of the discrete data,
  • the data read/write module includes a merge request cache sub-module, which is used to cache the storage addresses corresponding to multiple read requests while the data read/write module reads discrete data, so that each read request can be merged to read one or more discrete data.
  • an artificial intelligence chip which includes the data processing device described above.
  • an electronic device including the artificial intelligence chip as described above.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip described above, wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • processing instructions can be decoded and executed so that discrete data are transported to continuous data addresses, or continuous data are stored at multiple discrete data addresses, thereby enabling vector operations on discrete data and the restoration of vector data to discrete storage, which simplifies the processing flow and reduces data overhead.
  • the storage addresses corresponding to the read requests can be cached when reading discrete data, so that each data read request can be merged to read one or more discrete data, thereby improving data reading efficiency.
  • Fig. 1 shows a schematic diagram of a processor of a data processing device according to an embodiment of the present disclosure
  • Figure 2 shows a block diagram of a data processing device according to an embodiment of the present disclosure
  • Fig. 3 shows a block diagram of a data processing device according to an embodiment of the present disclosure
  • Fig. 4 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • the term “if” can be interpreted as “when”, “once”, “in response to determining”, or “in response to detecting”, depending on the context.
  • similarly, the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted, depending on the context, as “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • the data processing device can be applied to a processor, and the processor can be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may include, for example, one or a combination of GPU (Graphics Processing Unit), NPU (Neural-network Processing Unit), DSP (Digital Signal Processor), and FPGA (Field-Programmable Gate Array) chips.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 1 shows a schematic diagram of a processor of a data processing device according to an embodiment of the present disclosure.
  • the processor 100 includes multiple processing units 101 and a storage unit 102.
  • the multiple processing units 101 are used to execute instruction sequences, and the storage unit 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processing units 101 in the processor 100 can share part of the storage space, for example part of the RAM storage space and the register file, while also having their own storage space.
  • Fig. 2 shows a block diagram of a data processing device according to an embodiment of the present disclosure.
  • the device includes: a decoding module (Decoder, DEC for short) 21, a discrete address determination module 22, a continuous data buffer module (Continuous Data Buffer, CDB) 23, a data read/write module 24, and a storage module 25:
  • the decoding module 21 is configured to decode the received processing instruction to obtain the decoded processing instruction; to determine the multiple data corresponding to the processing instruction, as well as the source data base address, the destination data base address, the data offset address of the discrete data, and the data size of the continuous data of the multiple data, where the source data of the multiple data includes discrete data or continuous data; the decoding module is also used to determine the first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data;
  • the discrete address determining module 22 is connected to the decoding module 21 and the data reading and writing module 24, and is configured to determine the second storage address of the discrete data according to the base address of the discrete data and the data offset address of the discrete data; Sending the second storage address to the data reading and writing module;
  • the continuous data buffer module 23 is connected to the decoding module 21 and the data read/write module 24, and is used to establish a buffer space for continuous data; buffer the continuous data at the first storage address in the buffer space and Sending to the data reading and writing module, or buffering the continuous data received from the data reading and writing module in the buffer space and sending to the first storage address;
  • the data read/write module 24 is connected to the storage module 25, and is configured to read discrete data from the storage module according to the second storage address of the discrete data and send the read discrete data to the continuous data buffer module; or to receive the continuous data from the continuous data buffer module and write the received continuous data into the storage module according to the second storage address of the discrete data,
  • the data read/write module includes a merge request buffer (Merge Request Buffer, MRB for short) submodule, which is used to buffer the storage addresses corresponding to multiple read requests while the data read/write module reads discrete data, so that each read request can be merged to read one or more discrete data.
  • the data processing device can realize a vector address access (Vector Address Access, VAA for short) function, and can support corresponding functional instructions, for example, a data handling instruction (Gather Load), which is used to aggregate the contents of a set of discrete addresses into a continuous data vector;
  • the discrete store instruction (Scatter Store) is used to store the continuous data vector in a set of discrete addresses;
  • a vector extension instruction (Vector Extension), etc.
  • the present disclosure does not limit the number and types of commands supported by the device.
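The Gather Load and Scatter Store semantics described above can be sketched in Python. This is an illustrative model only, not the hardware implementation; the function names, the dict-based memory, and the flat offset list are all assumptions for demonstration.

```python
def gather_load(memory, base, offsets):
    """Gather Load: aggregate the contents of a set of discrete
    addresses (base + offset) into a continuous data vector."""
    return [memory[base + off] for off in offsets]


def scatter_store(memory, base, offsets, vector):
    """Scatter Store: store a continuous data vector into a set
    of discrete addresses (base + offset)."""
    for off, value in zip(offsets, vector):
        memory[base + off] = value


# memory modeled as an address -> value mapping (an assumption)
mem = {100: 1, 200: 2, 305: 3}
vec = gather_load(mem, 0, [100, 200, 305])   # -> [1, 2, 3]
scatter_store(mem, 0, [400, 500], [7, 9])    # mem[400] = 7, mem[500] = 9
```

The two functions are mirror images: gather reads discrete addresses into one vector, scatter writes one vector out to discrete addresses, which is exactly the pair of data movements the device's modules cooperate to perform.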
  • the decoding module 21 can obtain the processing instruction to be decoded from the instruction dispatch queue (Issue Queue, ISQ for short) upstream of the device, and decode the processing instruction to obtain the decoded processing instruction.
  • the decoded processing instruction includes an operation code and an operation field.
  • the operation code is used to indicate the processing type of the processing instruction, and the operation field is used to indicate the data to be processed and data parameters.
  • the decoding module 21 can determine, according to the operation field, the multiple data corresponding to the decoded processing instruction and the data parameters of the multiple data, such as the source data base address, the destination data base address, the data offset address of the discrete data, and the data size of the continuous data (Single Point Data Size).
  • the source data of the multiple data includes discrete data or continuous data.
  • when the decoded processing instruction is a data handling instruction, the source data of the multiple data is discrete data and the destination data is continuous data; when the decoded processing instruction is a discrete storage instruction, the source data of the multiple data is continuous data and the destination data is discrete data.
  • the decoding module 21 can store the number of data to be processed (Single Point Data Number) and the data parameters of each data, and send the base address of the discrete data and the data offset address of the discrete data to the discrete address determination module 22.
  • the discrete address determining module 22 may determine the second storage address of the discrete data according to the base address of the received discrete data and the data offset address of the discrete data.
  • the data offset address of the discrete data may include the offset value base address (Offset Vector Base Address) and the offset value width (Offset Size) of the offset values (Offset) of the discrete data stored in an external memory (such as RAM).
  • the discrete address determination module 22 can read the offset value (Offset) of each discrete datum from the external memory (such as RAM) through the shared bus port (BUS PORT SHARE) according to the offset value base address and the offset value width; calculate the second storage address of each discrete datum in the storage module 25 from the base address of the discrete data in the storage module 25 and each offset value; and then send the second storage addresses of the discrete data to the data read/write module 24 in order.
  • the data reading and writing module 24 can read the discrete data from the storage module 25 or write the discrete data into the storage module 25 according to the second storage address of the discrete data.
  • when the processing instruction is a data handling instruction, the data read/write module 24 can read the discrete data from the storage module 25 in sequence according to the second storage address and send the read discrete data to the continuous data buffer module 23; when the processing instruction is a discrete storage instruction, the data read/write module 24 can receive the continuous data sent by the continuous data buffer module 23 and write the received continuous data into the storage module 25 according to the second storage address of the discrete data.
  • the data read/write module may include a merge request cache submodule, which is used to cache the storage addresses corresponding to multiple read requests while the data read/write module reads discrete data, so that each read request can be merged to read one or more discrete data, thereby improving the efficiency of data reading.
  • the storage module 25 may be a cache (VAA Cache, VAC for short) that implements vector address access, and the present disclosure does not limit the specific type of the storage module 25.
  • the decoding module 21 is also used to determine the first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data; and send the first storage address to the continuous data buffer module 23
  • the first storage address may be an address where continuous data is stored in an external memory (such as RAM).
  • in a possible implementation, Single Point Continuous Addr[n] = Continuous Data Base Address + (n − 1) × Single Point Data Size, where Single Point Continuous Addr[n] represents the data address of the nth continuous datum, Continuous Data Base Address represents the base address of the continuous data, and Single Point Data Size represents the data size of a single continuous datum. For example, if the base address occupies Addr1[0, 3] and the size of a single datum is 4 bytes, then for n = 3 the data address of the third continuous datum is Addr1[8, 11].
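As a check on the continuous-address arithmetic in the example above, here is a minimal sketch. It assumes 1-based indexing of the data points, byte addressing, and a (start, end) range representation, which is what the Addr1[8, 11] example implies; none of these conventions are stated explicitly in the source.

```python
def continuous_addr(base, size, n):
    """First storage address of the nth continuous datum (1-indexed),
    returned as an inclusive (start, end) byte range.
    Assumed formula: addr[n] = base + (n - 1) * size."""
    start = base + (n - 1) * size
    return (start, start + size - 1)


# base address Addr1[0, 3], single datum size 4 bytes:
continuous_addr(0, 4, 1)   # -> (0, 3), the first datum
continuous_addr(0, 4, 3)   # -> (8, 11), the third datum
```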
  • the continuous data buffer module 23 can establish a buffer space for continuous data; buffer the continuous data of the first storage address in the buffer space and send it to the data read/write module, or The continuous data received from the data reading and writing module is buffered in the buffer space and sent to the first storage address.
  • the data read and write module 24 sends the read discrete data to the continuous data buffer module 23, and the continuous data buffer module 23 buffers the discrete data as continuous data in the buffer space, and Send the buffered continuous data to the first storage address.
  • the continuous data cache module 23 can read continuous data from the first storage address of an external memory (such as RAM) through the shared bus port (BUS PORT SHARE), cache the read continuous data in the buffer space, and send it sequentially to the data read/write module 24, so that the data read/write module 24 stores each continuous datum at the second storage address of the storage module 25 to obtain discrete data.
  • processing instructions can be decoded and executed so that discrete data are transported to continuous data addresses, or continuous data are stored at multiple discrete data addresses, thereby enabling vector operations on discrete data and the restoration of vector data to discrete storage, which simplifies the processing flow and reduces data overhead.
  • the storage addresses corresponding to the read requests can be cached when reading discrete data, so that each data read request can be merged to read one or more discrete data, thereby improving data reading efficiency.
  • Fig. 3 shows a block diagram of a data processing device according to an embodiment of the present disclosure.
  • the discrete address determining module 22 includes:
  • the offset load buffer (Load Offset Buffer, LOB) submodule 221 is used to determine the offset value storage address of each discrete datum according to the offset value base address and offset value width of the discrete data, and to read the offset value of each discrete datum from that offset value storage address;
  • the discrete address generation submodule (Scatter Addr Generate, SAG for short) 222 is used to determine the second storage address of each discrete datum according to the base address of the discrete data and the offset value of each discrete datum, and to send the second storage addresses to the data read/write module.
  • the offset load submodule 221 can cache the base address of the discrete data and the data offset address of the discrete data sent by the decoding module 21; read the offset value (Offset) of each discrete datum from an external memory (such as RAM) through the shared bus port (BUS PORT SHARE) according to the offset value base address and the offset value width in the data offset address; cache the read offset values; and send the base address and offset value of each discrete datum in the storage module 25 to the discrete address generation submodule 222 in order.
  • the discrete address generation sub-module 222 may sequentially calculate the second storage address of each discrete data according to the base address and offset value of each discrete data.
  • Single Point Scatter Addr[n] = Scatter Data Base Address + Offset Address[n], where Single Point Scatter Addr[n] represents the second storage address of the nth discrete datum, Scatter Data Base Address represents the base address of the discrete data, and Offset Address[n] represents the offset value of the nth discrete datum. For example, if the base address is Addr2[4] and the offset value is [24, 27], the second storage address of the nth discrete datum is Addr2[28, 31].
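The same arithmetic for the discrete side can be sketched as below. The (start, end) range form, byte addressing, and the reading of the Addr2[4] example as base address 4 with offset 24 and width 4 are all assumptions drawn from the worked example.

```python
def scatter_addr(base, offset, width):
    """Second storage address of one discrete datum: the discrete
    data base address plus that datum's offset value, spanning
    `width` bytes, returned as an inclusive (start, end) range."""
    start = base + offset
    return (start, start + width - 1)


# base address Addr2[4] with the offset value occupying [24, 27]
# (i.e. offset 24, width 4) yields second storage address Addr2[28, 31]:
scatter_addr(4, 24, 4)   # -> (28, 31)
```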
  • the discrete address generation submodule 222 may send the calculated second storage addresses of the discrete data to the data read/write module 24 in order, so that the data read/write module 24 can read or write the discrete data.
  • the data reading and writing module 24 may include:
  • a discrete address queue (Scatter Addr Queue, SAQ for short) sub-module 241 is used to receive and store the second storage address of discrete data
  • the load store queue (Load Store Queue, LSQ for short) submodule 242 is used to read discrete data from the storage module according to the second storage address of the discrete data and send the read discrete data to the continuous data buffer module; or to receive the continuous data from the continuous data buffer module and write the received continuous data into the storage module according to the second storage address of the discrete data.
  • the discrete address queue submodule 241 may receive and buffer the second storage address of each discrete datum to form a discrete address queue, and establish the correspondence between each discrete datum and its buffer address in the buffer space of the continuous data buffer module 23, so that the read discrete data can be placed into the buffer space accordingly.
  • the storage load queue submodule 242 may include multiple storage load queues, for example the 4 storage load queues LSQ_0, LSQ_1, LSQ_2, and LSQ_3 in Fig. 3, so as to improve the efficiency of reading or writing discrete data.
  • each storage load queue LSQ can be, for example, a first-in first-out FIFO memory.
  • the discrete address queue submodule 241 sends the second storage address of each discrete datum to the storage load queues in order. When reading discrete data, each storage load queue reads the discrete data from the storage module and sends the read discrete data to the corresponding buffer address in the buffer space; when writing discrete data, each storage load queue receives the continuous data from each buffer address in the buffer space and writes each datum to the corresponding second storage address in the storage module.
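The in-order distribution of second storage addresses over several FIFO storage load queues can be sketched as follows. The patent does not specify the dispatch policy; round-robin over four queues is an assumption, as are the function and variable names.

```python
from collections import deque

NUM_LSQ = 4  # four storage load queues, LSQ_0 .. LSQ_3 as in Fig. 3


def dispatch_addresses(addresses, num_queues=NUM_LSQ):
    """Distribute discrete-data storage addresses, in order, over
    FIFO storage load queues (round-robin is an assumed policy)."""
    queues = [deque() for _ in range(num_queues)]
    for i, addr in enumerate(addresses):
        queues[i % num_queues].append(addr)
    return queues


queues = dispatch_addresses(list(range(10)))
# queues[0] now holds addresses 0, 4, 8; queues[3] holds 3, 7
```

Because each queue is a FIFO, the global in-order property of the address stream is preserved within every queue even though the queues drain independently.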
  • the merge request cache submodule 243 of the data read/write module 24 may include merge request caches MRB_0, MRB_1, MRB_2, and MRB_3 corresponding to the storage load queues LSQ_0, LSQ_1, LSQ_2, and LSQ_3, each connected to its corresponding storage load queue and to the storage module 25.
  • the merge request cache sub-module 243 is used to:
  • if no address in the same cache line as the target address has been cached, the target address is cached and the read request is sent to the storage module, where the read request is used to request the storage module to return the multiple data of the target cache line where the target address is located;
  • when the storage module returns the multiple data of the target cache line, one or more data are backfilled to the storage load queue submodule, where the one or more data are, among the data requested by read requests sent to the merge request cache submodule, those whose addresses are in the target cache line.
  • the corresponding merge request buffer MRB can buffer multiple discrete data addresses (for example, 8 addresses).
  • Those skilled in the art can set the number of addresses that can be cached in each merge request cache MRB according to the actual situation, which is not limited in the present disclosure.
  • the second storage address of the discrete data is sent to the storage load queue LSQ, and the storage load queue LSQ sends a read request that includes the target address to be read; the storage module then returns, according to the read request, the multiple data of the target cache line (Cache Line) where the target address is located.
  • the storage load queue LSQ may first send the read request to the merge request buffer MRB.
  • when the merge request cache MRB receives a read request, it determines whether it has already cached an address that lies in the same cache line as the target address of the read request. If it has, then when the storage module returns the data at that cached address it will return the whole cache line, so the data for the target address can be merged into that return and the read request need not be sent again. In this case, the merge request cache MRB may not forward the read request to the storage module, thereby reducing the number of requests.
  • the merge request cache MRB may cache the target address and send the read request to the storage module, so that the storage module returns multiple data of the target cache line where the target address is located according to the read request.
  • if the merge request cache MRB is not full, the target address can be cached directly; if the merge request cache MRB is full, the address with the earliest cache time in the MRB can be deleted, and the target address can then be cached.
  • the merge request cache MRB may backfill one or more data to the storage load queue LSQ, thereby completing the data reading process.
  • the one or more data are those data, among the data for which read requests have been sent to the MRB, whose addresses are in the target cache line. That is, at least the datum at the target address is read, and other data in the same target cache line may be read at the same time.
  • the merge request cache sub-module is further configured to delete the target address associated with the read request. That is, after the data reading process of the target address of the read request is completed, the target address may be deleted, so as to release the cache space of the merge request cache MRB.
  • each read request can be combined to read one or more discrete data.
  • in some cases each read request may read two or more data, which significantly improves data reading efficiency.
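The merge behavior just described can be modeled with a small sketch: a request whose target address falls in the same cache line as an already-pending address is merged rather than forwarded, and a full buffer evicts its oldest address. The 64-byte line size, the 8-entry capacity (the text's example value), and all names are assumptions for illustration.

```python
from collections import OrderedDict

LINE_SIZE = 64   # assumed cache-line size in bytes
CAPACITY = 8     # addresses the MRB can cache (example value from the text)


class MergeRequestBuffer:
    def __init__(self):
        self.pending = OrderedDict()   # cached target addresses, oldest first
        self.requests_sent = 0         # line requests forwarded to storage

    def read(self, addr):
        line = addr // LINE_SIZE
        # An in-flight request for the same cache line will return the
        # whole line, so this datum merges into that response.
        if any(a // LINE_SIZE == line for a in self.pending):
            return "merged"
        if len(self.pending) >= CAPACITY:
            self.pending.popitem(last=False)   # evict the oldest address
        self.pending[addr] = None
        self.requests_sent += 1                # forward one line request
        return "sent"


mrb = MergeRequestBuffer()
mrb.read(0)    # new line 0 -> "sent"
mrb.read(8)    # same line  -> "merged", no second request to storage
mrb.read(64)   # new line 1 -> "sent"
```

Three datum reads thus cost only two line requests; with denser clustering of discrete addresses the savings grow accordingly.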
  • the processing instruction includes a data transfer instruction.
  • the decoded processing instruction is a data transfer instruction
  • the source data of the multiple data is discrete data
  • the destination data of the multiple data is continuous data
  • the source data base address is the base address of discrete data
  • the destination data base address is the base address of continuous data
  • the data read/write module is used for:
  • the continuous data buffer module is used for:
  • when the continuous data in the buffer space reaches a first preset quantity, the continuous data in the buffer space is sent to the first storage address of the external memory.
  • a data handling instruction (Gather Load) is used to aggregate the contents of a group of discrete addresses into a continuous data vector.
  • the processing instruction decoded by the decoding module 21 is a data transfer instruction
  • the source data is discrete data
  • the destination data is continuous data.
  • the decoding module 21 can store the number of discrete data, and send the base address and the data offset addresses of the discrete data to the discrete address determination module 22, so that the discrete address determination module 22 calculates the second storage address of each discrete data and sends the addresses to the data reading and writing module 24 in order; on the other hand, the decoding module 21 can determine the first storage address of the continuous data according to the base address and the data size of the continuous data, and send the first storage address to the continuous data buffer module 23. The specific processing process will not be repeated.
  • the data reading and writing module 24 can allocate a buffer space ID (Buffer ID) to each discrete data point in the discrete address queue SAQ according to the allocatable buffer pointers of the continuous data buffer module 23, thereby establishing the correspondence between the second storage address of each discrete data and its Buffer ID. The discrete address queue sub-module 241 then sends the second storage address of each discrete data to each storage load queue in order; each storage load queue reads each discrete data from the storage module 25 according to the second storage address and sends the read discrete data to the buffer space of the continuous data buffer module 23.
  • the continuous data buffer module 23 may sequentially buffer each discrete data in the buffer space according to the Buffer ID of each discrete data to form continuous data (which may be referred to as vector data).
  • when the continuous data in the buffer space reaches the first preset quantity, the continuous data in the buffer space is sent to the first storage address of the external memory through BUS PORT SHARE.
  • the first preset number may be equal to the number of continuous data that the buffer space can hold, that is, the continuous data is sent to the first storage address of the external memory when the buffer space is full; the first preset number may also be smaller than the number of continuous data that the buffer space can hold, and the present disclosure does not limit this.
  • vector data of a preset length can be obtained, and the execution of the data handling instruction is completed. Furthermore, the vector data can be further processed through data operation instructions, for example, arithmetic operations between two or more vector data, such as the difference operation between two vector data.
  • the discrete data can be transported to the continuous address space through data transfer instructions to aggregate into vector data and then perform vector operations. In this way, the operation of discrete data points is converted into vector operation, which simplifies the processing process and reduces the data overhead.
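The Gather Load semantics described above can be summarized in a minimal functional sketch. This models memory as a flat Python list; the function name and signature are illustrative, not the patent's interface:

```python
def gather_load(memory, base, offsets):
    """Read one element at base+offset for each offset; aggregate them into one contiguous vector."""
    return [memory[base + off] for off in offsets]

mem = list(range(100))               # toy memory: mem[i] == i
vec = gather_load(mem, 10, [0, 5, 17, 42])
# vec == [10, 15, 27, 52]: four discrete data points gathered into one vector,
# which can now be processed by ordinary vector operations.
```

Once gathered, operations such as the elementwise difference of two such vectors become straightforward vector arithmetic.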
  • the processing instruction includes a discrete storage instruction.
  • the decoded processing instruction is a discrete storage instruction
  • the source data of the multiple data is continuous data
  • the destination data of the multiple data is discrete data
  • the source data base address is the base address of continuous data
  • the destination data base address is the base address of discrete data
  • the continuous data buffer module is used for:
  • the data reading and writing module is used for:
  • the received continuous data is written into the storage module.
  • a discrete store instruction (Scatter Store) is used to store a continuous data vector in a set of discrete addresses.
  • the processing instruction decoded by the decoding module 21 is a discrete storage instruction
  • the source data is continuous data
  • the destination data is discrete data.
  • the decoding module 21 can store the number of discrete data, and send the base address and the data offset addresses of the discrete data to the discrete address determination module 22, so that the discrete address determination module 22 calculates the second storage address of each discrete data and sends the addresses to the data reading and writing module 24 in order; on the other hand, the decoding module 21 can determine the first storage address of the continuous data according to the base address and the data size of the continuous data, and send the first storage address to the continuous data buffer module 23. The specific processing process will not be repeated.
  • the continuous data buffer module 23 can establish a buffer space for the continuous data; according to the first storage address of the continuous data, send a data read request to an external memory (such as DRAM or CT-RAM) through BUS PORT SHARE; and sequentially backfill the continuous data returned by the external memory into the buffer space.
  • when the continuous data in the buffer space reaches the second preset quantity, the continuous data in the buffer space is sent to the data reading and writing module 24.
  • the second preset number may be equal to the number of continuous data that the buffer space can hold, that is, continuous data is sent when the buffer space is full; the second preset number may also be less than the number of continuous data that can be buffered in the buffer space, and the present disclosure does not limit this.
  • the data reading and writing module 24 can allocate a buffer space ID (Buffer ID) to each discrete data point in the discrete address queue SAQ according to the allocatable buffer pointers of the continuous data buffer module 23, thereby establishing the correspondence between the second storage address of each discrete data and its Buffer ID (also referred to as an index). The discrete address queue sub-module 241 then sends the second storage address and Buffer ID of each discrete data to each storage load queue LSQ in sequence.
  • when the storage load queue LSQ is ready to write data to the storage module, the data at the Buffer ID (index) of the discrete data to be written should already have been sent (backfilled) to the storage load queue LSQ.
  • the storage load queue LSQ writes the discrete data to the corresponding second storage address in the storage module. In this way, discrete data is continuously written in sequence, and the process of writing all data can be completed.
  • the storage module 25 can also write discrete data into an external memory (such as DRAM or CT-RAM) through BUS PORT SHARE.
  • the decoding module 21 may read the state information of the storage load queue LSQ, the continuous data buffer module CDB, etc., so as to determine the execution state of the current instruction, and determine whether the execution of the current instruction has ended. After the execution of the current discrete storage instruction ends, the decoding module 21 may also send a clean VAC operation to the storage module (VAC) 25 to clear the data in the storage module and start executing a new instruction.
  • in this way, the individual data of vector data can be discretely stored in a discrete address space through discrete storage instructions to obtain multiple discrete data, so that in application scenarios such as image recognition that operate on large numbers of discrete paired data points, the computed vector can be scattered and stored as discrete data points to obtain discrete processing results, thereby simplifying the processing and reducing data overhead.
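The Scatter Store semantics, the inverse of the Gather Load sketched earlier, can be modeled the same way. The flat-list memory and the function name are illustrative assumptions:

```python
def scatter_store(memory, base, offsets, vector):
    """Write vector[i] to memory[base + offsets[i]]: a contiguous vector scattered to discrete addresses."""
    for off, value in zip(offsets, vector):
        memory[base + off] = value

mem = [0] * 100
scatter_store(mem, 10, [0, 5, 17, 42], [7, 8, 9, 11])
# mem[10], mem[15], mem[27], mem[52] now hold 7, 8, 9, 11;
# the rest of the toy memory is untouched.
```

A computed result vector can thus be written back to the same discrete locations its operands were gathered from.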
  • the processing instruction includes a data transfer instruction.
  • the decoded processing instruction is a data transfer instruction
  • the source data of the multiple data is discrete data
  • the destination data of the multiple data is continuous data
  • the source data base address is the base address of discrete data
  • the destination data base address is the base address of continuous data.
  • the decoding module is also used for:
  • the decoded processing instruction is a data transport instruction, determining the transport mode of the data transport instruction;
  • the transfer mode of the data transfer instruction is the multi-vector transfer mode
  • the discrete address determination module is also used for:
  • the data reading and writing module is also used for:
  • the continuous data buffer module is also used for:
  • the transfer mode for which the data transfer instruction can be preset includes a single vector transfer mode (for example, expressed as Mode0), a multi-vector transfer mode (for example, expressed as Mode1), and the like.
  • in the single-vector transfer mode, the data transfer instruction can aggregate multiple discrete data points into one vector data; in the multi-vector transfer mode, the data transfer instruction can aggregate multiple discrete data points into two or more vector data.
  • the transport mode can be determined according to the field used to indicate the transport mode in the operation field of the data transport instruction, such as the field Mode0 or Mode1, which is not limited in the present disclosure.
  • the decoding module 21 may determine the transfer mode of the data transfer instruction (for example, expressed as Gather Load Offset Mode). If the transport mode of the data transport instruction is the multi-vector transport mode, the decoding module 21 can determine the source data address and the destination data address of each data point to be transported.
  • the source data address represents the current storage address of the multiple data points in the data storage space, and is a set of discrete data addresses; the destination data address represents the address in the data storage space to which the multiple data points will be transported, and is a continuous data address.
  • the data storage space where the source data address is located and the data storage space where the destination data address is located may be the same or different, which is not limited in the present disclosure.
  • the multi-vector transfer mode at least two vectors can be obtained.
  • the first vector can be set as the first vector data, and the other vectors are the second vector data (including at least one vector data).
  • the operation field of the data transfer instruction may include the source data base address (Source Data Base Address), destination data base address (Destination Data Base Address), and data offset address of the multiple first data of the first vector data; it also includes the corresponding The offset step size (Offset Stride) and the destination base address step size (Destination Base Address Stride) of the multiple second data in the second vector data.
  • the decoding module 21 can store the number of discrete data, and send the source data base address and data offset address of the multiple first data, and the offset step length of the multiple second data to the discrete address determining module 22, so that The discrete address determining module 22 calculates the fifth storage address of each first data in the storage module 25 and the sixth storage address of each second data in the storage module 25 (that is, the discrete source address).
  • the offset loading submodule 221 of the discrete address determining module 22 can, based on the source data base address and data offset address (including offset value base address and offset value width) of each first data, read the offset value (Offset) of each first data from external memory (such as RAM) through BUS PORT SHARE, buffer the read offset values, and send the base address and offset value of each first data in the storage module 25 to the discrete address generation sub-module 222 in order.
  • the discrete address generation sub-module 222 can calculate the fifth storage address of each discrete data in sequence according to the base address and offset value of each first data, and send them to the data read/write module 24 in sequence.
  • the fifth storage address can be expressed as: Single Point Src Addr[2n] = Source Data Base Address + Offset Address[n], where Single Point Src Addr[2n] represents the fifth storage address of the n-th first data, Source Data Base Address represents the base address of the first data, and Offset Address[n] represents the offset value of the n-th first data. For example, when the base address is Addr3[15] and the offset value is [24,27], it can be determined that the fifth storage address of the n-th first data is Addr3[39,42].
  • the discrete address generation sub-module 222 can directly obtain the sixth storage address of the corresponding second data on the basis of the fifth storage address of the first data according to the offset stride.
  • the sixth storage address can be expressed as: Single Point Src Addr[2n+1] = Single Point Src Addr[2n] + Offset Stride, where Single Point Src Addr[2n+1] represents the sixth storage address of the second data corresponding to the n-th first data. For example, if the fifth storage address of the n-th first data is Addr3[39,42] and the offset stride is 8 bits, it can be determined that the sixth storage address of the n-th second data is Addr3[47,50].
  • the offset stride can have multiple values, for example, 4 bits, 8 bits, 12 bits, etc. In this way, the sixth storage address of each group of second data can be determined according to a different offset stride.
  • the offset step size can be set according to the actual situation, which is not limited in the present disclosure.
  • the storage address of the corresponding second data can be directly determined according to the storage address and offset stride of the first data, so that two or more data points can be obtained through one read, allowing the instruction to read more data with fewer addresses (for example, read 100 data points through the offset value base addresses of 50 data points), thereby significantly reducing data overhead.
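The paired source-address arithmetic above can be sketched directly. This is an illustrative model, not the hardware: addresses are plain integers, and the numbers mirror the Addr3 example in the text (base 15, offset 24, stride 8 giving the pair 39 and 47):

```python
def paired_src_addrs(src_base, offsets, offset_stride):
    """For each offset, compute the fifth storage address (first data) and the
    sixth storage address (paired second data) one offset stride further on."""
    pairs = []
    for off in offsets:
        fifth = src_base + off             # address of the n-th first data
        sixth = fifth + offset_stride      # address of its paired second data
        pairs.append((fifth, sixth))
    return pairs

pairs = paired_src_addrs(15, [24], 8)
# pairs == [(39, 47)], matching Addr3[39,42] and Addr3[47,50] in the example
```

One stored offset per pair thus yields two read addresses, which is the source of the "read 100 data points through 50 offsets" saving.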
  • the discrete address queue sub-module 241 of the data reading and writing module 24 can receive and buffer the fifth storage address and the sixth storage address, and establish the correspondence between each discrete data and the cache address in the buffer space of the continuous data buffer module 23, so that the read discrete data can be placed into the cache space accordingly.
  • each discrete data point in the discrete address queue SAQ is allocated a buffer space ID (Buffer ID), thereby establishing the correspondence between the second storage address of each discrete data and its Buffer ID.
  • the discrete address queue sub-module 241 sends the fifth storage address and the sixth storage address to each storage load queue in sequence; each storage load queue reads the discrete first data and second data from the storage module, and sends the read discrete data to the corresponding cache addresses in the cache space in order.
  • the storage load queue can read the first data first and then the second data, or can read the first data and the second data at the same time, which is not limited in the present disclosure.
  • the decoding module 21 can determine, according to the destination data base address and data size of the first data and the destination base address stride of the multiple second data, the third storage address of each first data in the external memory and the fourth storage address of each second data in the external memory (that is, the continuous destination addresses); and send the third storage address and the fourth storage address to the continuous data cache module 23.
  • the operation field of the data handling instruction may include the destination data base address (Destination Data Base Address), the size of a single data (Single Point Data Size), the destination base address stride (Destination Base Address Stride), etc. Since the destination data addresses are continuous, the destination data address of each first data (referred to as the third storage address) can be determined directly according to the data size of the first data and the serial number of each first data. In the case of two vectors obtained by transport, the third storage address can be expressed as: Single Point Dest Addr[n] = Destination Data Base Address + (n-1) × Single Point Data Size, where Single Point Dest Addr[n] represents the third storage address of the n-th first data.
  • for example, when the destination data base address is Addr4[0,3], the size of a single data is 4 bits, and n is 3, it can be determined that the third storage address of the third first data is Addr4[8,11].
  • the fourth storage address of the corresponding second data can be obtained directly on the basis of the third storage address of the first data according to the destination base address step.
  • the fourth storage address can be expressed as: Single Point Dest Addr[2n+1] = Single Point Dest Addr[n] + Destination Base Address Stride, where Single Point Dest Addr[2n+1] represents the fourth storage address of the second data corresponding to the n-th first data.
  • for example, if the third storage address of the n-th first data is determined to be Addr4[8,11] according to the destination data base address and data size, and the destination base address stride is 48 bits, it can be determined that the fourth storage address of the n-th second data is Addr4[56,59].
  • the destination base address stride can have multiple values, for example, 48 bits, 96 bits, etc., so as to store multiple vector data separately. In this way, the fourth storage address of each group of second data can be determined according to a different destination base address stride.
  • the target base address step size can be set according to actual conditions, and the present disclosure does not limit this.
  • the destination data address of the corresponding second data can be directly determined according to the destination data address of the first data and the destination base address stride, so as to store each data of two or more vector data and significantly reduce the data overhead.
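The destination-address computation is symmetric to the source side and can be sketched the same way. The numbers mirror the Addr4 example (base 0, single-data size 4, n = 3, stride 48); the 1-indexed n and the function name are assumptions for illustration:

```python
def dest_addrs(dest_base, data_size, n, dest_stride):
    """Compute the third storage address (n-th first data, packed contiguously from
    the destination base) and the fourth storage address (its paired second data,
    one destination base address stride further on)."""
    third = dest_base + (n - 1) * data_size   # elements of the first vector are back to back
    fourth = third + dest_stride              # paired element of the second vector
    return third, fourth

t, f = dest_addrs(0, 4, 3, 48)
# (t, f) == (8, 56), matching Addr4[8,11] and Addr4[56,59] in the example
```

With a large enough stride, the first and second vectors land in two disjoint contiguous regions, one per output vector.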
  • the continuous data buffer module 23 may separately establish buffer spaces for the plurality of first data and the plurality of second data; after receiving the first data and the second data from the data reading and writing module, each first data and each second data can be sequentially buffered in the buffer space according to the Buffer ID of each first data and each second data to form continuous vector data.
  • when the first data and the second data in the cache space reach a third preset quantity, the continuous data in the cache space are sent to the third storage address and the fourth storage address of the external memory.
  • the third preset number may be equal to the number of continuous data that can be cached in the buffer space, that is, when the buffer space is full, the continuous data is sent to the third storage address and the fourth storage address of the external memory; the third preset number may also be less than the number of continuous data that can be cached in the cache space, which is not limited in the present disclosure.
  • each first data and each second data are transported in sequence, obtaining N continuous first data at the third storage address and N continuous second data at the fourth storage address. This realizes the process of aggregating the discrete first data and second data into the first vector data and the second vector data, so as to realize data transfer and provide a data basis for subsequent processing.
  • the multiple vector data can be further processed through data operation instructions, such as arithmetic operations between two or more vector data, for example performing difference calculations on two vector data.
  • in this way, discrete paired or grouped data can be transferred to multiple continuous address spaces through data transfer instructions in the multi-vector transfer mode and aggregated into multiple vector data; in application scenarios such as image recognition, the multiple vector data involved in an operation can be obtained directly through one instruction, and the operation on discrete data points is converted into a vector operation, thereby simplifying the processing and reducing data overhead.
  • the processing instruction includes a vector expansion instruction.
  • the decoded processing instruction is a vector expansion instruction
  • the source data of the multiple data is continuous data
  • the destination data of the multiple data is continuous data
  • the source data base address is the base address of continuous data
  • the destination data base address is the base address of continuous data.
  • the decoding module is also used for:
  • the decoded processing instruction is a vector expansion instruction
  • the continuous data buffer module is also used for:
  • according to the seventh storage address, read the plurality of third data from an external memory and cache them in the cache space;
  • the decoding module is also used for:
  • the destination data base address and data size of the plurality of fourth data determine the eighth storage address of the plurality of fourth data, and send the plurality of fourth data and the eighth storage address to all The continuous data buffer module;
  • the continuous data buffer module is also used for:
  • the vector extension instruction (Vector Extension) is used to implement the extension and storage of a continuous data vector according to the extension parameters.
  • the processing instruction decoded by the decoding module 21 is a vector expansion instruction
  • the source data (which may be called the third data) and the destination data (which may be called the fourth data) are both continuous data.
  • the source data base address of the plurality of third data, the destination data base address of the plurality of fourth data, the data size, and the expansion parameter in the operation domain of the vector expansion instruction can be determined.
  • the decoding module 21 may determine the seventh storage address of the plurality of third data in the external memory according to the source data base address and the data size of the plurality of third data, and send the seventh storage address to the continuous data buffer module.
  • the continuous data buffer module 23 may establish buffer spaces for the plurality of third data and the plurality of fourth data respectively.
  • the continuous data cache module 23 can read the plurality of third data from the external memory and cache them in the cache space according to the seventh storage address; when the third data in the cache space reaches the fourth preset amount, the cached plurality of third data are sent to the decoding module.
  • the fourth preset number may be equal to the number of continuous data that the buffer space can hold, that is, the continuous data is sent to the decoding module when the buffer space is full; the fourth preset number may also be less than the number of continuous data that can be buffered in the buffer space, which is not limited in the present disclosure.
  • the decoding module 21 may expand the plurality of third data according to the plurality of third data from the continuous data buffer module 23 and the expansion parameter to obtain a plurality of fourth data.
  • the third data is M
  • the extended parameter includes M extended parameter bits corresponding to the M third data
  • M is an integer greater than 1
  • the expansion of the plurality of third data according to the plurality of third data from the continuous data buffer module and the expansion parameter to obtain a plurality of fourth data includes:
  • the plurality of fourth data is obtained.
  • the expansion parameter may include M expansion parameter bits, which respectively represent the number of times k_m each of the M third data is copied.
  • for example, when M = 5, the expansion parameter may be expressed as [1,2,0,3,1], indicating that the 5 third data are copied once, twice, 0 times, 3 times, and once, respectively.
  • for the m-th third data (1 ≤ m ≤ M), if the m-th expansion parameter bit corresponding to it is k_m (k_m ≥ 0), then the plurality of fourth data contain k_m copies of the m-th third data at the corresponding position.
  • for example, if the M third data are [A,B,C,D,E] and the expansion parameter is [1,2,0,3,1], the obtained plurality of fourth data are [A,B,B,D,D,D,E].
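The expansion rule above reduces to a few lines of code. This is an illustrative sketch of the described semantics, not the patent's implementation; k_m = 0 simply drops an element:

```python
def vector_extend(data, ks):
    """Expand a vector: the m-th output group is k_m copies of the m-th input element."""
    out = []
    for value, k in zip(data, ks):
        out.extend([value] * k)  # k == 0 contributes nothing
    return out

ext = vector_extend(["A", "B", "C", "D", "E"], [1, 2, 0, 3, 1])
# ext == ['A', 'B', 'B', 'D', 'D', 'D', 'E'], matching the [A,B,C,D,E] example
```

Note the output length is the sum of the k_m values, so it generally differs from the input length, as the text observes.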
  • the quantity of the fourth data may be different from the quantity of the third data.
  • the extended parameter may also include other extended content (for example, the value of each data point is enlarged or reduced by a certain multiple), and the extended parameter may also include other expressions, which can be set by those skilled in the art according to the actual situation. This is not limited.
  • the decoding module 21 may determine the eighth storage address of the plurality of fourth data according to the destination data base address and data size of the plurality of fourth data, and combine the plurality of fourth data The fourth data and the eighth storage address are sent to the continuous data buffer module.
  • the continuous data cache module 23 may cache the plurality of fourth data in the cache space; when the fourth data in the cache space reaches a fifth preset amount, the cached plurality of fourth data are sent to the eighth storage address of the external memory.
  • the fifth preset number may be equal to the number of continuous data that the cache space can hold, that is, the continuous data is sent to the external memory when the cache space is full; the fifth preset number may also be less than the number of continuous data that can be cached in the cache space, which is not limited in the present disclosure.
  • the vector can be expanded through the vector expansion instruction, so that when vector data needs to be expanded in application scenarios such as image recognition, the original vector can be expanded into a new vector and stored in a continuous address space, thereby simplifying the processing and reducing data overhead.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be realized in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory RRAM (Resistive Random Access Memory), dynamic random access memory DRAM (Dynamic Random Access Memory), static random access memory SRAM (Static Random-Access Memory), enhanced dynamic random access memory EDRAM (Enhanced Dynamic Random Access Memory), high-bandwidth memory HBM (High-Bandwidth Memory), hybrid memory cube HMC (Hybrid Memory Cube), etc.
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: USB flash drive, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk, optical disk, and other media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned data processing device.
  • an electronic device is also disclosed, and the electronic device includes the aforementioned artificial intelligence chip.
  • a board card is also disclosed, which includes a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 4 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • the board card may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390, an interface device 391, and a control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
  • the storage device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 chips.
  • the artificial intelligence chip may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC verification. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be another interface. The present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can realize the adapting function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
  • the control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.
  • a data processing device comprising a decoding module, a discrete address determination module, a continuous data buffer module, a data reading and writing module, and a storage module,
  • the decoding module is used to decode the received processing instruction to obtain the decoded processing instruction; determine multiple data corresponding to the processing instruction, and the source data base address and destination data base address of the multiple data , The data offset address of the discrete data and the data size of the continuous data, the source data of the multiple data includes discrete data or continuous data; the decoding module is also used to according to the base address of the continuous data and the data size of the continuous data, Determine the first storage address of continuous data;
  • the discrete address determining module is connected to the decoding module and the data reading and writing module, and is configured to determine the second storage address of the discrete data according to the base address of the discrete data and the data offset address of the discrete data; Sending the second storage address to the data reading and writing module;
  • the continuous data buffer module is connected to the decoding module and the data read/write module, and is used to establish a buffer space for continuous data; buffer the continuous data of the first storage address in the buffer space and send it to the The data reading and writing module, or the continuous data received from the data reading and writing module is buffered in the buffer space and sent to the first storage address;
  • the data reading and writing module is connected to the storage module, and is configured to read discrete data from the storage module according to the second storage address of the discrete data, and send the read discrete data to the continuous data buffer module; Or receive the continuous data of the continuous data buffer module, and write the received continuous data into the storage module according to the storage address of the discrete data,
  • the data read-write module includes a merge request cache submodule, which is used to cache the storage addresses corresponding to multiple read requests while the data read-write module reads discrete data, so that each read request can read one or more discrete data in a merged manner.
  • A2 The device according to A1, wherein the data offset address includes an offset value base address and an offset value width, and the discrete address determination module includes:
  • the offset loading sub-module is used to determine the offset value storage address of each discrete data according to the offset value base address and offset value width of the discrete data, and read each discrete data from the offset value storage address of each discrete data The offset value of the data;
  • the discrete address generation sub-module is used to determine the second storage address of each discrete data according to the base address of the discrete data and the offset value of each discrete data, and send the second storage address to the data reading and writing module .
  • A3 The device according to A1 or A2, wherein the data reading and writing module includes:
  • Discrete address queue sub-module for receiving and storing the second storage address of discrete data
  • the storage and loading queue sub-module is used to read discrete data from the storage module according to the second storage address of the discrete data, and send the read discrete data to the continuous data buffer module; or receive the continuous data buffer For the continuous data of the module, the received continuous data is written into the storage module according to the storage address of the discrete data.
  • A4 The device according to A3, wherein the merge request cache submodule is connected to the storage load queue submodule and the storage module, and the merge request cache submodule is configured to:
  • when no address in the same cache line as the target address has been cached, the target address is cached, and the read request is sent to the storage module, where the read request is used to request the storage module to return the multiple data of the target cache line in which the target address is located;
  • when the storage module returns the multiple data of the target cache line, backfill one or more data to the storage load queue submodule, where the one or more data are those data, among the data for which read requests have been sent to the merge request cache submodule, whose addresses are in the target cache line.
  • the merge request cache submodule is further configured to:
  • A6 The device according to any one of A1-A5, wherein the processing instruction includes a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the multiple data is discrete data, the The destination data of the multiple data is continuous data, the source data base address is the base address of discrete data, and the destination data base address is the base address of continuous data,
  • the data reading and writing module is used for:
  • the continuous data buffer module is used for:
  • when the continuous data in the buffer space reaches a first preset quantity, the continuous data in the buffer space is sent to the first storage address of the external memory.
  • the processing instruction includes a discrete storage instruction
  • the decoded processing instruction is a discrete storage instruction
  • the source data of the multiple data is continuous data
  • the The destination data of the multiple data is discrete data
  • the source data base address is the base address of continuous data
  • the destination data base address is the base address of discrete data
  • the continuous data buffer module is used for:
  • the data reading and writing module is used for:
  • the received continuous data is written into the storage module.
  • the processing instruction includes a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the multiple data is discrete data, and the multiple The destination data of each data is continuous data, the source data base address is the base address of discrete data, and the destination data base address is the base address of continuous data,
  • the decoding module is also used for:
  • when the decoded processing instruction is a data transfer instruction, determining the transfer mode of the data transfer instruction;
  • the transfer mode of the data transfer instruction is the multi-vector transfer mode
  • the discrete address determination module is also used for:
  • the data reading and writing module is also used for:
  • the continuous data buffer module is also used for:
  • the processing instruction includes a vector expansion instruction
  • the decoded processing instruction is a vector expansion instruction
  • the source data of the multiple data is continuous data
  • the The destination data of the multiple data is continuous data
  • the source data base address is the base address of continuous data
  • the destination data base address is the base address of continuous data
  • the decoding module is also used for:
  • the decoded processing instruction is a vector expansion instruction
  • the continuous data buffer module is also used for:
  • according to the seventh storage address, read the plurality of third data from an external memory and cache them in the cache space;
  • the decoding module is also used for:
  • according to the destination data base address and data size of the plurality of fourth data, determine the eighth storage address of the plurality of fourth data, and send the plurality of fourth data and the eighth storage address to the continuous data buffer module;
  • the continuous data buffer module is also used for:
  • the third data is M
  • the extended parameter includes M extended parameter bits corresponding to the M third data
  • M is an integer greater than 1
  • expanding the plurality of third data according to the plurality of third data from the continuous data buffer module and the expansion parameter to obtain a plurality of fourth data includes:
  • the plurality of fourth data is obtained.
  • An artificial intelligence chip comprising the data processing device according to any one of A1-A10.
  • A12 An electronic device including the artificial intelligence chip as described in A11.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in A11;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.

Abstract

A data processing device and related products. The product includes a control module, and the control module includes an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is used to store computation instructions associated with artificial neural network operations; the instruction processing unit is used to parse the computation instructions to obtain multiple operation instructions; and the storage queue unit is used to store an instruction queue, which includes multiple operation instructions or computation instructions to be executed in the order of the queue. Through the above device, the operation efficiency of related products when performing neural network model operations can be improved.

Description

Data processing device and related products
This application claims priority to the Chinese patent application No. 202010382526.6, entitled "Data processing device and related products", filed with the China Patent Office on May 8, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a data processing device and related products.
Background
With the development of artificial intelligence technology, good results have been achieved in fields such as image recognition. In the image recognition process, a large number of discrete data points may need to be processed (for example, by difference operations). However, in the related art, the processing of discrete data points is relatively complex and the data overhead is large.
Summary
Based on this, in view of the above technical problem, it is necessary to provide a data processing device and related products.
According to one aspect of the present disclosure, a data processing device is provided, the device including: a decoding module, a discrete address determination module, a continuous data buffer module, a data read-write module, and a storage module,
the decoding module is used to decode a received processing instruction to obtain a decoded processing instruction; determine a plurality of data corresponding to the processing instruction, as well as the source data base address and destination data base address of the plurality of data, the data offset address of discrete data, and the data size of continuous data, where the source data of the plurality of data includes discrete data or continuous data; the decoding module is further used to determine the first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data;
the discrete address determination module is connected to the decoding module and the data read-write module, and is used to determine the second storage address of the discrete data according to the base address of the discrete data and the data offset address of the discrete data, and to send the second storage address to the data read-write module;
the continuous data buffer module is connected to the decoding module and the data read-write module, and is used to establish a buffer space for continuous data, to buffer the continuous data of the first storage address in the buffer space and send it to the data read-write module, or to buffer the continuous data received from the data read-write module in the buffer space and send it to the first storage address;
the data read-write module is connected to the storage module, and is used to read discrete data from the storage module according to the second storage address of the discrete data and send the read discrete data to the continuous data buffer module, or to receive the continuous data of the continuous data buffer module and write the received continuous data into the storage module according to the storage address of the discrete data,
wherein the data read-write module includes a merge request cache submodule, which is used to cache the storage addresses corresponding to multiple read requests while the data read-write module reads discrete data, so that each read request reads one or more discrete data in a merged manner.
According to another aspect of the present disclosure, an artificial intelligence chip is provided, the chip including the data processing device described above.
According to another aspect of the present disclosure, an electronic device is provided, the electronic device including the artificial intelligence chip described above.
According to another aspect of the present disclosure, a board card is provided, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip described above; wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
According to embodiments of the present disclosure, a processing instruction can be decoded and executed to transfer discrete data to continuous data addresses, or to store continuous data at multiple discrete data addresses, so as to facilitate vector operations on discrete data and the restoration of the vector data after the vector operations, thereby simplifying the processing and reducing data overhead. In addition, according to embodiments of the present disclosure, while discrete data are read, the storage addresses corresponding to read requests can be cached so that each data read request reads one or more discrete data in a merged manner, thereby improving data reading efficiency.
Beneficial effects corresponding to the technical problems in the background can be derived from the technical features in the claims. Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.
Fig. 1 shows a schematic diagram of a processor of a data processing device according to an embodiment of the present disclosure;
Fig. 2 shows a block diagram of a data processing device according to an embodiment of the present disclosure;
Fig. 3 shows a block diagram of a data processing device according to an embodiment of the present disclosure;
Fig. 4 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.
It should be understood that the terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or sets thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the term "and/or" used in the specification and claims of the present disclosure refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
根据本公开实施例的数据处理装置可应用于处理器中,该处理器可以是通用处理器,例如CPU(Central Processing Unit,中央处理器),也可以是用于执行人工智能运算的人工智能处理器(IPU)。人工智能运算可包括机器学习运算,类脑运算等。其中,机器学习运算包括神经网络运算、k-means运算、支持向量机运算等。该人工智能处理器可例如包括GPU(Graphics Processing Unit,图形处理单元)、NPU(Neural-Network Processing Unit,神经网络处理单元)、DSP(Digital Signal Process,数字信号处理单元)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)芯片中的一种或组合。本公开对处理器的具体类型不作限制。
在一种可能的实现方式中,本公开中所提及的处理器可包括多个处理单元,每个处理单元可以独立运行所分配到的各种任务,如:卷积运算任务、池化任务或全连接任务等。本公开对处理单元及处理单元所运行的任务不作限制。
图1示出根据本公开实施例的数据处理装置的处理器的示意图。如图1所示,处理器100包括多个处理单元101以及存储单元102,多个处理单元101用于执行指令序列,存储单元102用于存储数据,可包括随机存储器(RAM,Random Access Memory)和寄存器堆。处理器100中的多个处理单元101既可共用部分存储空间,例如共用部分RAM存储空间和寄存器堆,又可同时拥有各自的存储空间。
图2示出根据本公开一实施例的数据处理装置的框图。如图2所示,该装置包括:解码模块(Decoder,简称DEC)21、离散地址确定模块22、连续数据缓存模块(Continuous Data Buffer,简称CDB)23、数据读写模块24及存储模块25:
所述解码模块21,用于对接收到的处理指令进行解码,得到解码后的处理指令;确定所述处理指令对应的多个数据,以及所述多个数据的源数据基地址、目的数据基地址、离散数据的数据偏移地址及连续数据的数据尺寸,所述多个数据的源数据包括离散数据或连续数据;所述解码模块还用于根据连续数据的基地址及连续数据的数据尺寸,确定连续数据的第一存储地址;
所述离散地址确定模块22,连接到所述解码模块21和所述数据读写模块24,用于根据离散数据的基地址及离散数据的数据偏移地址,确定离散数据的第二存储地址;将所述第二存储地址发送给所述数据读写模块;
所述连续数据缓存模块23,连接到所述解码模块21和所述数据读写模块24,用于建立连续数据的 缓存空间;将所述第一存储地址的连续数据缓存到所述缓存空间并发送到所述数据读写模块,或者将从所述数据读写模块接收到的连续数据缓存到所述缓存空间并发送到第一存储地址;
所述数据读写模块24,连接到所述存储模块25,用于根据离散数据的第二存储地址从所述存储模块读取离散数据,将读取到的离散数据发送给所述连续数据缓存模块;或者接收所述连续数据缓存模块的连续数据,根据离散数据的存储地址将接收到的连续数据写入所述存储模块,
其中,所述数据读写模块包括合并请求缓存(Merge Request Buffer,简称MRB)子模块,用于在所述数据读写模块读取离散数据期间,缓存多个读取请求对应的存储地址,以使每个读取请求合并读取到一个或多个离散数据。
根据本公开实施例的数据处理装置可实现向量地址访问(Vector Address Access,简称VAA)的功能,能够支持相应的功能指令,例如,数据搬运指令(Gather Load),用于将一组离散地址的内容聚合成连续的数据向量;离散存储指令(Scatter Store),用于将连续的数据向量分散存储到一组离散的地址中;向量扩展指令(Vector Extension),用于根据扩展参数实现一个连续的数据向量的扩展及存储等。本公开对该装置所支持的指令的数量及种类不作限制。
举例来说,解码模块21可从该装置上游的指令分发队列(Issue Queue,简称ISQ)中获取待解码的处理指令,并对该处理指令进行解码(也可称为译码),得到解码后的处理指令。该解码后的处理指令包括操作码及操作域,操作码用于指示该处理指令的处理类型,操作域用于指示待处理的数据及数据参数。
在一种可能的实现方式中,解码模块21可根据操作域确定解码后的处理指令对应的多个数据以及多个数据的数据参数,例如多个数据的源数据基地址、目的数据基地址、离散数据的数据偏移地址及连续数据的数据尺寸(Single Point Data Size)等。其中,多个数据的源数据包括离散数据或连续数据,例如在处理指令为数据搬运指令时,所述多个数据的源数据为离散数据,所述多个数据的目的数据为连续数据;在解码后的处理指令为离散存储指令时,所述多个数据的源数据为连续数据,所述多个数据的目的数据为离散数据。其中,解码模块21可存储待处理的数据的数量(Single Point Data Number)及各个数据的数据参数,并将离散数据的基地址及离散数据的数据偏移地址发送到离散地址确定模块22。
在一种可能的实现方式中,离散地址确定模块22可根据接收到的离散数据的基地址及离散数据的数据偏移地址,确定离散数据的第二存储地址。其中,离散数据的数据偏移地址可包括离散数据的偏移值(Offset)存储在外部存储器(例如RAM)中的偏移值基地址(Offset Vector Base Address)及偏移值宽度(Offset Size)。离散地址确定模块22可根据偏移值基地址及偏移值宽度,通过总线端口共享(BUS PORT SHARE)从外部存储器(例如RAM)读取各个离散数据的偏移值(Offset);进而根据各个离散数据存储在存储模块25中的偏移值基地址以及该偏移值,分别计算出各个离散数据的在存储模块25中的第二存储地址;再将各个离散数据的第二存储地址顺序发送给数据读写模块24。
在一种可能的实现方式中,数据读写模块24可根据离散数据的第二存储地址,从存储模块25读取离散数据,或将离散数据写入存储模块25。例如,在处理指令为数据搬运指令时,数据读写模块24可根据第二存储地址从存储模块25按顺序读取各个离散数据,并将读取到的离散数据发送到连续数据缓存模块23中;在处理指令为离散存储指令时,数据读写模块24可接收连续数据缓存模块23发送的连续数据,并根据离散数据的第二存储地址将接收到的连续数据写入存储模块25中。
在一种可能的实现方式中,数据读写模块可包括合并请求缓存子模块,用于在所述数据读写模块读取离散数据期间,缓存多个读取请求对应的存储地址,以使每个读取请求合并读取到一个或多个离散数据,从而提高数据的读取效率。
在一种可能的实现方式中,存储模块25可以为实现向量地址访问的高速缓存(VAA Cache,简称VAC),本公开对存储模块25的具体类型不作限制。
在一种可能的实现方式中,解码模块21还用于根据连续数据的基地址及连续数据的数据尺寸,确定连续数据的第一存储地址;并将第一存储地址发送到连续数据缓存模块23,该第一存储地址可为连续数据存储在外部存储器(例如RAM)中的地址。
Single Point Continuous Addr[n]=Continuous Data Base Address+(n-1)*Single Point Data Size     (1)
In formula (1), Single Point Continuous Addr[n] denotes the data address of the n-th continuous datum, Continuous Data Base Address denotes the base address of the continuous data, and Single Point Data Size denotes the data size of the continuous data. For example, when the base address is Addr1[0,3], the size of a single datum is 4 bits, and n is 3, the data address of the 3rd continuous datum can be determined to be Addr1[8,11].
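Formula (1) can be illustrated with a short Python sketch (an illustrative model only, not part of the original disclosure; the helper name is hypothetical and addresses are treated as plain integers):

```python
def single_point_continuous_addr(continuous_base, data_size, n):
    """Address of the n-th continuous datum (1-indexed), per formula (1):
    base + (n - 1) * size."""
    return continuous_base + (n - 1) * data_size

# Text example: base address 0, single-datum size 4, n = 3 -> address 8
```

With base address 0 and 4-unit data, the third datum starts at offset 8, matching the Addr1[8,11] example in the text.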
在一种可能的实现方式中,连续数据缓存模块23可建立连续数据的缓存空间;将所述第一存储地址的连续数据缓存到所述缓存空间并发送到所述数据读写模块,或者将从所述数据读写模块接收到的连续数据缓存到所述缓存空间并发送到第一存储地址。
例如,在处理指令为数据搬运指令时,数据读写模块24将读取到的离散数据发送到连续数据缓存模块23中,连续数据缓存模块23在缓存空间中将离散数据缓存为连续数据,并将缓存的连续数据发送到第一存储地址。在处理指令为离散存储指令时,连续数据缓存模块23可通过总线端口共享(BUS PORT SHARE)从外部存储器(例如RAM)的第一存储地址读取连续数据,将读取到的连续数据缓存到缓存空间,并顺序发送到数据读写模块24,以使数据读写模块24将各个连续数据存储到存储模块25的第二存储地址中,得到离散数据。
根据本公开的实施例,能够解码并执行处理指令,将离散数据搬运到连续的数据地址,或将连续数据存储到离散的多个数据地址,以便于实现离散数据的向量运算以及向量运算后的向量数据还原,从而简化处理过程,减小数据开销。并且,根据本公开的实施例能够在读取离散数据时通过缓存读取请求对应的存储地址,以使每个数据读取请求合并读取到一个或多个离散数据,从而提高数据的读取效率。
图3示出根据本公开一实施例的数据处理装置的框图。如图3所示,在一种可能的实现方式中,离散地址确定模块22包括:
偏移加载子模块(Load Offset Buffer,简称LOB)221,用于根据离散数据的偏移值基地址及偏移值宽度,分别确定各个离散数据的偏移值存储地址,并从各个离散数据的偏移值存储地址读取各个离散数据的偏移值;
离散地址生成子模块(Scatter Addr Generate,简称SAG)222,用于根据离散数据的基地址以及各个离散数据的偏移值,分别确定各个离散数据的第二存储地址,并将所述第二存储地址发送给所述数据读写模块。
举例来说,偏移加载子模块221可缓存解码模块21发送的离散数据的基地址及离散数据的数据偏移地址;根据数据偏移地址中的偏移值基地址及偏移值宽度,通过总线端口共享(BUS PORT SHARE)从外部存储器(例如RAM)读取各个离散数据的偏移值(Offset);缓存读取到的偏移值,并将各个离散数据存储在存储模块25中的基地址及偏移值按顺序发送到离散地址生成子模块222。
在一种可能的实现方式中,离散地址生成子模块222可根据各个离散数据的基地址及偏移值,按顺序计算各个离散数据的第二存储地址。
Single Point Scatter Addr[n]=Scatter Data Base Address+Offset Address[n]     (2)
In formula (2), Single Point Scatter Addr[n] denotes the second storage address of the n-th discrete datum, Scatter Data Base Address denotes the base address of the discrete data, and Offset Address[n] denotes the offset value of the n-th discrete datum. For example, when the base address is Addr2[4] and the offset value is [24,27], the second storage address of the n-th discrete datum can be determined to be Addr2[28,31].
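The two-step discrete addressing described above (first locate each offset value from the offset base address and offset width, then apply formula (2)) can be sketched as follows (hypothetical helper names; a simplified integer-address model, not the hardware implementation):

```python
def offset_value_addr(offset_base, offset_width, n):
    """Where the n-th discrete datum's offset value is stored (1-indexed),
    assuming offset values are packed contiguously with a fixed width."""
    return offset_base + (n - 1) * offset_width

def scatter_addr(scatter_base, offset_value):
    """Second storage address of a discrete datum, per formula (2):
    base + offset."""
    return scatter_base + offset_value

# Text example: base address 4, offset value 24 -> second storage address 28
```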
在一种可能的实现方式中,离散地址生成子模块222可将计算得到的离散数据的第二存储地址按顺序发送到数据读写模块24,以便于数据读写模块24读取或写入离散数据。
通过这种方式,可确定出各个离散数据的第二存储地址。
如图3所示,在一种可能的实现方式中,数据读写模块24可包括:
离散地址队列(Scatter Addr Queue,简称SAQ)子模块241,用于接收并存储离散数据的第二存储地址;
存储加载队列(Load Store Queue,简称LSQ)子模块242,用于根据离散数据的第二存储地址,从所述存储模块读取离散数据,将读取到的离散数据发送给所述连续数据缓存模块;或者接收所述连 续数据缓存模块的连续数据,根据离散数据的存储地址将接收到的连续数据写入所述存储模块。
举例来说,离散地址队列子模块241可接收并缓存各个离散数据的第二存储地址,形成离散地址队列,并建立各个离散数据与连续数据缓存模块23的缓存空间中的缓存地址的对应关系,以使读取到的离散数据能够相应地放置到缓存空间中。
在一种可能的实现方式中,存储加载队列子模块242可包括多个存储加载队列,例如图3中为4个存储加载队列LSQ_0、LSQ_1、LSQ_2、LSQ_3,以便提高离散数据的读取或写入速度,每个存储加载队列LSQ可例如为先进先出FIFO存储器。离散地址队列子模块241将各个离散数据的第二存储地址按顺序分别发送到各个存储加载队列;在读取离散数据时,各个存储加载队列分别从存储模块读取离散数据,将读取到的离散数据发送到缓存空间中相应的缓存地址;在写入离散数据时,各个存储加载队列分别接收缓存空间中各个缓存地址的连续数据,并将各个连续数据分别写入存储模块中相应的第二存储地址。
通过这种方式,可实现离散数据的读取或写入过程,从而完成相应的处理指令。
在一种可能的实现方式中,如图3所示,数据读写模块24的合并请求缓存子模块243可包括与存储加载队列LSQ_0、LSQ_1、LSQ_2、LSQ_3对应的合并请求缓存MRB_0、MRB_1、MRB_2、MRB_3,分别连接到对应的存储加载队列及存储模块25。其中,合并请求缓存子模块243用于:
在接收到存储加载队列子模块的读取请求时,判断是否缓存有与所述读取请求的目标地址处于同一缓存行的地址;
在未缓存与所述目标地址处于同一缓存行的地址时,缓存所述目标地址,并向所述存储模块发送所述读取请求,其中,所述读取请求用于请求所述存储模块返回所述目标地址所在的目标缓存行的多个数据;
在所述存储模块返回所述目标缓存行的多个数据时,向所述存储加载队列子模块回填一个或多个数据,所述一个或多个数据为已向所述合并请求缓存子模块发送读取请求的数据中,地址在所述目标缓存行的数据。
举例来说,对于任意一个存储加载队列LSQ,对应的合并请求缓存MRB可缓存多个离散数据的地址(例如8个地址)。本领域技术人员可根据实际情况设置每个合并请求缓存MRB这可缓存的地址数量,本公开对此不作限制。
在一种可能的实现方式中,在数据读写模块读取离散数据时,离散数据的第二存储地址被发送到存储加载队列LSQ,存储加载队列LSQ发出读取请求,读取请求中包括待读取的数据的目标地址,存储模块会根据读取请求返回目标地址所在的目标缓存行(Cache Line)的多个数据。
在一种可能的实现方式中,存储加载队列LSQ可将读取请求先发送到合并请求缓存MRB。合并请求缓存MRB在接收到读取请求时,判断自身是否已经缓存有与读取请求的目标地址处于同一缓存行的地址。如果合并请求缓存MRB已缓存有与读取请求的目标地址处于同一缓存行的地址,则存储模块返回该地址的数据时,会返回整个缓存行的数据,也即能够合并返回目标地址的数据,不需要重复发送读取请求。在该情况下,合并请求缓存MRB可不向存储模块发送该读取请求,以便减少请求的次数。
在一种可能的实现方式中,如果合并请求缓存MRB未缓存与所述目标地址处于同一缓存行的地址,则无法合并返回目标地址的数据,需要向存储模块发送所述读取请求。在该情况下,合并请求缓存MRB可缓存所述目标地址,并向所述存储模块发送所述读取请求,以使存储模块根据读取请求返回目标地址所在的目标缓存行的多个数据。
在一种可能的实现方式中,如果合并请求缓存MRB未存满,则可直接缓存所述目标地址;如果合并请求缓存MRB已存满,则可删除MRB中缓存时间最早的地址,并缓存所述目标地址。
在一种可能的实现方式中,在存储模块返回所述目标缓存行的多个数据时,合并请求缓存MRB可向存储加载队列LSQ回填一个或多个数据,从而完成数据读取过程。其中,该一个或多个数据为已向MRB发送读取请求的数据中,地址在所述目标缓存行的数据。也即,至少能够读取到目标地址的数据,还可能读取到同在目标缓存行中的其他数据。
在一种可能的实现方式中,所述合并请求缓存子模块还用于:删除与所述读取请求的目标地址。 也就是说,在完成读取请求的目标地址的数据读取过程后,可删除该目标地址,以便释放合并请求缓存MRB的缓存空间。
通过这种方式,可使得每个读取请求合并读取到一个或多个离散数据,在实际应用中,每个读取请求可能读取到两个或两个以上的数据,显著提高了数据的读取效率。
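As a behavioral sketch of the merge request cache (MRB) logic above (a simplification under assumed parameters: a 64-unit cache line, an 8-entry buffer with oldest-first eviction; the class and method names are hypothetical), a read request is forwarded to the storage module only when no pending address shares its cache line:

```python
CACHE_LINE = 64  # assumed cache-line size

class MergeRequestBuffer:
    """Caches the cache lines of pending read requests so that requests
    hitting an already-requested line are merged instead of re-sent."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.lines = []  # line-aligned identifiers of cached addresses, oldest first

    def request(self, target_addr):
        """Return True if a new read must be sent to the storage module,
        False if the request merges with a pending cache line."""
        line = target_addr // CACHE_LINE
        if line in self.lines:
            return False          # merged: the whole line will be returned anyway
        if len(self.lines) == self.capacity:
            self.lines.pop(0)     # buffer full: evict the oldest cached address
        self.lines.append(line)
        return True               # forward the read request to the storage module
```

For instance, requests for addresses 0 and 8 fall in the same line, so only the first one reaches the storage module; address 64 starts a new line and is forwarded.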
下面对通过数据处理装置执行各种处理指令的过程进行说明。
在一种可能的实现方式中,所述处理指令包括数据搬运指令,在解码后的处理指令为数据搬运指令时,所述多个数据的源数据为离散数据,所述多个数据的目的数据为连续数据,所述源数据基地址为离散数据的基地址,所述目的数据基地址为连续数据的基地址,所述数据读写模块用于:
根据离散数据的存储地址从存储模块读取离散数据;
将读取到的离散数据发送给所述连续数据缓存模块;
所述连续数据缓存模块用于:
将从所述数据读写模块接收到的离散数据缓存到所述缓存空间,得到连续数据;
在所述缓存空间中的连续数据达到第一预设数量时,将所述缓存空间中的连续数据发送到外部存储器的第一存储地址。
举例来说,数据搬运指令(Gather Load)用于将一组离散地址的内容聚合成连续的数据向量。当解码模块21解码后的处理指令为数据搬运指令时,源数据为离散数据,目的数据为连续数据。一方面,解码模块21可存储离散数据的数量,将离散数据的基地址及离散数据的数据偏移地址发送到离散地址确定模块22,以使离散地址确定模块22计算各个离散数据的第二存储地址,并按顺序发送到数据读写模块24;另一方面,解码模块21可根据连续数据的基地址及连续数据的数据尺寸,确定连续数据的第一存储地址;并将第一存储地址发送到连续数据缓存模块23。具体处理过程不再重复描述。
在一种可能的实现方式中,数据读写模块24可根据连续数据缓存模块23的顺序可分配的缓存指针,为离散地址队列SAQ中的各个离散数据点分配缓存空间ID(Buffer ID),从而建立各个离散数据的第二存储地址与Buffer ID之间的对应关系。然后,离散地址队列子模块241将各个离散数据的第二存储地址按顺序分别发送到各个存储加载队列,各个存储加载队列分别根据第二存储地址从存储模块25读取各个离散数据,并将读取到的离散数据发送到连续数据缓存模块23的缓存空间中。
在一种可能的实现方式中,连续数据缓存模块23可根据各个离散数据的Buffer ID,在缓存空间中按顺序缓存各个离散数据,形成连续数据(可称为向量数据)。在缓存空间中的连续数据达到第一预设数量时,通过总线端口共享(BUS PORT SHARE)将所述缓存空间中的连续数据发送到外部存储器的第一存储地址。其中,该第一预设数量可以等于缓存空间可缓存的连续数据的数量,也即在缓存空间存满时,将连续数据发送到外部存储器的第一存储地址;该第一预设数量也可以小于缓存空间可缓存的连续数据的数量,本公开对此不作限制。
在所有离散数据均被发送到第一存储地址的情况下,可得到预设长度的向量数据,数据搬运指令执行完成。进而可通过数据运算指令对向量数据进行进一步的处理,例如两个或两个以上的向量数据之间的四则运算,例如对两个向量数据进行差值运算等。
通过这种方式,可以在图像识别等应用场景中需要对大量离散的成对数据点进行运算的情况下,通过数据搬运指令将离散数据搬运到连续地址空间中聚合为向量数据再进行向量运算,从而将离散的数据点的运算转换为向量运算,简化处理过程,减小数据开销。
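The overall effect of the Gather Load instruction described above can be sketched in Python (an illustrative model, not the hardware pipeline; the storage module is modeled as a dict from address to value and the function name is hypothetical):

```python
def gather_load(storage, scatter_base, offsets):
    """Gather discrete data into a contiguous vector (Gather Load sketch):
    each discrete address is base + offset, per formula (2), and results
    are placed in order to form the destination vector."""
    return [storage[scatter_base + off] for off in offsets]

# storage holding two discrete data at addresses 28 and 44 (base 4, offsets 24 and 40)
```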
在一种可能的实现方式中,所述处理指令包括离散存储指令,在解码后的处理指令为离散存储指令时,所述多个数据的源数据为连续数据,所述多个数据的目的数据为离散数据,所述源数据基地址为连续数据的基地址,所述目的数据基地址为离散数据的基地址,所述连续数据缓存模块用于:
根据从外部存储器的第一存储地址读取连续数据;
将读取到的连续数据缓存到所述缓存空间;
在所述缓存空间中的连续数据达到第二预设数量时,将所述缓存空间中的连续数据发送到所述数据读写模块;
所述数据读写模块用于:
接收所述连续数据缓存模块的连续数据;
根据离散数据的存储地址,将接收到的连续数据写入所述存储模块。
举例来说,离散存储指令(Scatter Store)用于将连续的数据向量分散存储到一组离散的地址中。当解码模块21解码后的处理指令为离散存储指令时,源数据为连续数据,目的数据为离散数据。一方面,解码模块21可存储离散数据的数量,将离散数据的基地址及离散数据的数据偏移地址发送到离散地址确定模块22,以使离散地址确定模块22计算各个离散数据的第二存储地址,并按顺序发送到数据读写模块24;另一方面,解码模块21可根据连续数据的基地址及连续数据的数据尺寸,确定连续数据的第一存储地址;并将第一存储地址发送到连续数据缓存模块23。具体处理过程不再重复描述。
在一种可能的实现方式中,连续数据缓存模块23可建立连续数据的缓存空间;根据连续数据的第一存储地址,通过总线端口共享(BUS PORT SHARE)向外部存储器(例如DRAM或者CT-RAM)发送数据读取请求;并将外部存储器返回的连续数据按顺序回填到缓存空间中。当缓存空间中的连续数据达到第二预设数量时,将所述缓存空间中的连续数据发送到数据读写模块24。其中,该第二预设数量可以等于缓存空间可缓存的连续数据的数量,也即在缓存空间存满时发送连续数据;该第二预设数量也可以小于缓存空间可缓存的连续数据的数量,本公开对此不作限制。
在一种可能的实现方式中,数据读写模块24可根据连续数据缓存模块23的顺序可分配的缓存指针,为离散地址队列SAQ中的各个离散数据点分配缓存空间ID(Buffer ID),从而建立各个离散数据的第二存储地址与Buffer ID(也可称为指数Index索引)之间的对应关系。然后,离散地址队列子模块241将各个离散数据的第二存储地址及Buffer ID按顺序分别发送到各个存储加载队列LSQ。
在一种可能的实现方式中,当存储加载队列LSQ准备向存储模块写入数据时,待写入的离散数据的Buffer ID(也可称为指数索引)处的数据应该已发送(或称为已回填)到存储加载队列LSQ中。在该情况下,存储加载队列LSQ将该离散数据写入到存储模块中相应的第二存储地址。这样,按顺序不断写入离散数据,能够完成所有数据的写入过程。在全部数据写到存储模块后,存储模块25还可以通过总线端口共享(BUS PORT SHARE)将离散数据写入外部存储器(例如DRAM或者CT-RAM)中。
在一种可能的实现方式中,解码模块21可读取存储加载队列LSQ、连续数据缓存模块CDB等的状态信息,以便确定当前指令的执行状态,并确定当前指令是否执行结束。在当前的离散存储指令执行结束后,解码模块21还可发送清除存储(Clean VAC)操作到存储模块(VAC)25中,以便清除存储模块中的数据,开始执行新的一条指令。
通过这种方式,能够通过离散存储指令将向量数据的各个数据离散存储到离散的地址空间中,得到离散的多个数据,从而能够在图像识别等应用场景中对大量离散的成对数据点进行向量运算后,将运算后的向量分散存储为离散的数据点,得到离散的处理结果,从而简化处理过程,减小数据开销。
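The inverse Scatter Store behavior can be sketched the same way (again an illustrative dict-based model with a hypothetical function name, not the queue-based hardware path):

```python
def scatter_store(storage, vector, scatter_base, offsets):
    """Scatter a contiguous vector back to discrete addresses
    (Scatter Store sketch): the i-th element is written to base + offsets[i]."""
    for value, off in zip(vector, offsets):
        storage[scatter_base + off] = value
    return storage
```

Applied after a vector operation, this restores the result vector's elements to the original discrete layout.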
在一种可能的实现方式中,所述处理指令包括数据搬运指令,在解码后的处理指令为数据搬运指令时,所述多个数据的源数据为离散数据,所述多个数据的目的数据为连续数据,所述源数据基地址为离散数据的基地址,所述目的数据基地址为连续数据的基地址,所述解码模块,还用于:
在解码后的处理指令为数据搬运指令时,确定所述数据搬运指令的搬运模式;
在所述数据搬运指令的搬运模式为多向量搬运模式时,确定所述数据搬运指令的操作域中的多个第一数据的源数据基地址、目的数据基地址及数据偏移地址,以及多个第二数据的偏移步长和目的基地址步长;
根据所述多个第一数据的目的数据基地址及连续数据的数据尺寸,确定所述多个第一数据的第三存储地址;
根据所述多个第一数据的第三存储地址及所述多个第二数据的目的基地址步长,确定所述多个第二数据的第四存储地址;
所述离散地址确定模块,还用于:
根据所述多个第一数据的源数据基地址及数据偏移地址,分别确定所述多个第一数据的第五存储地址;
根据所述多个第一数据的第五存储地址及所述多个第二数据的偏移步长,分别确定所述多个 第二数据的第六存储地址;
将所述第五存储地址和所述第六存储地址发送给所述数据读写模块;
所述数据读写模块,还用于:
根据所述第五存储地址和所述第六存储地址从存储模块读取第一数据和第二数据;
将读取到的第一数据和第二数据发送给所述连续数据缓存模块;
所述连续数据缓存模块,还用于:
分别为所述多个第一数据和所述多个第二数据建立缓存空间;
将从所述数据读写模块接收到的第一数据和第二数据分别缓存到所述缓存空间;
在所述缓存空间中的第一数据和第二数据达到第三预设数量时,将所述缓存空间中的连续数据发送到外部存储器的所述第三存储地址和所述第四存储地址。
举例来说,可预先设定数据搬运指令的搬运模式包括单向量搬运模式(例如表示为Mode0)、多向量搬运模式(例如表示为Mode1)等。在单向量搬运模式下,数据搬运指令可将多个离散数据点聚合为一个向量数据;在多向量搬运模式下,数据搬运指令可将多个离散数据点聚合为两个或两个以上的向量数据。其中,可根据数据搬运指令的操作域中用于指示搬运模式的字段来确定搬运模式,例如字段Mode0或Mode1,本公开对此不作限制。
当需要搬运成对或成组的离散点进行运算(例如差值运算)的情况下,由于相邻点一般都在一个相同的缓存行(cache line),因此可通过多向量搬运模式实现取一次数据点,得到两个或两个以上的数据点的目的,最终生成两个或两个以上的不同向量,便于进行向量运算产生最终的结果(例如差值向量)。
在一种可能的实现方式中,如果解码后的处理指令为数据搬运指令,则解码模块21可确定该数据搬运指令的搬运模式(例如表示为Gather Load Offset Mode)。如果该数据搬运指令的搬运模式为多向量搬运模式,则解码模块21可确定待搬运的各个数据点的源数据地址和目的数据地址。源数据地址表示多个数据点在数据存储空间中当前的数据存储地址,为离散的多个数据地址;目的数据地址表示多个数据点将要被搬运到的数据存储空间的数据地址,为连续的数据地址。源数据地址所在的数据存储空间与目的数据地址所在的数据存储空间可以相同或不同,本公开对此不作限制。
在一种可能的实现方式中,在多向量搬运模式下,可得到至少两个向量,可设定第一个向量为第一向量数据,其它的向量为第二向量数据(包括至少一个向量数据)。数据搬运指令的操作域中可包括第一向量数据的多个第一数据的源数据基地址(Source Data Base Address)、目的数据基地址(Destination Data Base Address)及数据偏移地址;还包括对应于第二向量数据的多个第二数据的偏移步长(Offset Stride)和目的基地址步长(Destination Base Address Stride)。
一方面,解码模块21可存储离散数据的数量,将多个第一数据的源数据基地址及数据偏移地址、多个第二数据的偏移步长发送到离散地址确定模块22,以使离散地址确定模块22计算各个第一数据在存储模块25中的第五存储地址和各个第二数据在存储模块25中的第六存储地址(也即离散的源地址)。
在一种可能的实现方式中,离散地址确定模块22的偏移加载子模块221可根据各个第一数据的源数据基地址及数据偏移地址(包括偏移值基地址及偏移值宽度),通过总线端口共享(BUS PORT SHARE)从外部存储器(例如RAM)读取各个第一数据的偏移值(Offset),缓存读取到的偏移值,并将各个第一数据存储在存储模块25中的基地址及偏移值按顺序发送到离散地址生成子模块222。
在一种可能的实现方式中,离散地址生成子模块222可根据各个第一数据的基地址及偏移值,按顺序计算各个离散数据的第五存储地址,并按顺序发送到数据读写模块24。在搬运得到两个向量的情况下,对于第n个第一数据(1≤n≤N,N为第一数据的数量),第五存储地址可表示为:
Single Point Src Addr[2n]=Source Data Base Address+Offset Address[n]    (3)
In formula (3), Single Point Src Addr[2n] denotes the fifth storage address of the n-th first datum, Source Data Base Address denotes the base address of the first data, and Offset Address[n] denotes the offset value of the n-th first datum. For example, when the base address is Addr3[15] and the offset value is [24,27], the fifth storage address of the n-th first datum can be determined to be Addr3[39,42].
在一种可能的实现方式中,在多向量搬运模式下,离散地址生成子模块222可根据偏移步长(Offset  Stride)直接在第一数据的第五存储地址的基础上得到对应的第二数据的第六存储地址。
Single Point Src Addr[2n+1]=Source Data Base Address+Offset Address[n]+Offset Stride     (4)
In formula (4), Single Point Src Addr[2n+1] denotes the sixth storage address of the second datum corresponding to the n-th first datum. For example, when the fifth storage address of the n-th first datum is Addr3[39,42] and the offset stride is 8 bits, the sixth storage address of the n-th second datum can be determined to be Addr3[47,50].
在一种可能的实现方式中,在需要读取多组第二数据以形成多个第二向量的情况下,偏移步长可以有多个数值,例如偏移步长包括4位、8位、12位等。这样,可根据不同的偏移步长分别确定各组第二数据的第六存储地址。本领域技术人员可根据实际情况设定偏移步长的数量及取值,本公开对此不作限制。
通过这种方式,可根据第一数据的存储地址及偏移步长直接确定对应的第二数据的存储地址,以便通过一次读取得到两个或两个以上的数据点,使得指令可通过较少的地址读取较多的数据(例如通过50个数据点的偏移值基地址读取100个数据点),从而显著减小数据开销。
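Formulas (3) and (4) for the multi-vector transfer mode can be sketched together (a simplified integer-address model with a hypothetical helper name; one offset stride, i.e. two vectors, is assumed):

```python
def multi_vector_src_addrs(src_base, offsets, offset_stride):
    """Source addresses in multi-vector transfer mode: the n-th first datum
    sits at base + offset (formula (3)); its paired second datum sits at
    base + offset + stride (formula (4)), so each offset yields two points."""
    first = [src_base + off for off in offsets]            # fifth storage addresses
    second = [addr + offset_stride for addr in first]      # sixth storage addresses
    return first, second

# Text example: base 15, offset 24, stride 8 -> first datum at 39, second at 47
```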
在一种可能的实现方式中,数据读写模块24的离散地址队列子模块241可接收并缓存第五存储地址和第六存储地址,并建立各个离散数据与连续数据缓存模块23的缓存空间中的缓存地址的对应关系,以使读取到的离散数据能够相应地放置到缓存空间中。例如,根据连续数据缓存模块23的顺序可分配的缓存指针,为离散地址队列SAQ中的各个离散数据点分配缓存空间ID(Buffer ID),从而建立各个离散数据的第二存储地址与Buffer ID之间的对应关系。
在一种可能的实现方式中,离散地址队列子模块241将各个第五存储地址和第六存储地址按顺序分别发送到各个存储加载队列;各个存储加载队列分别从存储模块读取离散的第一数据和第二数据,将读取到的离散数据按顺序发送到缓存空间中相应的缓存地址。
其中,存储加载队列可以先读取第一数据再读取第二数据,也可以同时读取第一数据和第二数据,本公开对此不作限制。
另一方面,解码模块21可根据第一数据的目的数据基地址及数据尺寸、多个第二数据的目的基地址步长,确定各个第一数据在外部存储器中的第三存储地址和各个第二数据在外部存储器中的第四存储地址(也即连续的目的地址);并将第三存储地址和第四存储地址发送到连续数据缓存模块23。
在一种可能的实现方式中,数据搬运指令的操作域中可包括目的数据基地址(Destination Data Base Address)、单个数据的尺寸(Single Point Data Size)、目的基地址步长(Destination Base Address Stride)等。由于目的数据地址为连续的数据地址,因此可直接根据第一数据的数据尺寸以及各个第一数据的序号依次确定各个第一数据的目的数据地址(称为第三存储地址)。在搬运得到两个向量的情况下,第三存储地址可表示为:
Single Point Dest Addr[2n]=Destination Data Base Address+(n-1)*Single Point Data Size    (5)
In formula (5), Single Point Dest Addr[2n] denotes the third storage address of the n-th first datum. For example, when the destination data base address is Addr4[0,3], the size of a single datum is 4 bits, and n is 3, the third storage address of the 3rd first datum can be determined to be Addr4[8,11].
在多向量搬运模式下,可根据目的基地址步长直接在第一数据的第三存储地址的基础上得到对应的第二数据的第四存储地址。在搬运得到两个向量的情况下,第四存储地址可表示为:
Single Point Dest Addr[2n+1]=Destination Data Base Address+(n-1)*Single Point Data Size+Destination Base Address Stride     (6)
In formula (6), Single Point Dest Addr[2n+1] denotes the fourth storage address of the second datum corresponding to the n-th first datum. For example, when the third storage address of the n-th first datum, determined from the destination data base address and the data size, is Addr4[8,11] and the destination base address stride is 48 bits, the fourth storage address of the n-th second datum can be determined to be Addr4[56,59].
在一种可能的实现方式中,在需要读取多组第二数据以形成多个第二向量的情况下,目的基地址步长可以有多个数值,例如目的基地址步长包括48位、96位等,以便分别存储多个向量数据。这样,可根据不同的目的基地址步长分别确定各组第二数据的第四存储地址。本领域技术人员可根据实际情况设定目的基地址步长的数量及取值,本公开对此不作限制。
通过这种方式,可根据第一数据的目的数据地址及目的基地址步长直接确定对应的第二数据的目的数据地址,以便存储两个或两个以上向量数据的各个数据,显著减小数据开销。
在一种可能的实现方式中,连续数据缓存模块23可分别为所述多个第一数据和所述多个第二数据建立缓存空间;在从所述数据读写模块接收到第一数据和第二数据时,可根据各个第一数据和各个第二数据的Buffer ID,在缓存空间中按顺序缓存各个第一数据和各个第二数据,形成连续的向量数据。在所述缓存空间中的第一数据和第二数据达到第三预设数量时,将所述缓存空间中的连续数据发送到外部存储器的所述第三存储地址和所述第四存储地址。其中,该第三预设数量可以等于缓存空间可缓存的连续数据的数量,也即在缓存空间存满时,将连续数据发送到外部存储器的第三存储地址和第四存储地址;该第三预设数量也可以小于缓存空间可缓存的连续数据的数量,本公开对此不作限制。
这样,依次对各个第一数据和各个第二数据进行搬运,得到第三存储地址中的N个连续的第一数据以及第四存储地址中存储的N个连续的第二数据,实现了离散的第一数据和第二数据聚合为第一向量数据和第二向量数据的过程,从而实现数据搬运,为后续的处理提供数据基础。
在一种可能的实现方式中,完成多向量的数据搬运后,可通过数据运算指令对多个向量数据进行进一步的处理,例如两个或两个以上的向量数据之间的四则运算,例如对两个向量数据进行差值运算等。
通过这种方式,能够通过多向量搬运模式的数据搬运指令将离散的成对或成组数据搬运到多个连续地址空间中,分别聚合为多个向量数据,从而能够在图像识别等应用场景中需要对大量离散的成对(或成组)数据点进行运算的情况下,通过一条指令直接获取参与运算的多个向量数据,并将离散数据点的运算转换为向量运算,从而简化处理过程,减小数据开销。
在一种可能的实现方式中,所述处理指令包括向量扩展指令,在解码后的处理指令为向量扩展指令时,所述多个数据的源数据为连续数据,所述多个数据的目的数据为连续数据,所述源数据基地址为连续数据的基地址,所述目的数据基地址为连续数据的基地址,所述解码模块,还用于:
在解码后的处理指令为向量扩展指令时,确定所述向量扩展指令的操作域中的多个第三数据的源数据基地址、多个第四数据的目的数据基地址、数据尺寸及扩展参数;
根据所述多个第三数据的源数据基地址及数据尺寸,确定所述多个第三数据的第七存储地址,并将所述第七存储地址发送到所述连续数据缓存模块;
所述连续数据缓存模块,还用于:
分别为所述多个第三数据和所述多个第四数据建立缓存空间;
根据所述第七存储地址,从外部存储器读取所述多个第三数据并缓存到所述缓存空间;
在所述缓存空间中的第三数据达到第四预设数量时,将缓存的多个第三数据发送到所述解码模块;
所述解码模块,还用于:
根据来自所述连续数据缓存模块的多个第三数据及所述扩展参数,对所述多个第三数据进行扩展,得到多个第四数据;
根据所述多个第四数据的目的数据基地址及数据尺寸,确定所述多个第四数据的第八存储地址,并将所述多个第四数据及所述第八存储地址发送到所述连续数据缓存模块;
所述连续数据缓存模块,还用于:
将所述多个第四数据缓存到所述缓存空间;
在所述缓存空间中的第四数据达到第五预设数量时,将缓存的多个第四数据发送到所述外部存储器的第八存储地址。
举例来说,向量扩展指令(Vector Extension)用于根据扩展参数实现一个连续的数据向量的扩展及存储。当解码模块21解码后的处理指令为向量扩展指令时,源数据(可称为第三数据)和目的数据(可称为第四数据)均为连续数据。在该情况下,可确定向量扩展指令的操作域中的多个第三数据的源数据基地址、多个第四数据的目的数据基地址、数据尺寸及扩展参数。
在一种可能的实现方式中,解码模块21可根据所述多个第三数据的源数据基地址及数据尺寸,确 定多个第三数据的在外部存储器中的第七存储地址,并将所述第七存储地址发送到所述连续数据缓存模块。
在一种可能的实现方式中,连续数据缓存模块23可分别为所述多个第三数据和所述多个第四数据建立缓存空间。并且,连续数据缓存模块23可根据第七存储地址,从外部存储器读取多个第三数据并缓存到缓存空间;在所述缓存空间中的第三数据达到第四预设数量时,将缓存的多个第三数据发送到所述解码模块。其中,该第四预设数量可以等于缓存空间可缓存的连续数据的数量,也即在缓存空间存满时,将连续数据发送到解码模块;该第四预设数量也可以小于缓存空间可缓存的连续数据的数量,本公开对此不作限制。
在一种可能的实现方式中,解码模块21可根据来自连续数据缓存模块23的多个第三数据及所述扩展参数,对所述多个第三数据进行扩展,得到多个第四数据。
在一种可能的实现方式中,所述第三数据为M个,所述扩展参数包括与M个第三数据对应的M个扩展参数位,M为大于1的整数,
所述根据来自所述连续数据缓存模块的多个第三数据及所述扩展参数,对所述多个第三数据进行扩展,得到多个第四数据,包括:
根据第m个第三数据以及与所述第m个第三数据对应的第m个扩展参数位，确定第m个数据位置的k_m个数据，1≤m≤M，k_m≥0；
根据M个数据位置的数据,得到所述多个第四数据。
For example, the expansion parameter may include M expansion parameter bits, which respectively indicate the number of copies k_m of the M third data. For instance, when M=5, the expansion parameter may be [1,2,0,3,1], indicating that the 5 third data are copied 1, 2, 0, 3, and 1 times respectively.
In a possible implementation, for the m-th third datum (1≤m≤M), if the m-th expansion parameter bit corresponding to the m-th third datum is k_m (k_m≥0), it can be determined that the m-th data position has k_m copies of the m-th third datum. In this way, by expanding the M third data respectively, the data at the M data positions can be determined. For example, for M third data [A,B,C,D,E] with expansion parameter [1,2,0,3,1], the plurality of fourth data obtained after expansion is [A,B,B,D,D,D,E], forming a new vector data. The number of fourth data may differ from the number of third data.
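The expansion step can be sketched in Python (an illustrative model of the expansion rule only; the function name is hypothetical):

```python
def vector_extend(third_data, expansion_bits):
    """Expand a vector per the expansion parameter: the m-th element is
    repeated k_m times (k_m may be 0, dropping the element)."""
    fourth_data = []
    for value, k in zip(third_data, expansion_bits):
        fourth_data.extend([value] * k)
    return fourth_data

# Text example: [A,B,C,D,E] with [1,2,0,3,1] -> [A,B,B,D,D,D,E]
```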
应当理解,扩展参数还可以包括其他扩展内容(例如将各个数据点的值放大或缩小一定的倍数),并且扩展参数还可以包括其他表示方式,本领域技术人员可根据实际情况设置,本公开对此不作限制。
在一种可能的实现方式中,解码模块21可根据所述多个第四数据的目的数据基地址及数据尺寸,确定所述多个第四数据的第八存储地址,并将所述多个第四数据及所述第八存储地址发送到所述连续数据缓存模块。
在一种可能的实现方式中,连续数据缓存模块23可将所述多个第四数据缓存到所述缓存空间;在所述缓存空间中的第四数据达到第五预设数量时,将缓存的多个第四数据发送到所述外部存储器的第八存储地址。其中,该第五预设数量可以等于缓存空间可缓存的连续数据的数量,也即在缓存空间存满时,将连续数据发送到外部存储器;该第五预设数量也可以小于缓存空间可缓存的连续数据的数量,本公开对此不作限制。
通过这种方式,能够通过向量扩展指令对向量进行扩展,从而能够在图像识别等应用场景中需要对向量数据进行扩展处理时,将原向量扩展为新的向量并存储到连续地址空间,从而简化处理过程,减小数据开销。
需要说明的是,对于前述的各装置实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本公开并不受所描述的动作顺序的限制,因为依据本公开,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本公开所必须的。
应该理解,上述的装置实施例仅是示意性的,本公开的装置还可通过其它的方式实现。例如,上述实施例中所述单元/模块的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。例如,多个单元、模块或组件可以结合,或者可以集成到另一个系统,或一些特征可以忽略或不执行。
另外,若无特别说明,在本公开各个实施例中的各功能单元/模块可以集成在一个单元/模块中, 也可以是各个单元/模块单独物理存在,也可以两个或两个以上单元/模块集成在一起。上述集成的单元/模块既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。
所述集成的单元/模块如果以硬件的形式实现时,该硬件可以是数字电路,模拟电路等等。硬件结构的物理实现包括但不局限于晶体管,忆阻器等等。若无特别说明,所述人工智能处理器可以是任何适当的硬件处理器,比如CPU、GPU、FPGA、DSP和ASIC等等。若无特别说明,所述存储单元可以是任何适当的磁存储介质或者磁光存储介质,比如,阻变式存储器RRAM(Resistive Random Access Memory)、动态随机存取存储器DRAM(Dynamic Random Access Memory)、静态随机存取存储器SRAM(Static Random-Access Memory)、增强动态随机存取存储器EDRAM(Enhanced Dynamic Random Access Memory)、高带宽内存HBM(High-Bandwidth Memory)、混合存储立方HMC(Hybrid Memory Cube)等等。
所述集成的单元/模块如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本公开各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
In one possible implementation, an artificial intelligence chip including the above data processing device is also disclosed.
In one possible implementation, an electronic device including the above artificial intelligence chip is also disclosed.
In one possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip; the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is configured to store data; the interface device is configured to implement data transfer between the artificial intelligence chip and an external device; and the control device is configured to monitor the state of the artificial intelligence chip.
Fig. 4 shows a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 4, in addition to the above chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the artificial intelligence chip via a bus and is configured to store data. The storage device may include multiple groups of storage units 393, each group connected to the artificial intelligence chip via a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without raising the clock frequency, because it transfers data on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include multiple DDR4 chips. In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transfer and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
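The 25600 MB/s figure follows directly from the DDR4-3200 transfer rate and the 64-bit data width of each controller; a quick check of the arithmetic:

```python
transfers_per_second = 3200 * 10**6  # DDR4-3200: 3200 MT/s (two transfers per clock)
data_width_bits = 64                 # 64 of the 72 controller bits carry data; 8 are ECC
bandwidth_bytes = transfers_per_second * data_width_bits // 8
print(bandwidth_bytes // 10**6, "MB/s")  # → 25600 MB/s
```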
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice per clock cycle. A controller for controlling DDR is provided in the chip to control data transfer to and data storage in each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is configured to implement data transfer between the artificial intelligence chip and an external device (for example, a server or a computer). For example, in one embodiment the interface device may be a standard PCIe interface: data to be processed are transferred from the server to the chip through the standard PCIe interface. Preferably, when a PCIe 3.0 x16 interface is used for transfer, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the computation results of the artificial intelligence chip are transferred back to the external device (for example, a server) by the interface device.
The control device is electrically connected to the artificial intelligence chip and is configured to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip may be electrically connected to the control device through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads and may thus be in different working states such as heavy load and light load. Through the control device, the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip can be regulated.
In one possible implementation, an electronic device including the above artificial intelligence chip is disclosed. The electronic device includes a data processing device, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, dashcam, navigator, sensor, camera module, server, cloud server, still camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device. The vehicle includes an aircraft, ship, and/or car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance apparatus, B-mode ultrasound scanner, and/or electrocardiograph.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily; for brevity, not all possible combinations of these technical features have been described, but as long as a combination of technical features is not contradictory, it should be regarded as within the scope of this specification.
The foregoing may be better understood in light of the following clauses:
A1. A data processing device, the device comprising a decoding module, a discrete address determination module, a continuous data caching module, a data read/write module, and a storage module,
wherein the decoding module is configured to decode a received processing instruction to obtain a decoded processing instruction; determine a plurality of data corresponding to the processing instruction, as well as the source data base address and destination data base address of the plurality of data, the data offset addresses of discrete data, and the data size of continuous data, the source data of the plurality of data including discrete data or continuous data; the decoding module is further configured to determine a first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data;
the discrete address determination module, connected to the decoding module and the data read/write module, is configured to determine second storage addresses of the discrete data according to the base address of the discrete data and the data offset addresses of the discrete data, and send the second storage addresses to the data read/write module;
the continuous data caching module, connected to the decoding module and the data read/write module, is configured to establish a cache space for continuous data, and either cache the continuous data at the first storage address in the cache space and send them to the data read/write module, or cache the continuous data received from the data read/write module in the cache space and send them to the first storage address;
the data read/write module, connected to the storage module, is configured to read discrete data from the storage module according to the second storage addresses of the discrete data and send the read discrete data to the continuous data caching module, or receive continuous data from the continuous data caching module and write the received continuous data into the storage module according to the storage addresses of the discrete data,
wherein the data read/write module includes a merge request caching submodule configured to, while the data read/write module is reading discrete data, cache the storage addresses corresponding to a plurality of read requests, so that each read request reads one or more discrete data in a merged manner.
A2. The device according to A1, wherein the data offset address includes an offset value base address and an offset value width, and the discrete address determination module includes:
an offset loading submodule configured to determine the offset value storage address of each discrete datum according to the offset value base address and the offset value width of the discrete data, and read the offset value of each discrete datum from its offset value storage address;
a discrete address generation submodule configured to determine the second storage address of each discrete datum according to the base address of the discrete data and the offset value of each discrete datum, and send the second storage addresses to the data read/write module.
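The two-step address computation in clause A2 (load each offset value, then add it to the data base address) can be sketched as follows. All names and the dictionary-as-memory model are assumptions for illustration:

```python
def discrete_addresses(data_base, offset_base, offset_width, memory, n):
    """For each of n discrete data, read its offset value stored at
    offset_base + i * offset_width, then form the second storage
    address as data_base + offset (illustrative model of clause A2)."""
    addresses = []
    for i in range(n):
        offset_value = memory[offset_base + i * offset_width]
        addresses.append(data_base + offset_value)
    return addresses

# Offset values 0, 8, 32 stored in consecutive 4-byte slots from address 100:
mem = {100: 0, 104: 8, 108: 32}
print(discrete_addresses(0x2000, 100, 4, mem, 3))  # → [8192, 8200, 8224]
```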
A3. The device according to A1 or A2, wherein the data read/write module includes:
a discrete address queue submodule configured to receive and store the second storage addresses of the discrete data;
a store/load queue submodule configured to read discrete data from the storage module according to the second storage addresses of the discrete data and send the read discrete data to the continuous data caching module, or receive continuous data from the continuous data caching module and write the received continuous data into the storage module according to the storage addresses of the discrete data.
A4. The device according to A3, wherein the merge request caching submodule is connected to the store/load queue submodule and the storage module, and the merge request caching submodule is configured to:
upon receiving a read request from the store/load queue submodule, determine whether an address in the same cache line as the target address of the read request has already been cached;
if no address in the same cache line as the target address has been cached, cache the target address and send the read request to the storage module, the read request requesting the storage module to return the plurality of data of the target cache line in which the target address is located;
when the storage module returns the plurality of data of the target cache line, backfill one or more data to the store/load queue submodule, the one or more data being those data, among the data for which read requests have been sent to the merge request caching submodule, whose addresses are in the target cache line.
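The cache-line merging of clause A4 can be modeled in simplified, synchronous form: each distinct cache line is fetched from storage once, and every requested address is then backfilled from the fetched lines. The 64-byte line size, function names, and `fetch_line` callback are assumptions for illustration:

```python
LINE = 64  # assumed cache-line size in bytes

def merge_reads(addresses, fetch_line):
    """Coalesce read requests that share a cache line: only the first
    request per line goes out to the storage module; all requests are
    backfilled from the fetched lines (simplified model of clause A4)."""
    lines = {}
    fetches = 0
    for addr in addresses:
        base = addr - addr % LINE
        if base not in lines:           # first request for this line
            lines[base] = fetch_line(base)
            fetches += 1
    backfilled = [lines[a - a % LINE][a % LINE] for a in addresses]
    return backfilled, fetches

line_data = lambda base: bytes(range(base % 256, base % 256 + LINE))
values, n_fetches = merge_reads([0x100, 0x104, 0x13C, 0x200], line_data)
print(n_fetches)  # → 2: three addresses share line 0x100, one is in line 0x200
```

In the actual device the requests arrive asynchronously, so the submodule must also hold pending target addresses until the line returns; this sketch collapses that bookkeeping into a single pass.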
A5. The device according to A4, wherein the merge request caching submodule is further configured to:
delete the cached target address associated with the read request.
A6. The device according to any one of A1-A5, wherein the processing instruction includes a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the plurality of data are discrete data, the destination data of the plurality of data are continuous data, the source data base address is the base address of the discrete data, and the destination data base address is the base address of the continuous data,
the data read/write module being configured to:
read discrete data from the storage module according to the storage addresses of the discrete data;
send the read discrete data to the continuous data caching module;
the continuous data caching module being configured to:
cache the discrete data received from the data read/write module in the cache space to obtain continuous data;
when the continuous data in the cache space reach a first preset quantity, send the continuous data in the cache space to the first storage address of the external memory.
A7. The device according to any one of A1-A5, wherein the processing instruction includes a discrete store instruction, and when the decoded processing instruction is a discrete store instruction, the source data of the plurality of data are continuous data, the destination data of the plurality of data are discrete data, the source data base address is the base address of the continuous data, and the destination data base address is the base address of the discrete data,
the continuous data caching module being configured to:
read continuous data from the first storage address of the external memory;
cache the read continuous data in the cache space;
when the continuous data in the cache space reach a second preset quantity, send the continuous data in the cache space to the data read/write module;
the data read/write module being configured to:
receive the continuous data from the continuous data caching module;
write the received continuous data into the storage module according to the storage addresses of the discrete data.
A8. The device according to any one of A1-A5, wherein the processing instruction includes a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the plurality of data are discrete data, the destination data of the plurality of data are continuous data, the source data base address is the base address of the discrete data, and the destination data base address is the base address of the continuous data,
the decoding module being further configured to:
when the decoded processing instruction is a data transfer instruction, determine the transfer mode of the data transfer instruction;
when the transfer mode of the data transfer instruction is a multi-vector transfer mode, determine, from the operation field of the data transfer instruction, the source data base address, destination data base address, and data offset addresses of a plurality of first data, as well as the offset stride and destination base address stride of a plurality of second data;
determine third storage addresses of the plurality of first data according to the destination data base address of the plurality of first data and the data size of the continuous data;
determine fourth storage addresses of the plurality of second data according to the third storage addresses of the plurality of first data and the destination base address stride of the plurality of second data;
the discrete address determination module being further configured to:
determine fifth storage addresses of the plurality of first data according to the source data base address and the data offset addresses of the plurality of first data;
determine sixth storage addresses of the plurality of second data according to the fifth storage addresses of the plurality of first data and the offset stride of the plurality of second data;
send the fifth storage addresses and the sixth storage addresses to the data read/write module;
the data read/write module being further configured to:
read the first data and the second data from the storage module according to the fifth storage addresses and the sixth storage addresses;
send the read first data and second data to the continuous data caching module;
the continuous data caching module being further configured to:
establish cache spaces for the plurality of first data and the plurality of second data, respectively;
cache the first data and second data received from the data read/write module in the respective cache spaces;
when the first data and second data in the cache spaces reach a third preset quantity, send the continuous data in the cache spaces to the third storage addresses and the fourth storage addresses of the external memory.
A9. The device according to any one of A1-A5, wherein the processing instruction includes a vector expansion instruction, and when the decoded processing instruction is a vector expansion instruction, both the source data and the destination data of the plurality of data are continuous data, and both the source data base address and the destination data base address are base addresses of continuous data,
the decoding module being further configured to:
when the decoded processing instruction is a vector expansion instruction, determine, from the operation field of the vector expansion instruction, the source data base address of a plurality of third data, the destination data base address of a plurality of fourth data, the data size, and the expansion parameter;
determine seventh storage addresses of the plurality of third data according to the source data base address and data size of the plurality of third data, and send the seventh storage addresses to the continuous data caching module;
the continuous data caching module being further configured to:
establish cache spaces for the plurality of third data and the plurality of fourth data, respectively;
read the plurality of third data from the external memory according to the seventh storage addresses and cache them in the cache space;
when the third data in the cache space reach a fourth preset quantity, send the cached plurality of third data to the decoding module;
the decoding module being further configured to:
expand the plurality of third data according to the plurality of third data from the continuous data caching module and the expansion parameter, to obtain a plurality of fourth data;
determine eighth storage addresses of the plurality of fourth data according to the destination data base address and data size of the plurality of fourth data, and send the plurality of fourth data and the eighth storage addresses to the continuous data caching module;
the continuous data caching module being further configured to:
cache the plurality of fourth data in the cache space;
when the fourth data in the cache space reach a fifth preset quantity, send the cached plurality of fourth data to the eighth storage addresses of the external memory.
A10. The device according to A9, wherein there are M third data, the expansion parameter includes M expansion parameter bits corresponding to the M third data, and M is an integer greater than 1,
said expanding the plurality of third data according to the plurality of third data from the continuous data caching module and the expansion parameter, to obtain a plurality of fourth data, including:
determining k_m data at the m-th data position according to the m-th third data and the m-th expansion parameter bit corresponding to the m-th third data, where 1≤m≤M and k_m≥0;
obtaining the plurality of fourth data according to the data at the M data positions.
A11. An artificial intelligence chip, comprising the data processing device according to any one of A1-A10.
A12. An electronic device, comprising the artificial intelligence chip according to A11.
A13. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip according to A11;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transfer between the artificial intelligence chip and an external device;
the control device is configured to monitor the state of the artificial intelligence chip.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure; the above descriptions of the embodiments are intended only to help understand the method of the present disclosure and its core idea. Meanwhile, any changes or modifications made by those skilled in the art based on the idea of the present disclosure, its specific implementations, and its scope of application fall within the protection scope of the present disclosure. In summary, the contents of this specification should not be construed as limiting the present disclosure.

Claims (13)

  1. A data processing device, characterized in that the device comprises a decoding module, a discrete address determination module, a continuous data caching module, a data read/write module, and a storage module, wherein
    the decoding module is configured to decode a received processing instruction to obtain a decoded processing instruction; determine a plurality of data corresponding to the processing instruction, as well as the source data base address and destination data base address of the plurality of data, the data offset addresses of discrete data, and the data size of continuous data, the source data of the plurality of data comprising discrete data or continuous data; the decoding module is further configured to determine a first storage address of the continuous data according to the base address of the continuous data and the data size of the continuous data;
    the discrete address determination module, connected to the decoding module and the data read/write module, is configured to determine second storage addresses of the discrete data according to the base address of the discrete data and the data offset addresses of the discrete data, and send the second storage addresses to the data read/write module;
    the continuous data caching module, connected to the decoding module and the data read/write module, is configured to establish a cache space for continuous data, and either cache the continuous data at the first storage address in the cache space and send them to the data read/write module, or cache the continuous data received from the data read/write module in the cache space and send them to the first storage address;
    the data read/write module, connected to the storage module, is configured to read discrete data from the storage module according to the second storage addresses of the discrete data and send the read discrete data to the continuous data caching module, or receive continuous data from the continuous data caching module and write the received continuous data into the storage module according to the storage addresses of the discrete data,
    wherein the data read/write module comprises a merge request caching submodule configured to, while the data read/write module is reading discrete data, cache the storage addresses corresponding to a plurality of read requests, so that each read request reads one or more discrete data in a merged manner.
  2. The device according to claim 1, characterized in that the data offset address comprises an offset value base address and an offset value width, and the discrete address determination module comprises:
    an offset loading submodule configured to determine the offset value storage address of each discrete datum according to the offset value base address and the offset value width of the discrete data, and read the offset value of each discrete datum from its offset value storage address;
    a discrete address generation submodule configured to determine the second storage address of each discrete datum according to the base address of the discrete data and the offset value of each discrete datum, and send the second storage addresses to the data read/write module.
  3. The device according to claim 1 or 2, characterized in that the data read/write module comprises:
    a discrete address queue submodule configured to receive and store the second storage addresses of the discrete data;
    a store/load queue submodule configured to read discrete data from the storage module according to the second storage addresses of the discrete data and send the read discrete data to the continuous data caching module, or receive continuous data from the continuous data caching module and write the received continuous data into the storage module according to the storage addresses of the discrete data.
  4. The device according to claim 3, characterized in that the merge request caching submodule is connected to the store/load queue submodule and the storage module, and the merge request caching submodule is configured to:
    upon receiving a read request from the store/load queue submodule, determine whether an address in the same cache line as the target address of the read request has already been cached;
    if no address in the same cache line as the target address has been cached, cache the target address and send the read request to the storage module, the read request requesting the storage module to return the plurality of data of the target cache line in which the target address is located;
    when the storage module returns the plurality of data of the target cache line, backfill one or more data to the store/load queue submodule, the one or more data being those data, among the data for which read requests have been sent to the merge request caching submodule, whose addresses are in the target cache line.
  5. The device according to claim 4, characterized in that the merge request caching submodule is further configured to:
    delete the cached target address associated with the read request.
  6. The device according to any one of claims 1-5, characterized in that the processing instruction comprises a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the plurality of data are discrete data, the destination data of the plurality of data are continuous data, the source data base address is the base address of the discrete data, and the destination data base address is the base address of the continuous data,
    the data read/write module being configured to:
    read discrete data from the storage module according to the storage addresses of the discrete data;
    send the read discrete data to the continuous data caching module;
    the continuous data caching module being configured to:
    cache the discrete data received from the data read/write module in the cache space to obtain continuous data;
    when the continuous data in the cache space reach a first preset quantity, send the continuous data in the cache space to the first storage address of the external memory.
  7. The device according to any one of claims 1-5, characterized in that the processing instruction comprises a discrete store instruction, and when the decoded processing instruction is a discrete store instruction, the source data of the plurality of data are continuous data, the destination data of the plurality of data are discrete data, the source data base address is the base address of the continuous data, and the destination data base address is the base address of the discrete data,
    the continuous data caching module being configured to:
    read continuous data from the first storage address of the external memory;
    cache the read continuous data in the cache space;
    when the continuous data in the cache space reach a second preset quantity, send the continuous data in the cache space to the data read/write module;
    the data read/write module being configured to:
    receive the continuous data from the continuous data caching module;
    write the received continuous data into the storage module according to the storage addresses of the discrete data.
  8. The device according to any one of claims 1-5, characterized in that the processing instruction comprises a data transfer instruction, and when the decoded processing instruction is a data transfer instruction, the source data of the plurality of data are discrete data, the destination data of the plurality of data are continuous data, the source data base address is the base address of the discrete data, and the destination data base address is the base address of the continuous data,
    the decoding module being further configured to:
    when the decoded processing instruction is a data transfer instruction, determine the transfer mode of the data transfer instruction;
    when the transfer mode of the data transfer instruction is a multi-vector transfer mode, determine, from the operation field of the data transfer instruction, the source data base address, destination data base address, and data offset addresses of a plurality of first data, as well as the offset stride and destination base address stride of a plurality of second data;
    determine third storage addresses of the plurality of first data according to the destination data base address of the plurality of first data and the data size of the continuous data;
    determine fourth storage addresses of the plurality of second data according to the third storage addresses of the plurality of first data and the destination base address stride of the plurality of second data;
    the discrete address determination module being further configured to:
    determine fifth storage addresses of the plurality of first data according to the source data base address and the data offset addresses of the plurality of first data;
    determine sixth storage addresses of the plurality of second data according to the fifth storage addresses of the plurality of first data and the offset stride of the plurality of second data;
    send the fifth storage addresses and the sixth storage addresses to the data read/write module;
    the data read/write module being further configured to:
    read the first data and the second data from the storage module according to the fifth storage addresses and the sixth storage addresses;
    send the read first data and second data to the continuous data caching module;
    the continuous data caching module being further configured to:
    establish cache spaces for the plurality of first data and the plurality of second data, respectively;
    cache the first data and second data received from the data read/write module in the respective cache spaces;
    when the first data and second data in the cache spaces reach a third preset quantity, send the continuous data in the cache spaces to the third storage addresses and the fourth storage addresses of the external memory.
  9. The device according to any one of claims 1-5, characterized in that the processing instruction comprises a vector expansion instruction, and when the decoded processing instruction is a vector expansion instruction, both the source data and the destination data of the plurality of data are continuous data, and both the source data base address and the destination data base address are base addresses of continuous data,
    the decoding module being further configured to:
    when the decoded processing instruction is a vector expansion instruction, determine, from the operation field of the vector expansion instruction, the source data base address of a plurality of third data, the destination data base address of a plurality of fourth data, the data size, and the expansion parameter;
    determine seventh storage addresses of the plurality of third data according to the source data base address and data size of the plurality of third data, and send the seventh storage addresses to the continuous data caching module;
    the continuous data caching module being further configured to:
    establish cache spaces for the plurality of third data and the plurality of fourth data, respectively;
    read the plurality of third data from the external memory according to the seventh storage addresses and cache them in the cache space;
    when the third data in the cache space reach a fourth preset quantity, send the cached plurality of third data to the decoding module;
    the decoding module being further configured to:
    expand the plurality of third data according to the plurality of third data from the continuous data caching module and the expansion parameter, to obtain a plurality of fourth data;
    determine eighth storage addresses of the plurality of fourth data according to the destination data base address and data size of the plurality of fourth data, and send the plurality of fourth data and the eighth storage addresses to the continuous data caching module;
    the continuous data caching module being further configured to:
    cache the plurality of fourth data in the cache space;
    when the fourth data in the cache space reach a fifth preset quantity, send the cached plurality of fourth data to the eighth storage addresses of the external memory.
  10. The device according to claim 9, characterized in that there are M third data, the expansion parameter comprises M expansion parameter bits corresponding to the M third data, and M is an integer greater than 1,
    said expanding the plurality of third data according to the plurality of third data from the continuous data caching module and the expansion parameter, to obtain a plurality of fourth data, comprising:
    determining k_m data at the m-th data position according to the m-th third data and the m-th expansion parameter bit corresponding to the m-th third data, where 1≤m≤M and k_m≥0;
    obtaining the plurality of fourth data according to the data at the M data positions.
  11. An artificial intelligence chip, characterized in that the chip comprises the data processing device according to any one of claims 1-10.
  12. An electronic device, characterized in that the electronic device comprises the artificial intelligence chip according to claim 11.
  13. A board card, characterized in that the board card comprises: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 11;
    wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transfer between the artificial intelligence chip and an external device;
    the control device is configured to monitor the state of the artificial intelligence chip.
PCT/CN2021/090623 2020-05-08 2021-04-28 Data processing device and related products WO2021223639A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/619,760 US20230214327A1 (en) 2020-05-08 2021-04-28 Data processing device and related product

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010382526.6A CN113626080B (zh) Data processing device and related products
CN202010382526.6 2020-05-08

Publications (1)

Publication Number Publication Date
WO2021223639A1 true WO2021223639A1 (zh) 2021-11-11

Family

ID=78377235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/090623 WO2021223639A1 (zh) 2020-05-08 2021-04-28 数据处理装置以及相关产品

Country Status (3)

Country Link
US (1) US20230214327A1 (zh)
CN (1) CN113626080B (zh)
WO (1) WO2021223639A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107301454A (zh) * 2016-04-15 2017-10-27 北京中科寒武纪科技有限公司 Artificial neural network reverse training device and method supporting discrete data representation
CN107301453A (zh) * 2016-04-15 2017-10-27 北京中科寒武纪科技有限公司 Artificial neural network forward operation device and method supporting discrete data representation
CN107608715A (zh) * 2017-07-20 2018-01-19 上海寒武纪信息科技有限公司 Device and method for performing artificial neural network forward operations
US20190138922A1 (en) * 2016-04-15 2019-05-09 Cambricon Technologies Corporation Limited Apparatus and methods for forward propagation in neural networks supporting discrete data
CN111047022A (zh) * 2018-10-12 2020-04-21 中科寒武纪科技股份有限公司 Computing device and related product

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015011570A (ja) * 2013-06-28 2015-01-19 株式会社東芝 Storage device, electronic apparatus, storage device control method, and storage device control program
GB2543303B (en) * 2015-10-14 2017-12-27 Advanced Risc Mach Ltd Vector data transfer instruction
EP3699826A1 (en) * 2017-04-20 2020-08-26 Shanghai Cambricon Information Technology Co., Ltd Operation device and related products
CN107833176A (zh) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 Information processing method and related product
CN109711539B (zh) * 2018-12-17 2020-05-29 中科寒武纪科技股份有限公司 Operation method, device, and related product
CN110647722B (zh) * 2019-09-20 2024-03-01 中科寒武纪科技股份有限公司 Data processing method and device, and related products


Also Published As

Publication number Publication date
CN113626080B (zh) 2023-10-03
CN113626080A (zh) 2021-11-09
US20230214327A1 (en) 2023-07-06


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21799951

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21799951

Country of ref document: EP

Kind code of ref document: A1
