CN111258639B - Data processing method, processor, data processing device and storage medium - Google Patents

Info

Publication number
CN111258639B
Authority
CN
China
Prior art keywords
subdata
data
storage device
source operand
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811454947.4A
Other languages
Chinese (zh)
Other versions
CN111258639A (en
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd filed Critical Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811454947.4A priority Critical patent/CN111258639B/en
Priority to PCT/CN2019/121064 priority patent/WO2020108496A1/en
Publication of CN111258639A publication Critical patent/CN111258639A/en
Application granted granted Critical
Publication of CN111258639B publication Critical patent/CN111258639B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30029 Logical and Boolean instructions, e.g. XOR, NOT
    • G06F9/30076 Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
    • G06F9/30083 Power or thermal control instructions
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824 Operand accessing
    • G06F9/383 Operand prefetching
    • G06F9/3832 Value prediction for operands; operand history buffers

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present application relates to a data processing method, a processor, a data processing apparatus, and a storage medium. The method comprises: obtaining an operation instruction; reading first sub-data from a first storage device according to a preset data reading mode and storing the currently read first sub-data in a second storage device; obtaining second sub-data and third sub-data according to the operation instruction; performing a replacement operation on the currently read first sub-data, the second sub-data and the third sub-data; storing the resulting current operation result in the first storage device; and returning to the step of reading first sub-data from the first storage device until the operation corresponding to the operation instruction is completed. Because data is read and operated on cyclically and the operation result of each cycle is written to the first storage device without interruption, the first storage device is accessed exclusively, other processor cores are prevented from accessing it, and the atomicity of the atomic operation is guaranteed.

Description

Data processing method, processor, data processing device and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, a processor, a data processing apparatus, and a storage medium.
Background
An atomic operation is an operation that cannot be interrupted by the thread scheduling mechanism: once started, it runs to completion without any thread switch in between. For example, when a shared variable i is accumulated non-atomically, several cores may execute i++ at the same time and produce a wrong result. In a multi-core processor system, a plurality of processor cores share the same memory space, and a common data transmission technique may not guarantee atomicity, because several processor cores may access the same address at the same time.
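As a non-limiting illustration of the i++ example above, the following Python sketch replays one fixed interleaving of two cores by hand. The function names and the dictionary standing in for shared memory are assumptions made only for this illustration; no real concurrency is involved.

```python
def interleaved_increment(shared):
    """Two cores each run i++ as separate read and write steps (non-atomic)."""
    a = shared["i"]        # core A reads i
    b = shared["i"]        # core B reads i before core A has written back
    shared["i"] = a + 1    # core A writes its result
    shared["i"] = b + 1    # core B overwrites it, so one increment is lost
    return shared["i"]


def uninterrupted_increment(shared):
    """Each i++ runs to completion before the next one starts (atomic)."""
    for _ in range(2):
        shared["i"] = shared["i"] + 1
    return shared["i"]


lost_update = interleaved_increment({"i": 0})
correct = uninterrupted_increment({"i": 0})
print(lost_update, correct)
```

With the interleaving shown, the non-atomic version ends at 1 instead of 2, which is the kind of error the atomic operation is meant to rule out.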
In an actual program, an operation result is stored in a storage space that has a certain address range. Because other processor cores may access the storage space before the operation is completed, the conventional method is to read the data in the storage space into a storage unit, store the operation result in the storage unit, and write the result back from the storage unit into the storage space after the instruction has finished executing. However, if another processor core accesses the storage space during the operation, an erroneous result is obtained and the atomicity of the accumulation is destroyed.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data processing method, a processor, a data processing apparatus, and a storage medium that can achieve exclusive access to an off-chip storage space during an atomic operation.
A method of data processing, the method comprising:
obtaining an operation instruction, wherein the operation instruction is used for realizing a replacement operation among a first source operand, a second source operand and a third source operand, the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
reading the first sub-data from a first storage device in a preset data reading mode, according to the data reading capacity and the operation instruction, and storing the currently read first sub-data in a second storage device, wherein the first storage device is an off-chip storage device and the second storage device is an on-chip storage device;
obtaining the second sub-data and the third sub-data according to the operation instruction, performing the replacement operation on the currently read first sub-data, the second sub-data and the third sub-data, and storing the resulting current operation result in the second storage device and the first storage device;
and returning to the step of reading the first sub-data from the first storage device in the preset data reading mode, according to the data reading capacity and the operation instruction, until the operation corresponding to the operation instruction is completed.
In one embodiment, performing the replacement operation comprises: comparing the currently read first sub-data with the second sub-data bit by bit, and determining whether the currently read first sub-data is equal to the second sub-data;
when the first sub-data is equal to the second sub-data, taking the third sub-data as the current operation result;
and when the first sub-data is not equal to the second sub-data, taking the first sub-data as the current operation result.
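The comparison and selection described in this embodiment can be sketched as follows. This is a minimal illustration only: the function name replace_chunk and the use of Python lists of integers for the sub-data are assumptions, and integer equality stands in for the bit-by-bit comparison.

```python
def replace_chunk(first, second, third):
    """Element-wise replacement: result is third where first == second, else first."""
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three sub-data sequences must have the same length")
    # For integers, equality is equivalent to all bits being equal.
    return [t if f == s else f for f, s, t in zip(first, second, third)]


result = replace_chunk([1, 5, 1, 7], [1, 1, 1, 1], [9, 9, 9, 9])
print(result)
```

Elements that match the second sub-data are replaced by the third sub-data, and all other elements keep their original value.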
In one embodiment, the second source operand is an immediate or data stored in the second storage device, and the method further comprises:
if the second source operand is an immediate, replicating the immediate and taking the resulting copies as the second sub-data, wherein the number of second sub-data is equal to the number of currently read first sub-data;
and if the second source operand is data stored in the second storage device, reading the second sub-data from a preset storage address of the second storage device, wherein the number of currently read second sub-data is equal to the number of first sub-data.
In one embodiment, the third source operand is an immediate or data stored in the second storage device, and the method further comprises:
if the third source operand is an immediate, replicating the immediate and taking the resulting copies as the third sub-data, wherein the number of third sub-data is equal to the number of currently read first sub-data;
and if the third source operand is data stored in the second storage device, reading the third sub-data from a preset storage address of the second storage device, wherein the number of currently read third sub-data is equal to the number of first sub-data.
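The two ways of obtaining the second or third sub-data can be illustrated with one helper. The sketch below is an assumption-laden model: expand_operand, its parameters, and the list that stands in for the on-chip second storage device are hypothetical names introduced here, and addresses are counted in elements rather than bytes.

```python
def expand_operand(operand, is_address, second_storage, count):
    """Return `count` sub-data elements for an immediate or an on-chip address."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if is_address:
        # Operand is a start address in the on-chip storage model.
        chunk = second_storage[operand:operand + count]
        if len(chunk) != count:
            raise ValueError("not enough data at the given on-chip address")
        return chunk
    # Operand is an immediate: replicate it once per first sub-data element.
    return [operand] * count


from_immediate = expand_operand(7, False, [], 3)
from_address = expand_operand(2, True, [10, 11, 12, 13, 14], 3)
print(from_immediate, from_address)
```

Either way, the helper yields exactly as many elements as there are currently read first sub-data, which is the condition both embodiments impose.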
In one embodiment, each time after the current operation result is stored in the first storage device, a next address of an end address of the first sub data read last time is used as a start address of the first sub data read currently.
In one embodiment, when the current operation result is stored in the first storage device, a storage address of the current operation result is consistent with a storage address of the currently read first sub-data.
In one embodiment, a counter is incremented or decremented once, and the method then returns to the step of reading the first sub-data from the first storage device in the preset data reading mode, according to the data reading capacity and the operation instruction, until the counter has been incremented from an initial value to the target cycle number or decremented from the target cycle number to the initial value, at which point the operation corresponding to the operation instruction is complete.
In one embodiment, the data size of the first source operand is obtained according to the operation instruction;
and obtaining the target cycle number according to the data size of the first source operand and a preset splitting granularity.
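The loop control, address advance and in-place write-back of the preceding embodiments can be sketched together. Several points are assumptions made for this illustration: sizes and addresses are counted in elements rather than bytes, the target cycle number is taken as the ceiling of data size divided by splitting granularity, the second and third operands are immediates, and all function and variable names are hypothetical.

```python
def target_cycle_number(data_size, granularity):
    """Cycles needed to cover data_size (ceiling division is an assumption)."""
    if data_size < 0 or granularity <= 0:
        raise ValueError("data_size must be >= 0 and granularity must be > 0")
    return (data_size + granularity - 1) // granularity


def run_replacement(first_storage, start, data_size, granularity, compare, swap):
    """Process first_storage[start:start + data_size] chunk by chunk, in place."""
    cycles = target_cycle_number(data_size, granularity)
    counter = 0                      # counts up from the initial value 0
    addr = start
    end = start + data_size
    while counter < cycles:
        count = min(granularity, end - addr)
        # Read the current chunk into the on-chip (second) storage model.
        second_storage = first_storage[addr:addr + count]
        result = [swap if x == compare else x for x in second_storage]
        second_storage = result
        # Write the result back to the same off-chip addresses it was read from.
        first_storage[addr:addr + count] = result
        addr += count                # next chunk starts right after this one
        counter += 1
    return counter


memory = [0, 1, 0, 2, 0, 3, 0]
cycles_run = run_replacement(memory, start=1, data_size=5, granularity=2,
                             compare=0, swap=9)
print(cycles_run, memory)
```

In this sketch only the five elements starting at address 1 are touched; the elements outside that range keep their values, and the result of every cycle reaches the off-chip model before the next chunk is read.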
In one embodiment, the instruction format of the operation instruction comprises an instruction class, an instruction type, a first source operand, a second source operand, a third source operand, a target operand and an operation code;
the instruction class is used for determining whether the operation instruction is an atomic operation instruction;
the instruction type is used for determining the operation type of the operation instruction;
the operation code is used for configuring the number of source operands;
the first source operand and the second source operand are respectively used for representing data participating in operation;
the target operand is used for representing the current operation result.
A processor comprises an arithmetic circuit, a read-write circuit and a second storage device arranged adjacent to the arithmetic circuit, wherein the second storage device can be connected with a first storage device outside the processor through the read-write circuit;
the arithmetic circuit is used for acquiring an arithmetic instruction and sending a read-write request to the first storage device according to the arithmetic instruction;
the operation instruction is used for realizing replacement operation among a first source operand, a second source operand and a third source operand, wherein the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
the read-write circuit is used for reading first subdata from the first storage device according to the read-write request and a preset data reading mode, and storing the first subdata to the second storage device;
the arithmetic circuit is configured to obtain the second sub-data and the third sub-data, perform replacement operation on the currently read first sub-data, second sub-data and third sub-data according to the arithmetic instruction, and store an obtained current operation result in the second storage device and the first storage device; and then, sending a read-write request to the first storage device again until the operation corresponding to the operation instruction is completed.
In one embodiment, the processor further comprises a data selector, the arithmetic circuit comprises a replacement arithmetic module, the data selector is connected between the arithmetic circuit and the read-write circuit, and the data selector is used for gating a connection path of the replacement arithmetic module and the read-write circuit;
the replacement operation module is configured to obtain the second sub-data, perform replacement operation on the currently read first sub-data, second sub-data, and third sub-data according to the operation instruction, and store an obtained current operation result in the second storage device and the first storage device.
In one embodiment, the replacement operation module comprises an operation unit and a result output unit connected with the operation unit;
the arithmetic unit is used for comparing the currently read first subdata with the second subdata and judging whether the currently read first subdata is equal to the second subdata or not;
the result output unit is used for taking the third subdata as the current operation result when the first subdata is equal to the second subdata;
the result output unit is further configured to take the first sub-data as the current operation result when the first sub-data is not equal to the second sub-data.
In one embodiment, the replacement operation module is further configured to determine, according to the operation instruction, that the second source operand is an immediate or data stored in the second storage device;
if the second source operand is determined to be an immediate, the replacement operation module replicates the immediate and takes the resulting copies as the second sub-data, wherein the number of second sub-data is equal to the number of currently read first sub-data;
if the second source operand is determined to be the data stored in the second storage device, the read-write circuit reads the second subdata from a preset storage address of the second storage device, and the number of the currently read second subdata is equal to the number of the first subdata.
In one embodiment, the replacement operation module is further configured to determine, according to the operation instruction, that the third source operand is an immediate or data stored in the second storage device;
if the third source operand is determined to be an immediate, the replacement operation module replicates the immediate and takes the resulting copies as the third sub-data, wherein the number of third sub-data is equal to the number of currently read first sub-data;
if the third source operand is determined to be the data stored in the second storage device, the read-write circuit reads the third subdata from a preset storage address of the second storage device, and the number of the currently read third subdata is equal to the number of the first subdata.
In one embodiment, the arithmetic circuit comprises a master processing circuit and more than one slave processing circuits, and the more than one slave processing circuits are all connected to the master processing circuit;
the replacement operation module is arranged in the main processing circuit.
A data processing apparatus, the apparatus comprising:
an obtaining module, used for obtaining an operation instruction, wherein the operation instruction is used for realizing a replacement operation among a first source operand, a second source operand and a third source operand, the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
the reading module is used for reading the first subdata from a first storage device according to data reading capacity and the operation instruction and a preset data reading mode, and storing the currently read first subdata into a second storage device, wherein the first storage device is an off-chip storage device, and the second storage device is an on-chip storage device;
and the operation module is used for acquiring the second subdata and the third subdata according to the operation instruction, performing replacement operation on the currently read first subdata, the currently read second subdata and the currently read third subdata, storing an obtained current operation result into the second storage device and the first storage device, and then circularly calling the reading module and the operation module until the operation corresponding to the operation instruction is completed.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
obtaining an operation instruction, wherein the operation instruction is used for realizing a replacement operation among a first source operand, a second source operand and a third source operand, the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
reading the first subdata from a first storage device according to data reading capacity and the operation instruction and a preset data reading mode, and storing the currently read first subdata into a second storage device, wherein the first storage device is an off-chip storage device, and the second storage device is an on-chip storage device;
acquiring the second subdata and the third subdata according to the operation instruction, performing replacement operation according to the currently read first subdata, second subdata and third subdata, and storing an obtained current operation result into the second storage device and the first storage device;
and returning to the step of reading the first subdata from the first storage device according to the operation instruction and the data reading capacity and a preset data reading mode until the operation corresponding to the operation instruction is completed.
According to the data processing method, the processor, the data processing device and the storage medium described above, an operation instruction is obtained; first sub-data is read from a first storage device according to a preset data reading mode and the currently read first sub-data is stored in a second storage device; second sub-data and third sub-data are then obtained according to the operation instruction; a replacement operation is performed on the currently read first sub-data, the second sub-data and the third sub-data to obtain a current operation result; the current operation result is stored in the first storage device; and the method returns to reading first sub-data from the first storage device until the operation corresponding to the operation instruction is completed. By reading data cyclically for operation and writing the operation result of each cycle to the first storage device without interruption, the processor achieves exclusive access to the first storage device, prevents other processor cores from accessing it, and guarantees the atomicity of the atomic operation. This further extends the arithmetic operation functions of the processor, and the exclusive access to the first storage device improves operation efficiency during atomic operations.
Drawings
FIG. 1 is a block diagram of a processor in one embodiment;
FIG. 2 is a block diagram of an embodiment of an operational module;
FIG. 3 is a schematic diagram of a processor according to another embodiment;
FIG. 4 is a schematic diagram of a processor according to another embodiment;
FIG. 5 is a schematic diagram of a processor according to another embodiment;
FIG. 6 is a flow diagram illustrating a data processing method according to one embodiment;
FIG. 7 is a flow chart illustrating a data processing method according to another embodiment;
FIG. 8 is a flowchart illustrating a method for instruction disassembly in accordance with another embodiment;
FIG. 9 is a schematic flow chart of the Atomic CAS method in one embodiment;
FIG. 10 is a flowchart illustrating step S706 according to an exemplary embodiment;
FIG. 11 is a block diagram showing the structure of a data processing apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the present application and are not intended to limit it.
The terms "first," "second," and "third," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different elements and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the steps or elements listed, but may optionally include other steps or elements that are not listed or that are inherent to such a process, method, article, or apparatus. Off-chip refers to the outside of the processor, i.e., an off-chip storage device is a storage device disposed outside the processor; on-chip refers to the inside of the processor, i.e., an on-chip storage device is a storage device disposed inside the processor.
The data processing method provided by the present application can be applied to the processor 1000 shown in FIG. 1. The processor 1000 includes an arithmetic circuit 12, a read-write circuit 203, and a second storage device 201. The second storage device 201 may be a buffer and/or a register arranged inside the processor 1000. The second storage device 201 may be connected, through the read-write circuit 203, to the first storage device 13 provided outside the processor 1000. The first storage device 13 and the second storage device 201 may each be a non-volatile memory or a volatile memory, which is not limited herein. The read-write circuit 203 may be an I/O circuit.
The arithmetic circuit 12 and the read-write circuit 203 can both be connected to the second storage device 201, and the read-write circuit 203 can be connected to the first storage device 13. The second storage device 201 may read the first source operand from the first storage device 13 via the read-write circuit 203 and transfer the first source operand to the arithmetic circuit 12 for operation. The arithmetic circuit 12 may store the obtained operation result and intermediate operation results in the second storage device 201, and the second storage device 201 may write the operation result back to the first storage device 13 through the read-write circuit 203. In the embodiment of the present application, the intermediate operation result is continuously written back from the second storage device 201 to the first storage device 13 outside the processor 1000, so that the arithmetic circuit 12 can use the first storage device 13 exclusively, and the atomicity of the operation and the accuracy of the operation result can be ensured.
The arithmetic circuit 12 is configured to receive an arithmetic instruction, analyze the arithmetic instruction, and implement a corresponding arithmetic operation according to the arithmetic instruction. Optionally, the operation instruction may have a specific instruction format, and the operation circuit may parse the instruction format according to the operation instruction to obtain instruction information such as an instruction type, a source operand, and an operation code of the operation instruction, so as to implement a corresponding operation according to the operation instruction.
Optionally, the operation instruction in the embodiment of the present application may be an atomic operation instruction. As shown in Table 1 below, the instruction format of the operation instruction may include an instruction class Name, an instruction type Op, a first source operand, a second source operand, a target operand Dst addr, an operation code Src Op, and the like.
The instruction class Name is used to determine the class of the instruction (the classes include atomic operation instructions and other common operation instructions); that is, the instruction class is used to determine whether the operation instruction is an atomic operation instruction. The instruction type Op is used to determine the operation type of the operation instruction. The operation type indicates which operation the instruction implements and thus distinguishes its specific function; for example, the operation type may be an accumulation operation, a decrement operation, a maximum value operation, a minimum value operation, a logical AND operation, a logical OR operation, a logical XOR operation, a replacement operation, a swap operation, and the like. The operation code Src Op is used to configure the number of source operands involved in the operation instruction. The target operand Dst addr indicates the current operation result obtained by operating on at least one source operand; specifically, Dst addr may be the storage address of the current operation result, and the operation result corresponding to the operation instruction may be stored in the storage space indicated by that address. The first and second source operands represent data participating in the operation. The first source operand may be data stored on the off-chip first storage device 13, i.e., the first source operand may be the data stored at the address Src0 addr. The second source operand may be an immediate in the instruction or data stored at an address given in the instruction.
Further, the instruction format of the operation instruction may further include an identification bit Src1 vec for identifying whether the source operand A is an immediate or an address, and an identification bit Src2 vec for identifying whether the source operand B is an immediate or an address.
Specifically, when Src1 vec is 0, the source operand A is an immediate, and when Src1 vec is 1, the source operand A is data stored at an address; when Src2 vec is 0, the source operand B is an immediate, and when Src2 vec is 1, the source operand B is data stored at an address.
Furthermore, the instruction format of the operation instruction further includes a Data size field, which indicates the data size of the first source operand, and a data stream identifier IO config, which is used to request splitting and to identify the target cycle number.
The instruction format of the operation instruction may be as shown in Table 1:
Instruction field | Bit width | Meaning
Name | 8 | Instruction class; the atomic class is 15
Op | 8 | Instruction type, distinguishing the specific function
Src0 addr | 49 | Address of source operand 0; off-chip only; byte-aligned
Dst addr | 32 | Destination address; on-chip only
Src1 | 32 | Source operand A; immediate or address (determined by Src1 vec)
Src2 | 32 | Source operand B; immediate or address (determined by Src2 vec)
IO config | 9 | Data stream ID for atomic-operation reads and writes, used to request splitting
Data size | 32 | Size of the data read and written by the atomic operation; byte-aligned
Src Op | 3 | Operation code; configures the number of source operands
Data type | 3 | Data type
Src1 vec | 1 | Type of source operand A (immediate or address)
Src2 vec | 1 | Type of source operand B (immediate or address)
Here, Src0 addr represents the address of the first source operand, and Dst addr represents the storage address of the target operand.
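To make the bit widths in Table 1 concrete, the sketch below packs the fields into a single integer and unpacks them again. The field names and widths follow Table 1; the field order inside the instruction word, the helper names pack and unpack, and the example field values other than the atomic class 15 are assumptions made only for this illustration.

```python
# (field name, bit width) as listed in Table 1; the ordering is an assumption.
FIELDS = [
    ("Name", 8), ("Op", 8), ("Src0 addr", 49), ("Dst addr", 32),
    ("Src1", 32), ("Src2", 32), ("IO config", 9), ("Data size", 32),
    ("Src Op", 3), ("Data type", 3), ("Src1 vec", 1), ("Src2 vec", 1),
]
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack(values):
    """Pack field values into one integer; missing fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Inverse of pack: split the integer back into named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Name = 15 is the atomic class from Table 1; the other values are placeholders.
example = {"Name": 15, "Op": 3, "Src0 addr": 0x1000, "Data size": 64, "Src Op": 0b100}
decoded = unpack(pack(example))
print(TOTAL_BITS, decoded["Name"], decoded["Src Op"])
```

Summing the widths in Table 1 gives the total number of bits occupied by the listed fields, and the range check in pack rejects any value that would overflow its field.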
In one embodiment, the source operand A or the source operand B is used as the second source operand according to the operation instruction.
Optionally, the second source operand includes source operand A (Src1), source operand B (Src2), a selection identification bit for source operand A, and a selection identification bit for source operand B. Specifically, when the selection identification bit of source operand A is valid, source operand A is taken as the second source operand; when the selection identification bit of source operand B is valid, source operand B is taken as the second source operand. When both selection identification bits are valid, source operand A and source operand B are both used as second source operands, and the number of second source operands is two. Further, the operation code Src Op may be 3 bits wide, where 2 bits distinguish the number of source operands participating in the operation, and 1 bit selects source operand A (Src1) and/or source operand B (Src2) as the second source operand to participate in the operation. Reference may be made to Table 2:
Source operands Src Op
Src0 000
Src0, Src1 010
Src0, Src2 011
Src0, Src1, Src2 100
When the operation code Src Op is "000", the operation instruction has one source operand, which is the first source operand Src0. When Src Op is "010", the operation instruction has two source operands, namely the first source operand Src0 and the second source operand; the selected identification bit of source operand A is valid, and the second source operand is source operand A (Src1). When Src Op is "011", the operation instruction has two source operands, namely the first source operand Src0 and the second source operand; the selected identification bit of source operand B is valid, and the second source operand is source operand B (Src2). When Src Op is "100", the operation instruction has three source operands: the first source operand Src0, source operand A (Src1), and source operand B (Src2).
In the embodiment of the present application, it may be default that the first source operand Src0 is always valid.
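By way of example only, the mapping of Table 2 from the Src Op encoding to the participating source operands may be sketched as a lookup table. The function name `operands_for` is hypothetical, and rejecting encodings that Table 2 does not list is an assumption of this sketch.

```python
# Encodings copied from Table 2. Treating every other 3-bit value as invalid
# is an assumption of this sketch; the text does not say how they are handled.
SRC_OP_TABLE = {
    0b000: ("Src0",),
    0b010: ("Src0", "Src1"),
    0b011: ("Src0", "Src2"),
    0b100: ("Src0", "Src1", "Src2"),
}


def operands_for(src_op):
    """Return the source operands that take part in the operation."""
    if src_op not in SRC_OP_TABLE:
        raise ValueError(f"unsupported Src Op encoding: {src_op:03b}")
    return SRC_OP_TABLE[src_op]
```

Src0 appears in every entry, which reflects that the first source operand is always valid.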
Optionally, the Data type field indicates the type of the data, and the instruction supports, but is not limited to, the following data types:
Type of data Data Type
Int16 000
Uint16 001
Int32 010
Uint32 011
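For illustration, the Data Type codes listed above may be used to interpret a block of raw bytes as a sequence of elements, as in the following sketch based on the Python standard `struct` module. The little-endian byte order and the function name `unpack_elements` are assumptions of the sketch; the embodiment does not specify a byte order.

```python
import struct

# Type codes from the table above, paired with struct format characters.
# Little-endian byte order ("<") is an assumption of this sketch.
DATA_TYPES = {
    0b000: ("Int16", "h"),
    0b001: ("Uint16", "H"),
    0b010: ("Int32", "i"),
    0b011: ("Uint32", "I"),
}


def unpack_elements(raw, type_code):
    """Interpret raw bytes as a list of elements of the given data type."""
    _name, fmt = DATA_TYPES[type_code]
    size = struct.calcsize("<" + fmt)
    if len(raw) % size:
        raise ValueError("byte length is not a multiple of the element size")
    count = len(raw) // size
    return list(struct.unpack(f"<{count}{fmt}", raw))
```

The same four bytes therefore yield two 16-bit elements or one 32-bit element, depending on the Data Type code.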
Optionally, the operation instruction may include an arithmetic operation instruction, and may also include a logic operation instruction. The arithmetic operation instructions may include: the unary maximum value operation instruction Atomic MAX_SCALAR, the unary minimum value operation instruction Atomic MIN_SCALAR, the binary maximum value operation instruction Atomic MAX_VEC, the binary minimum value operation instruction Atomic MIN_VEC, the replacement operation instruction Atomic CAS, the exchange operation instruction Atomic EXCH, the addition operation instruction Atomic ADD, the accumulation operation instruction Atomic INC, and the subtraction operation instruction Atomic DEC. The logic operation instructions may include: the logical AND operation instruction Atomic AND, the logical OR operation instruction Atomic OR, the logical XOR operation instruction Atomic XOR, and the logical NOT operation instruction Atomic NOT.
The unary maximum value operation instruction Atomic MAX_SCALAR is used for finding the maximum value among the plurality of first sub-data in the first source operand.
The unary minimum value operation instruction Atomic MIN_SCALAR is used for finding the minimum value among the plurality of first sub-data in the first source operand.
The binary maximum value operation instruction Atomic MAX_VEC is used for calculating the maximum value of the first source operand and the second source operand.
The binary minimum value operation instruction Atomic MIN_VEC is used for calculating the minimum value of the first source operand and the second source operand.
The addition operation instruction Atomic ADD is used for adding the first source operand and the second source operand.
The accumulation operation instruction Atomic INC is used for performing an accumulation operation between the first source operand and the second source operand.
The subtraction operation instruction Atomic DEC is used for performing a subtraction operation between the first source operand and the second source operand.
The logical AND operation instruction Atomic AND is used for performing a logical AND operation between the first source operand and the second source operand.
The logical OR operation instruction Atomic OR is used for performing a logical OR operation between the first source operand and the second source operand.
The logical XOR operation instruction Atomic XOR is used for performing a logical XOR operation between the first source operand and the second source operand.
The logical NOT operation instruction Atomic NOT is used for performing a logical NOT operation between the first source operand and the second source operand.
The replacement operation instruction Atomic CAS is used for performing a replacement operation among the first source operand, the second source operand and the third source operand.
The exchange operation instruction Atomic EXCH is used for performing an exchange operation between the first source operand and the second source operand.
In this embodiment, to ensure the atomicity of the operation, the operation instruction may be implemented by dividing one operation into multiple sub-operations, and the first storage device is monopolized by continuously writing the intermediate calculation results back to the first storage device.
Specifically, the first source operand includes at least one first sub-data. The arithmetic circuit 12 receives an operation instruction and sends a read-write request to the first storage device 13 according to the operation instruction. The read-write circuit 203 reads the first sub-data from the first storage device 13 according to the read-write request and a data reading mode, and stores the first sub-data into the second storage device 201. The arithmetic circuit 12 obtains the second source operand according to the operation instruction, executes the operation to obtain a current operation result, and stores the current operation result into the second storage device 201; the current operation result in the second storage device 201 is then stored into the first storage device 13 through the read-write circuit 203. After that, the arithmetic circuit 12 may send a read-write request to the first storage device 13 again to read first sub-data from the first storage device 13 again, and the operation is executed in a loop until the operation corresponding to the operation instruction is completed.
Optionally, the processor may further include a counter, which may be connected to the arithmetic circuit 12 and is used for recording the target number of cycles of the operation instruction. Specifically, each time the read/write circuit 203 stores the current operation result of the second storage device 201 in the first storage device 13, the arithmetic circuit 12 may control the counter to increment once and send the read/write request to the first storage device 13 again, until the counter has been incremented from the initial value to the target number of cycles. In this embodiment, the initial value of the counter may be 0; that is, when the counter has been incremented from 0 to the target number of cycles, the operation corresponding to the operation instruction is completed. Alternatively, the arithmetic circuit 12 may control the counter to decrement once and send the read/write request to the first storage device 13 again, until the counter has been decremented from the target number of cycles to the initial value, which may likewise be 0; that is, when the counter has been decremented from the target number of cycles to 0, the operation corresponding to the operation instruction is completed.
Further, the arithmetic circuit 12 may be provided with an operation module corresponding to each operation instruction. Specifically, referring to fig. 2, the arithmetic circuit 12 may include a binary maximum value operation module 121, a binary minimum value operation module 122, a logical AND operation module 123, a logical OR operation module 124, a logical XOR operation module 125, an exchange operation module 126, a replacement operation module 127, a unary maximum value operation module 128, a unary minimum value operation module 129, an addition operation module 130, an accumulation operation module 131, a subtraction operation module 132, a logical NOT operation module 133, and the like.
The binary maximum value operation module 121 is configured to implement the operation of the binary maximum value operation instruction Atomic MAX_VEC, that is, to implement the maximum value operation of the first source operand and the second source operand.
The binary minimum value operation module 122 is configured to implement the operation of the binary minimum value operation instruction Atomic MIN_VEC, that is, to implement the minimum value operation of the first source operand and the second source operand.
The logical AND operation module 123 is configured to implement the operation of the logical AND operation instruction Atomic AND, that is, to implement the logical AND operation between the first source operand and the second source operand.
The logical OR operation module 124 is configured to implement the operation of the logical OR operation instruction Atomic OR, that is, to implement the logical OR operation between the first source operand and the second source operand.
The logical XOR operation module 125 is configured to implement the operation of the logical XOR operation instruction Atomic XOR, that is, to implement the logical XOR operation between the first source operand and the second source operand.
The exchange operation module 126 is configured to implement the operation of the exchange operation instruction Atomic EXCH, that is, to implement the exchange operation between the first source operand and the second source operand.
The replacement operation module 127 is configured to implement the operation of the replacement operation instruction Atomic CAS, that is, to implement the replacement operation among the first source operand, the second source operand, and the third source operand.
The unary maximum value operation module 128 is configured to implement the operation of the unary maximum value operation instruction Atomic MAX_SCALAR, that is, to implement the maximum value operation over the plurality of first sub-data in the first source operand.
The unary minimum value operation module 129 is configured to implement the operation of the unary minimum value operation instruction Atomic MIN_SCALAR, that is, to implement the minimum value operation over the plurality of first sub-data in the first source operand.
The addition operation module 130 is configured to implement the operation of the addition operation instruction Atomic ADD, that is, to implement the addition of the first source operand and the second source operand.
The accumulation operation module 131 is configured to implement the operation of the accumulation operation instruction Atomic INC, that is, to implement the accumulation operation between the first source operand and the second source operand.
The subtraction operation module 132 is configured to implement the operation of the subtraction operation instruction Atomic DEC, that is, to implement the subtraction operation between the first source operand and the second source operand.
The logical NOT operation module 133 is configured to implement the operation of the logical NOT operation instruction Atomic NOT, that is, to implement the logical NOT operation between the first source operand and the second source operand.
Alternatively, each operation module may include an operation unit and a result output unit connected to the operation unit. The operation unit is used for executing specific operation steps, and the result output unit is used for taking the result obtained in the operation steps as the current operation result.
Further, as shown in fig. 1 and fig. 2, the processor may further include a data selector 14, and the data selector 14 is connected between the arithmetic circuit 12 and the read-write circuit 203. The data selector 14 is configured to gate the connection path between each operation module in the arithmetic circuit 12 and the read/write circuit 203. For example, if the operation instruction is the binary maximum value operation instruction Atomic MAX_VEC, the data selector 14 is used to gate the connection path between the binary maximum value operation module 121 and the read/write circuit 203. At this time, the binary maximum value operation module 121 is configured to obtain the second sub-data, determine whether the currently read first sub-data is greater than or equal to the second sub-data according to the operation instruction, store the obtained current comparison result in the second storage device 201, and store the current comparison result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the binary minimum value operation instruction Atomic MIN_VEC, the data selector 14 is configured to gate the connection path between the binary minimum value operation module 122 and the read/write circuit 203. At this time, the binary minimum value operation module 122 is configured to obtain the second sub-data, determine whether the currently read first sub-data is smaller than the second sub-data according to the operation instruction, store the obtained current comparison result in the second storage device 201, and store the current comparison result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the logical AND operation instruction Atomic AND, the data selector 14 is used to gate the connection path between the logical AND operation module 123 and the read/write circuit 203. At this time, the logical AND operation module 123 is configured to obtain the second sub-data, perform a logical AND operation on the currently read first sub-data and the second sub-data according to the operation instruction, store the obtained current operation result into the second storage device 201, and store the current operation result of the second storage device 201 into the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the logical OR operation instruction Atomic OR, the data selector 14 is used to gate the connection path between the logical OR operation module 124 and the read/write circuit 203. At this time, the logical OR operation module 124 is configured to obtain the second sub-data, perform a logical OR operation on the currently read first sub-data and the second sub-data according to the operation instruction, store the obtained current operation result in the second storage device 201, and store the current operation result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the logical XOR operation instruction Atomic XOR, the data selector 14 is configured to gate the connection path between the logical XOR operation module 125 and the read/write circuit 203. At this time, the logical XOR operation module 125 is configured to obtain the second sub-data, perform a logical XOR operation on the currently read first sub-data and the second sub-data according to the operation instruction, store the obtained current operation result into the second storage device 201, and store the current operation result of the second storage device 201 into the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the replacement operation instruction Atomic CAS, the data selector 14 is used for gating the connection path between the replacement operation module 127 and the read/write circuit 203. At this time, the replacement operation module 127 is configured to obtain the second sub-data and the third sub-data, perform replacement operation on the currently read first sub-data, second sub-data, and third sub-data according to the operation instruction, store the obtained current operation result into the second storage device 201, and store the current operation result of the second storage device 201 into the first storage device 13 through the read-write circuit 203 and the data selector 14.
If the operation instruction is an exchange operation instruction Atomic EXCH, the data selector 14 is used for gating the connection path between the exchange operation module 126 and the read/write circuit 203. At this time, the exchange operation module 126 is configured to obtain the second sub data, perform an exchange operation on the currently read first sub data and second sub data according to the operation instruction, store the obtained current operation result in the second storage device 201, and store the current operation result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the unary maximum value operation instruction Atomic MAX_SCALAR, the data selector 14 is configured to gate the connection path between the unary maximum value operation module 128 and the read/write circuit 203. At this time, the unary maximum value operation module 128 is configured to compare the N sub-data in a sub-data segment of the source operand one by one to obtain the maximum value of the N sub-data, store the maximum value as the current comparison result in the second storage device 201, and store the current comparison result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the unary minimum value operation instruction Atomic MIN_SCALAR, the data selector 14 is configured to gate the connection path between the unary minimum value operation module 129 and the read/write circuit 203. At this time, the unary minimum value operation module 129 is configured to compare the N sub-data in a sub-data segment of the source operand one by one to obtain the minimum value of the N sub-data, store the minimum value as the current comparison result in the second storage device 201, and store the current comparison result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is the addition operation instruction Atomic ADD, the data selector 14 is configured to gate the connection path between the addition operation module 130 and the read/write circuit 203. At this time, the addition operation module 130 is configured to obtain the second sub-data, add the currently read first sub-data and the second sub-data according to the operation instruction to obtain a current operation result, store the current operation result into the second storage device 201, and store the current operation result of the second storage device 201 into the first storage device 13 through the read/write circuit 203 and the data selector 14.
If the operation instruction is an accumulation operation instruction Atomic INC, the data selector 14 is configured to gate a connection path between the accumulation operation module 131 and the read/write circuit 203. At this time, the accumulation operation module 131 is configured to obtain the second sub-data, determine whether the currently read first sub-data is greater than or equal to the second sub-data according to the operation instruction, reset the first sub-data when the first sub-data is greater than or equal to the second sub-data, store the reset first sub-data as a current comparison result in the second storage device 201, and store the current comparison result of the second storage device 201 in the first storage device 13 through the read-write circuit 203 and the data selector 14.
If the operation instruction is the subtraction operation instruction Atomic DEC, the data selector 14 is configured to gate the connection path between the subtraction operation module 132 and the read/write circuit 203. At this time, the subtraction operation module 132 is configured to obtain the second sub-data and determine whether the currently read first sub-data is larger than the second sub-data according to the operation instruction. When the first sub-data is larger than the second sub-data, the second sub-data is stored as the current comparison result in the second storage device 201; when the first sub-data is smaller than or equal to the second sub-data, the first preset value is subtracted from the first sub-data, and the first sub-data after subtraction is stored as the current comparison result in the second storage device 201. The current comparison result of the second storage device 201 is then stored in the first storage device 13 through the read-write circuit 203 and the data selector 14.
If the operation instruction is the logic NOT operation instruction Atomic NOT, the data selector 14 is configured to gate a connection path between the logic NOT operation module 133 and the read/write circuit 203. At this time, the logical negation operation module 133 is configured to obtain the second sub data, perform a logical negation operation on the currently read first sub data and second sub data according to the operation instruction, store the obtained current operation result in the second storage device 201, and store the current operation result of the second storage device 201 in the first storage device 13 through the read/write circuit 203 and the data selector 14.
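The gating performed by the data selector 14 may be illustrated, in software terms only, by a lookup that selects one operation module according to the operation instruction. In the sketch below each dictionary entry stands in for one operation module; the element-by-element pairing of first and second sub-data, the instruction name strings, and the function name `execute` are assumptions made for illustration. Only a subset of the instructions is shown, and Atomic INC, Atomic DEC, Atomic CAS, Atomic EXCH and Atomic NOT are omitted.

```python
# Each entry stands in for one operation module; the dictionary lookup plays
# the role of the data selector gating exactly one module. Pairing the first
# and second sub-data element by element is an assumption of this sketch.
BINARY_MODULES = {
    "MAX_VEC": lambda a, b: a if a >= b else b,
    "MIN_VEC": lambda a, b: a if a < b else b,
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "ADD": lambda a, b: a + b,
}

# Unary modules reduce the N first sub-data of a segment to a single value.
UNARY_MODULES = {
    "MAX_SCALAR": max,
    "MIN_SCALAR": min,
}


def execute(name, first_sub_data, second_sub_data=None):
    """Select the module for the instruction and return the current result."""
    if name in UNARY_MODULES:
        return UNARY_MODULES[name](first_sub_data)
    module = BINARY_MODULES[name]
    return [module(a, b) for a, b in zip(first_sub_data, second_sub_data)]
```

For instance, `execute("MAX_VEC", [1, 5, 3], [4, 2, 3])` returns `[4, 5, 3]`, and `execute("MAX_SCALAR", [3, 9, 2])` returns `9`.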
In one embodiment, with continued reference to fig. 3-5, the second storage device 201 and the read/write circuit 203 may be packaged as a memory circuit 10. The arithmetic circuit 12 includes a main processing circuit 101 and at least one slave processing circuit 102, each of the at least one slave processing circuit 102 being connected to the main processing circuit 101. The main processing circuit 101 may be connected to one or more branch processing circuits 103, and each branch processing circuit 103 is connected to one or more slave processing circuits 102; the branch processing circuit 103 is configured to forward data or instructions between the main processing circuit 101 and the slave processing circuits 102. The main processing circuit 101 is configured to perform pre-processing on the source operands and to exchange data and operation instructions with the plurality of slave processing circuits 102; the plurality of slave processing circuits 102 are configured to perform intermediate operations in parallel according to the data and the operation instructions transmitted from the main processing circuit 101 to obtain a plurality of intermediate results, and to transmit the plurality of intermediate results to the main processing circuit 101; the main processing circuit 101 is configured to perform subsequent processing on the plurality of intermediate results to obtain the calculation result of the calculation instruction.
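The division of work between the main processing circuit and the slave processing circuits may be pictured with the following software analogy, in which worker threads stand in for the slave processing circuits and a maximum value is used as the example operation. The thread pool, the slicing scheme, and the function name `main_slave_max` are assumptions of the sketch and do not describe the hardware implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def main_slave_max(values, num_slaves=4):
    """Find the maximum of values using a main/slave split of the work."""
    if not values:
        raise ValueError("values must not be empty")
    # Pre-processing on the main circuit: one slice of the operand per slave.
    chunk = -(-len(values) // num_slaves)  # ceiling division
    slices = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Intermediate operations on the slaves, executed in parallel.
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        intermediate = list(pool.map(max, slices))
    # Subsequent processing on the main circuit: combine intermediate results.
    return max(intermediate)
```

With seven input values and four workers, the input is cut into four slices, each worker returns the maximum of its slice, and the final step reduces those intermediate results to one value.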
The main processing circuit 101 may include the aforementioned binary maximum value operation module 121, binary minimum value operation module 122, logical AND operation module 123, logical OR operation module 124, logical XOR operation module 125, exchange operation module 126, replacement operation module 127, unary maximum value operation module 128, unary minimum value operation module 129, addition operation module 130, accumulation operation module 131, subtraction operation module 132, and logical NOT operation module 133. The data selector 14 described above may be connected between the main processing circuit 101 and the read/write circuit 203.
In one embodiment, the processor may further include a controller circuit 11, the controller circuit 11 including: instruction cache circuit 110, instruction processing circuit 111, and store queue circuit 113.
The instruction cache circuit 110 is configured to store a calculation instruction associated with an artificial neural network operation.
The instruction processing circuit 111 is configured to analyze the calculation instruction to obtain a plurality of operation instructions.
The store queue circuit 113 is configured to store an instruction queue, the instruction queue including a plurality of operation instructions or calculation instructions to be executed in the order of the queue.
Further, the controller circuit 11 may include a split granularity circuit 114, a cycle number processing circuit 115, and a data read capacity calculation circuit 116.
The split granularity circuit 114 is connected to the cycle number processing circuit 115, the cycle number processing circuit 115 is connected to the instruction processing circuit 111 and the data reading capacity calculation circuit 116, the data reading capacity calculation circuit 116 is connected to the operation circuit 12, and the second storage device 201 can be connected to the first storage device 13 outside the processor through the read/write circuit 203.
The instruction processing circuit 111 is configured to obtain an operation instruction, parse the data size of the first source operand according to the operation instruction, and transmit the data size of the first source operand to the cycle number processing circuit 115.
The split granularity circuit 114 is used to store a preset split granularity. In this embodiment, the split-granularity circuit 114 may be a buffer or a segment of a storage space in the second storage device, for example, the split-granularity circuit 114 may be a storage space corresponding to a specified address range in the second storage device.
The cycle number processing circuit 115 is configured to obtain a target cycle number according to the data size of the first source operand and the preset splitting granularity, and transmit the target cycle number to the operation circuit 12. In this embodiment, the cycle number processing circuit 115 may be a counter.
The data reading capacity calculation circuit 116 is configured to obtain a data reading capacity according to the data size of the first source operand and the preset splitting granularity, and transmit the data reading capacity to the operation circuit 12. The operation circuit 12 is configured to send a read/write request to the first storage device 13 according to the operation instruction to read first sub-data from the first storage device 13, where the size of the first sub-data is equal to the data reading capacity. After that, the operation circuit 12 may perform the operation according to the read first sub-data and the second sub-data. After the current operation is completed, the cycle number processing circuit 115 increments the current cycle number once, and the read/write request is sent to the first storage device 13 again, until the current cycle number has been incremented from the initial value to the target cycle number. In this embodiment, the initial value may be 0; that is, when the current cycle number has been accumulated from 0 to the target cycle number, the operation corresponding to the operation instruction is completed. Alternatively, the cycle number processing circuit 115 decrements the target cycle number once, and the read/write request is sent to the first storage device 13 again, until the target cycle number has been decremented to 0; that is, when the cycle number has been decremented to 0, the operation corresponding to the operation instruction is completed.
In this embodiment, the split granularity circuit 114, the cycle number processing circuit 115 and the data reading capacity calculation circuit 116 are added to split the data, so that data whose size exceeds the memory access bandwidth available in a single clock cycle can be processed.
Referring to fig. 6 or fig. 7, after receiving the operation instruction, the processor may perform the following steps:
S100, obtaining an operation instruction.
The operation instruction is used for realizing operation among source operands, and the first source operand comprises at least one first subdata.
S200, reading the first sub-data from the first storage device according to the data reading capacity, the operation instruction and a preset data reading mode, and storing the currently read first sub-data into the second storage device.
The data reading capacity represents the amount of data read at a time, and can be obtained by calculation. The first storage device 13 is an off-chip storage device and the second storage device 201 is an on-chip storage device. Specifically, after the arithmetic circuit 12 obtains the operation instruction, it sends a read-write request to the first storage device 13 according to the operation instruction; the read-write circuit 203 then reads the first sub-data from the first storage device 13 according to the read-write request and the preset data reading mode, and stores the currently read first sub-data in the second storage device 201.
S300, executing operation according to the operation instruction, and storing the obtained current operation result into the second storage device and the first storage device.
Specifically, the arithmetic circuit 12 performs a corresponding arithmetic operation according to the obtained arithmetic instruction, so as to obtain a current arithmetic result, then stores the obtained current arithmetic result in the second storage device 201, and stores the current arithmetic result of the second storage device 201 in the first storage device 13 through the read/write circuit 203.
S400, returning to the step of reading the first sub-data from the first storage device according to the data reading capacity, the operation instruction and the preset data reading mode, until the operation corresponding to the operation instruction is completed.
Specifically, step S400 may include: controlling the counter to increment once or decrement once, and then returning to step S200 to read the first sub-data from the first storage device according to the operation instruction, the data reading capacity and the preset data reading mode, until the counter has been incremented from the initial value to the target cycle number or decremented from the target cycle number to the initial value. In the embodiment of the present application, the initial value of the counter may be 0.
Further, the target cycle number is calculated according to the data size of the first source operand. After the current operation result of the second storage device 201 is stored in the first storage device 13 through the read-write circuit 203, the counter is controlled to increment once, and the first sub-data continues to be read from the first storage device 13 according to the operation instruction and the data reading capacity until the counter has been incremented from 0 to the target cycle number. Alternatively, the counter is controlled to decrement once, and reading of the first sub-data from the first storage device 13 stops when the counter has been decremented from the target cycle number to 0.
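Steps S100 to S400 may be summarized, purely as a software sketch, by the loop below. A `bytearray` stands in for the off-chip first storage device, a local variable stands in for the on-chip second storage device, and the data reading capacity of each cycle is assumed to be supplied as a precomputed list. Writing each result back to the location from which the first sub-data was read, and all identifiers, are assumptions of the sketch.

```python
def run_operation(first_storage, base, read_sizes, operate):
    """Run one operation instruction as a series of read/compute/write cycles.

    first_storage: bytearray standing in for the off-chip first storage device.
    base: start offset of the first source operand inside first_storage.
    read_sizes: data reading capacity, in bytes, for each cycle.
    operate: maps the bytes of one first sub-data to result bytes of the
             same length (assumed; a stand-in for the operation circuit).
    """
    target_cycles = len(read_sizes)
    counter = 0  # initial value of the counter
    offset = base
    while counter < target_cycles:
        size = read_sizes[counter]
        # S200: read the first sub-data into the second (on-chip) storage.
        second_storage = bytes(first_storage[offset:offset + size])
        # S300: operate, keep the result on-chip, then write it back off-chip.
        second_storage = operate(second_storage)
        first_storage[offset:offset + size] = second_storage
        # S400: advance the counter and return to S200.
        counter += 1
        offset += size
    return counter
```

The loop ends once the counter has been incremented from 0 to the target cycle number, so bytes outside the operand are never touched.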
In another embodiment, referring to fig. 8, the operation method may further include the following steps:
S500, obtaining an operation instruction, and parsing the data size of the first source operand according to the operation instruction.
Specifically, the instruction processing circuit 111 obtains the operation instruction, parses the data size of the first source operand according to the operation instruction, and sends the data size of the first source operand to the cycle number processing circuit 115.
S600, according to the data size of the first source operand and a preset splitting granularity, obtaining the target cycle number and the data reading capacity.
Specifically, the preset split granularity is stored in the split granularity circuit 114, which may be a certain storage space in the static memory on the chip. The cycle number processing circuit 115 receives the data size of the first source operand, and calculates the target cycle number according to the data size of the first source operand and the preset splitting granularity. The data read capacity calculation circuit 116 calculates the data read capacity according to the cycle count sent by the cycle count processing circuit 115, the data size of the first source operand sent by the instruction processing circuit 111, and the preset split granularity, and sends the data read capacity and the target cycle count to the operation circuit 12.
Optionally, the cycle number processing circuit 115 may calculate the target cycle number according to the following formula:
Count = ⌈data size / split granularity⌉
where Count is the target cycle number, data size is the data size of the first source operand, and split granularity is the preset splitting granularity. In the embodiment of the application, the quotient obtained by dividing the data size by the preset splitting granularity is rounded up to obtain the target cycle number Count.
The data read capacity calculation circuit 116 may calculate the data reading capacity according to the following formula:
data read size = min { unprocessed data size, split granularity }
where data read size is the data reading capacity of the current cycle, split granularity is the preset splitting granularity, and unprocessed data size is the data size of the first source operand minus the amount of data already read in the previous cycles.
For example, the controller circuit 11 parses the operation instruction and determines that the data size of the first source operand is 1000 bytes. With a preset splitting granularity of 512 bytes, the target cycle number is 2, and the data reading capacities of the two cycles are 512 bytes and 488 bytes, respectively.
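The two formulas above can be illustrated with a minimal Python sketch. The function name `split_plan` and its parameter names are illustrative assumptions and do not appear in the specification; the sketch only reproduces the rounding-up and min rules described above.

```python
def split_plan(data_size, granularity):
    """Return (target cycle number, list of per-cycle read sizes in bytes).

    Hypothetical helper: models the cycle number processing circuit and the
    data read capacity calculation circuit in software.
    """
    if data_size < 0 or granularity <= 0:
        raise ValueError("data_size must be >= 0 and granularity must be > 0")
    # Count = ceil(data_size / granularity), using integer arithmetic only.
    count = (data_size + granularity - 1) // granularity
    read_sizes = []
    unprocessed = data_size
    for _ in range(count):
        # data read size = min(unprocessed data size, split granularity)
        size = min(unprocessed, granularity)
        read_sizes.append(size)
        unprocessed -= size
    return count, read_sizes


count, read_sizes = split_plan(1000, 512)
print(count, read_sizes)
```

With the values of the example (1000 bytes, granularity 512 bytes), the sketch yields two cycles with read sizes of 512 and 488 bytes.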
In one embodiment, the source operand A or the source operand B is used as the second source operand according to the instruction format of the operation instruction.
When it is determined from the operation instruction that three source operands participate in the operation, source operand A is used as the second source operand and source operand B is used as the third source operand, according to the instruction format of the operation instruction.
Specifically, referring to the format of Src Op in table 2, when Src Op in the received operation instruction is 010, indicating that source operand A is valid, source operand A is used as the second source operand; when Src Op in the received operation instruction is 011, indicating that source operand B is valid, source operand B is used as the second source operand; when Src Op in the received operation instruction is 100, indicating that both source operand A and source operand B are valid, source operand A is used as the second source operand and source operand B is used as the third source operand.
In this embodiment, source operand A or source operand B is selected as the second source operand according to the operation code Src Op in the instruction format.
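The operand selection described above can be sketched as a small decoder. The function name `select_operands` and the representation of Src Op as a three-character bit string are assumptions made for illustration; only the three encodings mentioned above (010, 011, 100) are handled, since the remaining rows of table 2 are not reproduced here.

```python
def select_operands(src_op, source_a, source_b):
    """Return (second source operand, third source operand) for a Src Op code.

    Hypothetical helper: src_op is a 3-character bit string such as "010".
    The third source operand is None when only two operands take part.
    """
    if src_op == "010":   # only source operand A is valid
        return source_a, None
    if src_op == "011":   # only source operand B is valid
        return source_b, None
    if src_op == "100":   # A and B are both valid: A is second, B is third
        return source_a, source_b
    raise ValueError(f"Src Op {src_op!r} is not covered by this sketch")


print(select_operands("100", "A", "B"))
```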
Specifically, when the above-mentioned operation instruction is Atomic CAS, the operation method shown in fig. 9 may include the following steps:
S702, an operation instruction is obtained.
The operation instruction is used for realizing replacement operation among a first source operand, a second source operand and a third source operand, wherein the first source operand comprises first subdata, the second source operand comprises second subdata and the third source operand comprises third subdata. Src0 may be used as a first source operand, source operand A (Src 1) as a second source operand, and source operand B (Src 2) as a third source operand, depending on the format of the operation instruction.
Specifically, the arithmetic circuit 12 obtains an arithmetic instruction for implementing a replacement operation among the first source operand, the second source operand, and the third source operand.
S704, reading first subdata from a first storage device in a preset data reading mode according to the operation instruction and the data reading capacity, and storing the currently read first subdata to a second storage device.
The first storage device 13 is an off-chip storage device, and the second storage device 201 is an on-chip storage device.
Specifically, after the arithmetic circuit 12 obtains the operation instruction, it sends a read/write request to the first storage device 13 according to the operation instruction and the data reading capacity. The read/write circuit 203 then reads the first sub-data from the first storage device 13 in the preset data reading manner according to the read/write request, and stores the currently read first sub-data in the second storage device 201.
S706, obtaining second subdata and third subdata according to the operation instruction, performing replacement operation on the currently read first subdata, second subdata and third subdata, and storing the obtained current operation result into a second storage device and a first storage device.
The number of the obtained second sub-data and the number of the obtained third sub-data are each equal to the number of the currently read first sub-data.
Specifically, the arithmetic circuit 12 obtains the second sub-data and the third sub-data required for the replacement operation according to the obtained operation instruction, performs the replacement operation on the currently read first sub-data, the second sub-data, and the third sub-data to obtain a current operation result, stores the current operation result in the second storage device 201, and stores the current operation result in the first storage device 13 through the read-write circuit 203.
S708, returning to the step of reading the first subdata from the first storage device in the preset data reading mode according to the data reading capacity and the operation instruction, until the operation corresponding to the operation instruction is completed.
Specifically, step S708 may include: controlling the counter to increment once or decrement once, and then returning to step S704 to read the first subdata from the first storage device in the preset data reading mode according to the operation instruction and the data reading capacity, until the counter has counted up from the initial value to the target cycle number, or has counted down from the target cycle number to the initial value. In the embodiment of the present application, the initial value of the counter may be 0.
Further, the target cycle number is calculated from the data size of the first source operand. After the current operation result in the second storage device 201 is stored in the first storage device 13 through the read-write circuit 203, the counter is incremented once, and the first subdata continues to be read from the first storage device 13 according to the operation instruction and the data reading capacity until the counter has counted up from 0 to the target cycle number. Alternatively, the counter is decremented once per cycle until it has counted down from the target cycle number to 0, at which point reading of the first subdata from the first storage device 13 stops.
In this method, the data is read and operated on cyclically, and the operation result of each cycle is stored back to the first storage device without interruption. The first storage device is thereby accessed exclusively, other processor cores are prevented from accessing it during the operation, and the atomicity of the atomic operation is guaranteed.
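The loop formed by steps S704 to S708 can be sketched in software as follows. This is only a functional model under stated assumptions: the first storage device is modelled as a Python `bytearray`, the second and third source operands are assumed to be immediates (single byte values), and the names `atomic_cas`, `expected`, and `desired` are illustrative. The sketch does not model the exclusive access to the first storage device, which in the described processor results from the uninterrupted read and write-back sequence.

```python
def atomic_cas(off_chip, base, data_size, granularity, expected, desired):
    """Segmented compare-and-swap over off_chip[base : base + data_size].

    off_chip stands in for the first (off-chip) storage device.
    Returns the number of cycles executed.
    """
    if granularity <= 0:
        raise ValueError("granularity must be > 0")
    target_count = (data_size + granularity - 1) // granularity
    counter = 0                 # initial value 0, counts up to target_count
    start = base
    unprocessed = data_size
    while counter < target_count:
        read_size = min(unprocessed, granularity)
        # S704: read the current segment into the on-chip buffer.
        on_chip = bytearray(off_chip[start:start + read_size])
        # S706: element-wise replacement; the result is kept on chip ...
        for i in range(read_size):
            if on_chip[i] == expected:
                on_chip[i] = desired
        # ... and written back to the same off-chip addresses.
        off_chip[start:start + read_size] = on_chip
        # S708: move to the next segment and count the cycle.
        start += read_size
        unprocessed -= read_size
        counter += 1
    return counter


memory = bytearray([1, 2, 1, 3, 1])
cycles = atomic_cas(memory, 0, 5, 2, expected=1, desired=9)
print(cycles, list(memory))
```

A down-counting variant would initialise the counter to the target cycle number and stop when it reaches 0; the segment handling is unchanged.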
In one embodiment, referring to fig. 10, the step S706 includes:
S7062, comparing the currently read first sub-data with the second sub-data, and determining whether the currently read first sub-data is equal to the second sub-data.
Specifically, after the arithmetic circuit 12 obtains the first sub-data and the second sub-data, it compares them element by element: each currently read first sub-data is compared with the second sub-data at the same position to determine whether the two are equal.
When the first sub-data is equal to the second sub-data, S7064 is executed, taking the third sub-data as the current operation result; when the first sub-data is not equal to the second sub-data, S7066 is executed, taking the first sub-data as the current operation result.
For example, the currently read first subdata, the second subdata, and the third subdata each contain multiple elements, where the first subdata is a = {1,2,3,4,5,6,7,8}, the second subdata is b = {0,2,3,5,5,4,7,8}, and the third subdata is c = {0,1,7,8,9,4,5,9}. Each element of a is compared with the element of b at the same position. The first element of a, 1, is compared with the first element of b, 0; since 1 is not equal to 0, the first subdata 1 is taken as the operation result for that position. The second element of a, 2, is compared with the second element of b, 2; since the two are equal, the second element of the third subdata, 1, is taken as the operation result for that position. Proceeding in the same way for the remaining positions, the current operation result is {1,1,7,4,9,6,5,9}.
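The element-by-element replacement of steps S7062 to S7066 can be checked with a short Python sketch. The function name `replace_elementwise` is an illustrative assumption. The second subdata is assumed here to be the eight-element list {0,2,3,5,5,4,7,8}, which is the value consistent with an eight-element result.

```python
def replace_elementwise(first, second, third):
    """Per position: take third[i] if first[i] == second[i], else first[i]."""
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three operands must have the same number of elements")
    return [c if a == b else a for a, b, c in zip(first, second, third)]


a = [1, 2, 3, 4, 5, 6, 7, 8]   # first subdata (read from the first storage device)
b = [0, 2, 3, 5, 5, 4, 7, 8]   # second subdata (comparison values, assumed length 8)
c = [0, 1, 7, 8, 9, 4, 5, 9]   # third subdata (replacement values)
print(replace_elementwise(a, b, c))
```

Running the sketch prints `[1, 1, 7, 4, 9, 6, 5, 9]`.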
In one embodiment, the preset data reading mode may include: after the first sub-data is read from the first storage device 13 for the first time, the current operation result is stored in the first storage device 13; thereafter, each time the current operation result has been stored in the first storage device 13, the address following the end address of the previously read first sub-data is used as the start address of the first sub-data to be read currently. In this embodiment of the application, each time the current operation result has been stored in the first storage device 13, the storage address corresponding to the first subdata is offset, and the offset is equal to the number of storage addresses occupied by the previously read first subdata. After the first sub-data is read from the first storage device 13 according to the operation instruction in this way, the currently read first sub-data is stored in the second storage device 201.
For example, the data size of the first source operand is 1000 bytes, and the storage addresses occupied by the 1000 bytes are 0 to 256. The first sub-data read for the first time is 512 bytes, and its storage addresses may be 0 to 128, with start address 0 and end address 128. For the second read, the address following the end address of the previously read first sub-data is used as the start address of the currently read first sub-data, that is, the storage addresses corresponding to the first sub-data read for the second time may be 129 to 256.
In this embodiment, the first source operand is read cyclically in segments, and the address following the end address of the previously read first sub-data is used as the start address of the currently read first sub-data, so that the first storage device 13 is read continuously and exclusive access is achieved.
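The address progression of the preset data reading mode can be sketched as follows. The function name `segment_addresses` is an illustrative assumption, and the sketch assumes one storage address per byte, which differs from the address unit used in the example above (where 1000 bytes occupy addresses 0 to 256); only the rule "next start address = previous end address + 1" is taken from the description.

```python
def segment_addresses(base, data_size, granularity):
    """Return a list of (start address, end address) pairs, one per read.

    Assumes byte addressing: a segment of n bytes occupies n addresses.
    """
    if granularity <= 0:
        raise ValueError("granularity must be > 0")
    segments = []
    start = base
    remaining = data_size
    while remaining > 0:
        size = min(remaining, granularity)
        end = start + size - 1
        segments.append((start, end))
        # The next read starts at the address following the previous end address.
        start = end + 1
        remaining -= size
    return segments


print(segment_addresses(0, 1000, 512))
```

Under the byte-addressing assumption, a 1000-byte operand at base address 0 is read as the two segments (0, 511) and (512, 999).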
In one embodiment, when the current operation result is stored in the first storage device 13, the storage address of the current operation result is consistent with the storage address of the currently read first sub-data. In this embodiment, the current operation result is stored in the position of the read first subdata, so that the accuracy of the data read in a segmented manner is ensured.
In one embodiment, after receiving the operation instruction, the operation circuit 12 can determine from the operation instruction whether the second source operand is an immediate or data stored in the second storage device 201.
Specifically, when it is determined from the operation instruction that the second source operand is an immediate, the operation circuit 12 sends a read-write request to the second storage device 201, the read-write circuit 203 sends the immediate to the operation circuit 12, and the operation circuit 12 copies the immediate. The multiple copies of the immediate serve as the second sub-data, and the number of the second sub-data is equal to the number of the currently read first sub-data.
For example, the currently read first sub-data amounts to 3 bytes. When the second source operand is determined to be an immediate, it may be a scalar, for instance 2. The immediate 2 is then copied to fill 3 bytes, so that the second sub-data is {2, 2, 2}.
When it is determined that the second source operand is data stored in the second storage device 201, the operation circuit 12 sends a read-write request to the second storage device 201, and the read-write circuit 203 reads second sub-data from a preset storage address of the second storage device 201, where the number of the currently read second sub-data is equal to the number of the first sub-data. See in particular the description above.
In one embodiment, after receiving the operation instruction, the operation circuit 12 can determine from the operation instruction whether the second source operand is an immediate or data stored in the second storage device 201, and whether the third source operand is an immediate or data stored in the second storage device 201.
In particular, the manner of reading the second source operand may refer to the above description.
When the third source operand is determined from the operation instruction to be an immediate, the operation circuit 12 sends a read-write request to the second storage device 201, the read-write circuit 203 sends the immediate to the operation circuit 12, and the operation circuit 12 copies the immediate. The multiple copies of the immediate serve as the third sub-data, and the number of the third sub-data is equal to the number of the currently read first sub-data.
For example, the currently read first sub-data amounts to 3 bytes. When the third source operand is determined to be an immediate, it may be a scalar, for instance 2. The immediate 2 is then copied to fill 3 bytes, so that the third sub-data is {2, 2, 2}.
When it is determined that the third source operand is data stored in the second storage device 201, the arithmetic circuit 12 sends a read-write request to the second storage device 201, and the read-write circuit 203 reads third sub-data from a preset storage address of the second storage device 201, where the number of the currently read third sub-data is equal to the number of the first sub-data.
In this embodiment, the second source operand and the third source operand are either copied or read as second sub-data and third sub-data, so that the number of the second sub-data and the number of the third sub-data are each equal to the number of the first sub-data. This allows the first, second, and third sub-data to be operated on position by position.
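The two ways of obtaining the second or third sub-data can be sketched in Python. The function name `sub_data_from_operand`, the `is_immediate` flag, and the modelling of the second storage device as a Python list are assumptions made for illustration.

```python
def sub_data_from_operand(is_immediate, value, count, on_chip=None):
    """Return `count` sub-data elements for a second or third source operand.

    If is_immediate is True, `value` is the immediate and is copied `count` times.
    Otherwise `value` is the preset start address in `on_chip`, a list standing
    in for the second (on-chip) storage device.
    """
    if is_immediate:
        return [value] * count
    if on_chip is None:
        raise ValueError("on_chip storage is required for a non-immediate operand")
    chunk = list(on_chip[value:value + count])
    if len(chunk) != count:
        raise ValueError("on_chip storage does not hold enough elements")
    return chunk


print(sub_data_from_operand(True, 2, 3))
print(sub_data_from_operand(False, 1, 3, on_chip=[7, 8, 9, 10]))
```

In both cases the returned list has as many elements as the currently read first sub-data, which is what the position-by-position operation requires.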
It should be understood that although the steps in the flow diagrams of fig. 6-10 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and described, and may be performed in other orders. In addition, at least some of the steps in fig. 6-10 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times. The sub-steps or stages are not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided a computing device including: an obtaining module 100, a reading module 200, an operation module 300 and a counting module 400, wherein:
The obtaining module 100 is configured to obtain an operation instruction.
The reading module 200 is configured to read first subdata from a first storage device according to a data reading capacity and an operation instruction in a preset data reading manner, and store the currently read first subdata in a second storage device.
The operation module 300 is configured to execute an operation according to the operation instruction, obtain a current operation result, store the current operation result in the second storage device and the first storage device, and then circularly call the reading module 200 and the operation module 300 until the operation corresponding to the operation instruction is completed.
Further, the operation device may include a counting module 400 for controlling the counter to increment or decrement once after the current operation result in the second storage device is stored in the first storage device, and then, the reading module 200, the operation module 300, and the counting module 400 are called in a loop until the counter is incremented from the initial value to the target number of loops or the counter is decremented from the target number of loops to the initial value. In the embodiment of the present application, the initial value may be 0.
For specific limitations of the operation device, reference may be made to the limitations of the operation method above, which are not repeated here. Each module in the operation device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in or independent of a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
When the operation instruction is an Atomic CAS, the operation module 300 may include a replacement operation module, where the obtaining module 100 is configured to obtain the operation instruction, where the operation instruction is configured to implement a replacement operation among a first source operand, a second source operand, and a third source operand, the first source operand includes first sub-data, the second source operand includes second sub-data, and the third source operand includes third sub-data. The reading module 200 is configured to read first subdata from a first storage device according to a data reading capacity and an operation instruction in a preset data reading manner, and store the currently read first subdata in a second storage device, where the first storage device is an off-chip storage device and the second storage device is an on-chip storage device. The replacement operation module 307 acquires second sub data and third sub data according to the operation instruction, performs replacement operation on the currently read first sub data, second sub data and third sub data, stores the acquired current operation result in the second storage device and the first storage device, and then circularly calls the reading module 200 and the replacement operation module 307 until the operation corresponding to the operation instruction is completed. Further, the operation device may include a counting module 400 for controlling the counter to increment or decrement once after the current operation result in the second storage device is stored in the first storage device, and then, the reading module 200, the replacement operation module 307, and the counting module 400 are called in a loop until the counter is incremented from the initial value to the target number of loops or the counter is decremented from the target number of loops to the initial value. In the embodiment of the present application, the initial value may be 0.
In the embodiment of the present application, the specific structure of the operation module is similar to that of the operation circuit in the above embodiments; refer to fig. 2 and the description above.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
Obtaining an operation instruction, wherein the operation instruction is used for realizing an operation among source operands, and the first source operand comprises more than one first subdata.
Reading first subdata from a first storage device according to the data reading capacity and the operation instruction and a preset data reading mode, and storing the currently read first subdata into a second storage device; the first storage device 13 is an off-chip storage device, and the second storage device 201 is an on-chip storage device.
And executing the operation according to the operation instruction, and storing the obtained current operation result into the second storage device and the first storage device.
And then, returning to the step of reading the first subdata from the first storage device according to the data reading capacity and the operation instruction and a preset data reading mode until the operation corresponding to the operation instruction is completed.
It should be clear that, the steps implemented when the computer program in the embodiment of the present application is executed by the processor are consistent with the execution process of each step of the method in the above embodiments, and specific reference may be made to the above description, and no further description is given here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent application shall be subject to the appended claims.

Claims (16)

1. A method of data processing, the method comprising:
obtaining an operation instruction, wherein the operation instruction is used for realizing replacement operation among a first source operand, a second source operand and a third source operand, the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
reading the first subdata from a first storage device according to data reading capacity and the operation instruction and a preset data reading mode, and storing the currently read first subdata into a second storage device, wherein the first storage device is an off-chip storage device, and the second storage device is an on-chip storage device;
acquiring the second subdata and the third subdata according to the operation instruction, performing replacement operation according to the currently read first subdata, second subdata and third subdata, and storing an obtained current operation result into the second storage device and the first storage device;
and controlling the counter to accumulate once or decrement once, and then returning to the step of reading the first subdata from the first storage device according to the operation instruction and the data reading capacity and a preset data reading mode until the counter accumulates to a target cycle number from an initial value or the counter decrements to the initial value from the target cycle number, so as to finish the operation corresponding to the operation instruction.
2. The data processing method according to claim 1, wherein the step of obtaining the second sub-data and the third sub-data according to the operation instruction, and performing a replacement operation according to the currently read first sub-data, second sub-data, and third sub-data to obtain a current operation result includes:
comparing the currently read first subdata with the second subdata position by position, and judging whether the currently read first subdata is equal to the second subdata or not;
when the first subdata is equal to the second subdata, taking the third subdata as the current operation result;
and when the first subdata is not equal to the second subdata, taking the first subdata as the current operation result.
3. The data processing method of claim 1 or 2, wherein the second source operand is an immediate or data stored in the second storage device, the method further comprising:
if the second source operand is an immediate, copying the immediate and taking the plurality of immediates obtained by copying as the second subdata, wherein the number of the second subdata is equal to that of the currently read first subdata;
and if the second source operand is data stored in the second storage device, reading the second subdata from a preset storage address of the second storage device, wherein the number of the currently read second subdata is equal to that of the first subdata.
4. A data processing method according to claim 1 or 2, wherein the third source operand is an immediate or data stored in the second storage means, the method further comprising:
if the third source operand is an immediate, copying the immediate and taking the plurality of immediates obtained by copying as the third subdata, wherein the number of the third subdata is equal to that of the currently read first subdata;
and if the third source operand is data stored in the second storage device, reading the third subdata from a preset storage address of the second storage device, wherein the number of the currently read third subdata is equal to that of the first subdata.
5. The data processing method according to claim 1, wherein the step of reading the first sub-data from the first storage device according to a preset data reading manner comprises:
and after the current operation result is stored in the first storage device, taking the address next to the end address of the first sub-data read last time as the starting address of the first sub-data read currently.
6. The data processing method of claim 1, wherein the method further comprises:
and when the current operation result is stored in the first storage device, the storage address of the current operation result is consistent with the storage address of the currently read first subdata.
7. The data processing method of claim 1, wherein the method further comprises:
obtaining the data size of the first source operand according to the operation instruction;
and obtaining the target cycle number according to the data size of the first source operand and a preset splitting granularity.
8. The data processing method of claim 1,
the instruction format of the operation instruction comprises an instruction type, a first source operand, a second source operand, a third source operand, a target operand and an operation code;
the instruction type is used for determining whether the operation instruction is an atomic operation instruction;
the instruction type is used for determining the operation type of the operation instruction;
the operation code is used for configuring the number of source operands;
the first source operand and the second source operand are respectively used for representing data participating in operation;
the target operand is used for representing the current operation result.
9. A processor comprising an arithmetic circuitry, a read-write circuitry, and a second storage device disposed adjacent to the arithmetic circuitry, the second storage device being connectable to a first storage device external to the processor via the read-write circuitry;
the arithmetic circuit is used for acquiring an arithmetic instruction and sending a read-write request to the first storage device according to the arithmetic instruction;
the operation instruction is used for realizing replacement operation among a first source operand, a second source operand and a third source operand, wherein the first source operand comprises first sub-data, the second source operand comprises second sub-data, and the third source operand comprises third sub-data;
the read-write circuit is used for reading first subdata from the first storage device according to the read-write request and a preset data reading mode, and storing the first subdata to the second storage device;
the arithmetic circuit is configured to obtain the second sub-data and the third sub-data, perform replacement operation on the currently read first sub-data, second sub-data and third sub-data according to the arithmetic instruction, and store an obtained current operation result in the second storage device and the first storage device; and controlling the counter to accumulate once or decrement once, then sending a read-write request to the first storage device again, reading the first subdata from the first storage device according to the data reading capacity and the operation instruction and a preset data reading mode until the counter accumulates from an initial value to a target cycle number or the counter decrements from the target cycle number to the initial value, and finishing the operation corresponding to the operation instruction.
10. The processor of claim 9, further comprising a data selector, wherein the arithmetic circuit comprises a replacement arithmetic module, wherein the data selector is connected between the arithmetic circuit and the read/write circuit, and wherein the data selector is configured to gate a connection path between the replacement arithmetic module and the read/write circuit;
the replacement operation module is configured to obtain the second sub-data, perform replacement operation on the currently read first sub-data, second sub-data, and third sub-data according to the operation instruction, and store an obtained current operation result in the second storage device and the first storage device.
11. The processor according to claim 10, wherein the replacement operation module comprises an operation unit and a result output unit connected to the operation unit;
the arithmetic unit is used for performing alignment comparison on the currently read first subdata and the currently read second subdata and judging whether the currently read first subdata is equal to the second subdata or not;
the result output unit is used for taking the third subdata as the current operation result when the first subdata is equal to the second subdata;
the result output unit is further configured to take the first sub-data as the current operation result when the first sub-data is not equal to the second sub-data.
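The compare-and-select behaviour of claim 11 can be illustrated with a minimal software sketch. The function name `replace_equal` and the use of Python lists to stand for the subdata are assumptions made for illustration only; the claim describes hardware units, not this code.

```python
from typing import List, Sequence


def replace_equal(first: Sequence[int],
                  second: Sequence[int],
                  third: Sequence[int]) -> List[int]:
    """Position-by-position replacement (hypothetical helper).

    For each position, the third subdata is taken as the result when the
    first subdata equals the second subdata; otherwise the first subdata
    is kept unchanged.
    """
    if not (len(first) == len(second) == len(third)):
        raise ValueError("all three operand chunks must have the same length")
    return [t if f == s else f for f, s, t in zip(first, second, third)]


result = replace_equal([1, 2, 3, 2], [2, 2, 2, 2], [9, 9, 9, 9])
```

Here every element of the first operand that equals 2 is replaced by 9, and all other elements pass through untouched.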
12. The processor of claim 10,
the replacement operation module is further configured to determine, according to the operation instruction, whether the second source operand is an immediate or data stored in the second storage device;
if the second source operand is determined to be an immediate, the replacement operation module replicates the immediate and uses the resulting plurality of immediates as the second subdata, wherein the number of the second subdata is equal to the number of the currently read first subdata;
if the second source operand is determined to be the data stored in the second storage device, the read-write circuit reads the second subdata from a preset storage address of the second storage device, and the number of the currently read second subdata is equal to the number of the first subdata.
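The two ways of obtaining the second subdata in claim 12 can be sketched as follows. This is only an illustration under stated assumptions: the on-chip storage is modelled as a Python list, the preset storage address as an index into it, and the name `fetch_sub_data` and its parameters are hypothetical.

```python
from typing import List, Sequence


def fetch_sub_data(is_immediate: bool,
                   immediate: int,
                   on_chip: Sequence[int],
                   address: int,
                   count: int) -> List[int]:
    """Return `count` operand elements (hypothetical helper).

    If the operand is an immediate, it is replicated `count` times.
    Otherwise `count` elements are read from the modelled on-chip
    storage, starting at the preset address.
    """
    if is_immediate:
        return [immediate] * count
    chunk = list(on_chip[address:address + count])
    if len(chunk) != count:
        raise ValueError("on-chip storage holds fewer elements than requested")
    return chunk


broadcast = fetch_sub_data(True, 7, [], 0, 3)
loaded = fetch_sub_data(False, 0, [4, 5, 6, 7], 1, 2)
```

In both cases the number of elements returned matches the number of first subdata currently read, which is what allows the position-by-position comparison to proceed.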
13. The processor of claim 10,
the replacement operation module is further configured to determine, according to the operation instruction, whether the third source operand is an immediate or data stored in the second storage device;
if the third source operand is determined to be an immediate, the replacement operation module replicates the immediate and uses the resulting plurality of immediates as the third subdata, wherein the number of the third subdata is equal to the number of the currently read first subdata;
if the third source operand is determined to be the data stored in the second storage device, the read-write circuit reads the third subdata from a preset storage address of the second storage device, and the number of the currently read third subdata is equal to the number of the first subdata.
14. The processor according to any one of claims 10 to 13, wherein the arithmetic circuitry comprises a master processing circuit and more than one slave processing circuits, each of the more than one slave processing circuits being connected to the master processing circuit;
the replacement operation module is arranged in the master processing circuit.
15. A data processing apparatus, characterized in that the apparatus comprises:
an obtaining module, configured to obtain an operation instruction, wherein the operation instruction is used to implement a replacement operation among a first source operand, a second source operand and a third source operand, the first source operand comprises first subdata, the second source operand comprises second subdata, and the third source operand comprises third subdata;
the reading module is used for reading the first subdata from a first storage device according to data reading capacity and the operation instruction and a preset data reading mode, and storing the currently read first subdata into a second storage device, wherein the first storage device is an off-chip storage device, and the second storage device is an on-chip storage device;
the operation module is configured to obtain the second subdata and the third subdata according to the operation instruction, perform a replacement operation on the currently read first subdata, the second subdata and the third subdata, store the obtained current operation result in the second storage device and the first storage device, and control a counter to increment once or decrement once; the reading module and the operation module are then called cyclically, the first subdata being read from the first storage device according to the data reading capacity, the operation instruction and the preset data reading mode, until the counter has incremented from an initial value to a target cycle count or decremented from the target cycle count to the initial value, at which point the operation corresponding to the operation instruction is complete.
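The counter-driven loop of claim 15 can be sketched in software as below. Several points are assumptions rather than statements of the claim: the off-chip storage is modelled as a Python list, the target cycle count is taken to be the data length divided by the reading capacity and rounded up, both the second and third operands are treated as immediates, each result chunk is written back to the position it was read from, and only the up-counting variant is shown. The name `run_replacement` is hypothetical.

```python
import math
from typing import List


def run_replacement(off_chip: List[int],
                    second: int,
                    third: int,
                    capacity: int) -> int:
    """Chunked replacement over modelled off-chip data (hypothetical).

    Each cycle reads up to `capacity` elements into an on-chip buffer,
    replaces elements equal to `second` with `third`, writes the result
    back, and increments the counter. Returns the number of cycles run.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Assumption: the target cycle count is derived from the data size.
    target_cycles = math.ceil(len(off_chip) / capacity)
    counter = 0  # initial value
    while counter < target_cycles:
        start = counter * capacity
        # Read one chunk from off-chip storage into the on-chip buffer.
        on_chip = off_chip[start:start + capacity]
        # Replacement operation; the result is held in the on-chip buffer.
        on_chip = [third if x == second else x for x in on_chip]
        # Store the current result back to off-chip storage as well.
        off_chip[start:start + len(on_chip)] = on_chip
        counter += 1  # accumulate once per cycle
    return counter


data = [1, 2, 3, 2, 5]
cycles = run_replacement(data, second=2, third=9, capacity=2)
```

With five elements and a reading capacity of two, the loop runs three cycles, the last of which handles a partial chunk. A down-counting variant would start the counter at the target cycle count and stop at the initial value.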
16. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN201811454947.4A 2018-11-30 2018-11-30 Data processing method, processor, data processing device and storage medium Active CN111258639B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811454947.4A CN111258639B (en) 2018-11-30 2018-11-30 Data processing method, processor, data processing device and storage medium
PCT/CN2019/121064 WO2020108496A1 (en) 2018-11-30 2019-11-26 Method and device for processing data in atomic operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811454947.4A CN111258639B (en) 2018-11-30 2018-11-30 Data processing method, processor, data processing device and storage medium

Publications (2)

Publication Number Publication Date
CN111258639A CN111258639A (en) 2020-06-09
CN111258639B CN111258639B (en) 2022-10-04

Family

ID=70946504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811454947.4A Active CN111258639B (en) 2018-11-30 2018-11-30 Data processing method, processor, data processing device and storage medium

Country Status (1)

Country Link
CN (1) CN111258639B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441616A (en) * 2008-11-24 2009-05-27 中国人民解放军信息工程大学 Rapid data exchange structure based on register document and management method thereof
CN102298515A (en) * 2010-06-22 2011-12-28 国际商业机器公司 Method and system for performing an operation on two operands and subsequently storing an original value of operand
CN104699629A (en) * 2015-03-16 2015-06-10 清华大学 Sharing on-chip cache dividing device
CN104794100A (en) * 2015-05-06 2015-07-22 西安电子科技大学 Heterogeneous multi-core processing system based on on-chip network
CN105389277A (en) * 2015-10-29 2016-03-09 中国人民解放军国防科学技术大学 Scientific computation-oriented high performance DMA (Direct Memory Access) part in GPDSP (General-Purpose Digital Signal Processor)
CN107957976A (en) * 2017-12-15 2018-04-24 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108197705A (en) * 2017-12-29 2018-06-22 国民技术股份有限公司 Convolutional neural networks hardware accelerator and convolutional calculation method and storage medium
CN108701027A (en) * 2016-04-02 2018-10-23 英特尔公司 Processor, method, system and instruction for the broader data atom of data width than primary support to be stored to memory

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8996845B2 (en) * 2009-12-22 2015-03-31 Intel Corporation Vector compare-and-exchange operation
US20140181474A1 (en) * 2012-12-26 2014-06-26 Telefonaktiebolaget L M Ericsson (Publ) Atomic write and read microprocessor instructions
US10678545B2 (en) * 2016-07-07 2020-06-09 Texas Instruments Incorporated Data processing apparatus having streaming engine with read and read/advance operand coding

Also Published As

Publication number Publication date
CN111258639A (en) 2020-06-09

Similar Documents

Publication Publication Date Title
US12072824B2 (en) Multicore bus architecture with non-blocking high performance transaction credit system
BR102020019657A2 (en) apparatus, methods and systems for instructions of a matrix operations accelerator
CN105393240A (en) Method and apparatus for asynchronous processor with auxiliary asynchronous vector processor
US5577256A (en) Data driven type information processor including a combined program memory and memory for queuing operand data
CN113254073A (en) Data processing method and device
CN111258635B (en) Data processing method, processor, data processing device and storage medium
CN111258646B (en) Instruction disassembly method, processor, instruction disassembly device and storage medium
CN109923520B (en) Computer system and memory access technique
CN111258644B (en) Data processing method, processor, data processing device and storage medium
CN111258950B (en) Atomic access and storage method, storage medium, computer equipment, device and system
CN111258639B (en) Data processing method, processor, data processing device and storage medium
CN111258636B (en) Data processing method, processor, data processing device and storage medium
CN111258638B (en) Data processing method, processor, data processing device and storage medium
CN111258647B (en) Data processing method, processor, data processing device and storage medium
CN111258652B (en) Data processing method, processor, data processing device and storage medium
CN111258640B (en) Data processing method, processor, data processing device and storage medium
CN111258637B (en) Data processing method, processor, data processing device and storage medium
CN111258642B (en) Data processing method, processor, data processing device and storage medium
CN116775544B (en) Coprocessor and computer equipment
CN111258645B (en) Data processing method, processor, data processing device and storage medium
CN111258643B (en) Data processing method, processor, data processing device and storage medium
CN111258770B (en) Data processing method, processor, data processing device and storage medium
CN114443143A (en) Instruction processing method, instruction processing device, chip, electronic device and storage medium
JP2005339540A (en) Controller with decoding means
JP7503198B2 (en) Hardware Autoloader

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant