CN111782267B - Data processing method and device and related product - Google Patents
- Publication number: CN111782267B (application CN201910272660.8A)
- Authority: CN (China)
- Prior art keywords: descriptor, processing instruction, data, instruction, identifier
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Executing Machine-Instructions (AREA)
Abstract
The present disclosure relates to a data processing method and apparatus, and related products. The products include a control module comprising an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is used to store computation instructions associated with artificial neural network operations; the instruction processing unit is used to parse a computation instruction into a plurality of operation instructions; and the storage queue unit is used to store an instruction queue that holds the operation instructions or computation instructions to be executed in the order of the queue. In this way, the operational efficiency of the related products when running a neural network model can be improved.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and a related product.
Background
With the continuous development of artificial intelligence technology, the amount and dimensionality of the data to be processed keep increasing. In the related art, a processor typically obtains the parameters of instructions to determine data addresses and then determines the dependency relationships between instructions from those addresses. Because the data addresses of the operands must be computed before the dependencies can be found, this reduces the processing efficiency of the processor.
Disclosure of Invention
In view of this, the present disclosure provides a data processing method and apparatus, and a related product.
According to an aspect of the present disclosure, there is provided a data processing method, including: when an operand of a decoded first processing instruction includes the identifier of a descriptor, determining, according to the identifier of the descriptor, whether the first processing instruction is executable, where the descriptor is used to indicate the shape of a tensor; and, when the first processing instruction is executable, executing the data processing corresponding to the first processing instruction according to the identifier of the descriptor.
According to another aspect of the present disclosure, there is provided a data processing apparatus, including: a judging module, configured to determine, when an operand of a decoded first processing instruction includes the identifier of a descriptor, whether the first processing instruction is executable according to the identifier of the descriptor, where the descriptor is used to indicate the shape of a tensor; and an execution module, configured to execute the data processing corresponding to the first processing instruction according to the identifier of the descriptor when the first processing instruction is executable.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising a data processing apparatus as described above.
According to another aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip as described above.
According to another aspect of the present disclosure, a board card is provided, including: a storage device, an interface device, a control device, and the artificial intelligence chip as described above, where the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is configured to store data; the interface device is configured to implement data transmission between the artificial intelligence chip and an external device; and the control device is configured to monitor the state of the artificial intelligence chip.
According to the data processing method of the embodiments of the present disclosure, when the operand of a decoded processing instruction includes the identifier of a descriptor, whether the instruction is executable can be determined from that identifier, and, when it is executable, the data processing corresponding to the instruction is executed according to the identifier of the descriptor. This reduces the complexity of the processor's executability check and improves the processing efficiency of the processor.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of a data storage space of a data processing method according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Fig. 4 shows a block diagram of a board according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein specifically to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the subject matter of the present disclosure.
Fig. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the data processing method includes:
in step S11, when the operand of the decoded first processing instruction includes an identifier of a descriptor, determining whether the first processing instruction is executable according to the identifier of the descriptor, where the descriptor is used to indicate the shape of the tensor;
in step S12, when the first processing instruction is executable, the data processing corresponding to the first processing instruction is executed according to the identifier of the descriptor.
In one possible implementation, the data processing method is applicable to a processor. The processor may be a general-purpose processor (e.g., a central processing unit (CPU) or a graphics processing unit (GPU)) or a special-purpose processor (e.g., an artificial intelligence processor, a scientific computing processor, or a digital signal processor). The present disclosure does not limit the type of the processor.
In one possible implementation, the decoded first processing instruction includes an opcode indicating a type of processing corresponding to the first processing instruction and one or more operands. The first processing instruction may include a data access instruction, an operation instruction, a descriptor management instruction, a synchronous communication instruction, and the like. The present disclosure is not limited to a particular type of first processing instruction.
In one possible implementation, the data to be processed may include N-dimensional tensor data (N is an integer greater than or equal to zero, e.g., N = 1, 2, or 3). A tensor can take multiple forms of data composition and different dimensions: a scalar can be regarded as a 0-dimensional tensor, a vector as a 1-dimensional tensor, and a matrix as a tensor of 2 or more dimensions. The shape of a tensor includes information such as its number of dimensions and the size of each dimension. For example, for the tensor

  [[1, 2, 3, 4],
   [5, 6, 7, 8]]

the shape can be described by a descriptor as (2, 4): two parameters represent a two-dimensional tensor whose first dimension (rows) has size 2 and whose second dimension (columns) has size 4. It should be noted that the present application does not limit the manner in which a descriptor indicates the tensor shape. When tensor data is stored in a memory, its shape cannot be determined from the data address (or storage region) alone, and related information such as the relationship among multiple pieces of tensor data cannot be determined either, resulting in low efficiency when the processor accesses the tensor data.
In one possible implementation, the descriptor may be used to indicate the shape of the tensor data for the N dimensions, N being an integer greater than or equal to zero. The value of N may be determined according to the dimension (order) of the tensor data, or may be set according to the usage requirement of the tensor data. For example, when the value of N is 3, the tensor data is three-dimensional data, and the descriptor can be used to indicate the shape (e.g., offset, size, etc.) of the tensor data in three dimensional directions. It should be understood that the value of N can be set by those skilled in the art according to actual needs, and the disclosure does not limit this.
In one possible implementation, the descriptor may include an identifier and content, etc., and the identifier of the descriptor may be used to distinguish the descriptor, such as a number; the content of the descriptor may include at least one shape parameter (e.g., a size in each dimension direction of the tensor, etc.) representing the shape of the tensor data, and may further include at least one address parameter (e.g., a reference address of the data reference point) representing the address of the tensor data. The present disclosure does not limit the specific parameters included in the content of the descriptor.
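As a rough illustration of this structure, a descriptor can be modeled as an identifier plus content holding shape and address parameters. This is only a sketch; the field and class names here are assumptions for illustration, not part of the disclosure:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Descriptor:
    """Minimal sketch of a tensor descriptor: an identifier plus content."""
    identifier: int                                 # distinguishes the descriptor, e.g. a number
    dims: List[int] = field(default_factory=list)   # size in each dimension direction (shape parameters)
    base_address: int = 0                           # reference address of the data reference point

    def ndim(self) -> int:
        # the dimension (order) N of the tensor data the descriptor indicates
        return len(self.dims)

# a descriptor indicating the shape (2, 4) of a two-dimensional tensor
d = Descriptor(identifier=0, dims=[2, 4], base_address=0x1000)
```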
In one possible implementation, the identity and content of the descriptor may be stored in a descriptor storage space, which may be a storage space in an internal memory of the control unit (e.g., a register, an on-chip SRAM, or other media cache, etc.). The data storage space of the tensor data indicated by the descriptors may be a storage space in an internal memory (e.g., an on-chip cache) of the control unit or an external memory (e.g., an off-chip memory) connected to the control unit. The data addresses in the data storage space may be actual physical addresses or virtual addresses. The present disclosure does not limit the location of the descriptor storage space and the data storage space and the type of data address.
In one possible implementation, the identifier and content of the descriptor and the tensor data indicated by the descriptor may be located in the same region. For example, a contiguous region of the on-chip cache with addresses ADDR0-ADDR1023 may be used to store the content related to the descriptor: addresses ADDR0-ADDR31 may store the identifier of the descriptor, addresses ADDR32-ADDR63 may store the content of the descriptor, and addresses ADDR64-ADDR1023 may store the tensor data indicated by the descriptor. Here, an address ADDR does not denote one bit or one byte; it is one address unit used to represent an address. The storage regions and their addresses can be determined by those skilled in the art according to the actual situation, and the present disclosure is not limited in this respect.
In one possible implementation, the identifier and content of the descriptor and the tensor data indicated by the descriptor may be stored separately in different areas of the internal memory, for example, a register may be used as the descriptor storage space, the identifier and content of the descriptor may be stored in the register, an on-chip cache may be used as the data storage space, and the tensor data indicated by the descriptor may be stored.
In a possible implementation, a special register (SR) dedicated to the descriptor may also be provided, and the data in the descriptor may be an immediate value or may be obtained from the special register. When registers are used to store the identifier and content of the descriptor, the identifier of the descriptor may be represented by the register number; for example, if the number of the register is 0, the identifier of the descriptor stored in it is 0. When the descriptor in the register is valid, a region may be allocated in the buffer space according to the size of the tensor data indicated by the descriptor (for example, a tensor buffer unit may be created in the cache for each piece of tensor data) to store that tensor data. It should be understood that the tensor data may also be stored in a preset buffer space, which is not limited by the present disclosure.
In one possible implementation, the identity and content of the descriptors may be stored in an internal memory and the tensor data indicated by the descriptors may be stored in an external memory. For example, the identification and content of the descriptors can be stored on-chip, and the tensor data indicated by the descriptors can be stored off-chip.
In one possible implementation, the data address of the data storage space corresponding to the descriptor may be a fixed address. For example, separate data storage spaces may be divided for tensor data, each tensor data having a one-to-one correspondence with an identification of a descriptor at a start address of the data storage space. In this case, the control unit may determine a data address of data corresponding to the operand through the tensor control module according to the contents of the descriptor, and then execute the first processing instruction.
In one possible implementation, when the data address of the data storage space corresponding to the identifier of the descriptor is a variable address, the descriptor may further be used to indicate the address of the N-dimensional tensor data, in which case the content of the descriptor may also include at least one address parameter. For example, if the tensor data is 3-dimensional and the descriptor indicates its address, the content of the descriptor may include a single address parameter, such as the start address of the tensor data, or multiple address parameters, such as the start address of the tensor data plus an address offset, or address parameters for each dimension of the tensor data. The address parameters can be set by those skilled in the art according to actual needs, and the present disclosure does not limit them.
In one possible implementation, the address parameter of the tensor data includes a reference address of a data reference point of the descriptor in the data storage space. Wherein the reference address may be different according to a variation of the data reference point. The present disclosure does not limit the selection of the data reference point.
In one possible implementation, the reference address may include the start address of the data storage space. When the data reference point of the descriptor is the first data block of the data storage space, the reference address of the descriptor is the start address of the data storage space. When the data reference point of the descriptor is a data block other than the first one, the reference address of the descriptor is the physical address of that data block in the data storage space.
In one possible implementation, the shape parameters of the tensor data include at least one of the following: the size of the data storage space in at least one of the N dimension directions, the size of the storage region of the tensor data in at least one of the N dimension directions, the offset of the storage region in at least one of the N dimension directions, the positions of at least two vertices at diagonal positions in the N dimension directions relative to the data reference point, and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address. Here, the data description position is the mapped position of a point or a region in the tensor data indicated by the descriptor; for example, when the tensor data is 3-dimensional, the descriptor may represent the shape of the tensor data using three-dimensional space coordinates (x, y, z), and the data description position of the tensor data may be the position, expressed in coordinates (x, y, z), of a point or a region in the three-dimensional space onto which the tensor data is mapped.
It should be understood that the shape parameters representing tensor data can be selected by one skilled in the art based on practical circumstances, and the present disclosure is not limited thereto.
Fig. 2 shows a schematic diagram of a data storage space of a data processing method according to an embodiment of the present disclosure. As shown in fig. 2, the data storage space 21 stores two-dimensional data in a row-major manner and can be addressed by (X, Y) (the X axis points horizontally to the right and the Y axis points vertically downward). The size in the X-axis direction (the size of each row) is ori_X (not shown in the figure), the size in the Y-axis direction (the total number of rows) is ori_Y (not shown in the figure), and the start address PA_start (the reference address) of the data storage space 21 is the physical address of the first data block 22. The data block 23 is part of the data in the data storage space 21; its offset 25 in the X-axis direction is denoted offset_X, its offset 24 in the Y-axis direction is denoted offset_Y, its size in the X-axis direction is denoted size_X, and its size in the Y-axis direction is denoted size_Y.
In one possible implementation, when a descriptor is used to define the data block 23, the data reference point of the descriptor may be the first data block of the data storage space 21, and the reference address of the descriptor is then the start address PA_start of the data storage space 21. The content of the descriptor of the data block 23 may be determined by combining the size ori_X of the data storage space 21 in the X-axis direction and its size ori_Y in the Y-axis direction with the offsets offset_X and offset_Y of the data block 23 in the X-axis and Y-axis directions and its sizes size_X and size_Y in those directions.
In one possible implementation, the content of the descriptor can be represented using the following formula (1), collecting the parameters defined above:

  descriptor content = { ori_X, ori_Y; offset_X, offset_Y; size_X, size_Y; PA_start }   (1)
it should be understood that, although the descriptor describes a two-dimensional space in the above example, the dimension of the content representation of the descriptor can be set by those skilled in the art according to the actual situation, and the disclosure does not limit this.
In one possible implementation, the content of the descriptor of the tensor data may be determined according to a reference address of a data reference point of the descriptor in the data storage space, and positions of at least two vertices located at diagonal positions in N dimensional directions relative to the data reference point.
For example, the content of the descriptor of the data block 23 in fig. 2 may be determined from the reference address PA_base of the data reference point of the descriptor in the data storage space and the positions, relative to that reference point, of two vertices at diagonal positions. First, the data reference point of the descriptor and its reference address PA_base in the data storage space are determined; for example, a datum (e.g., the one at position (2, 2)) may be selected as the data reference point in the data storage space 21, with its physical address taken as the reference address PA_base. Then, the positions of at least two diagonal vertices of the data block 23 relative to the data reference point are determined; for example, using the top-left to bottom-right direction, the relative position of the top-left vertex is (x_min, y_min) and the relative position of the bottom-right vertex is (x_max, y_max). The content of the descriptor of the data block 23 can then be determined from the reference address PA_base, the relative position (x_min, y_min) of the top-left vertex, and the relative position (x_max, y_max) of the bottom-right vertex.
In one possible implementation, the content of the descriptor can be represented using the following formula (2):

  descriptor content = { (x_min, y_min); (x_max, y_max); PA_base }   (2)
it should be understood that, although the two vertices of the upper left corner and the lower right corner are used to determine the content of the descriptor in the above example, a person skilled in the art may set the specific vertex of the at least two vertices according to actual needs, and the disclosure is not limited thereto.
In one possible implementation manner, the content of the descriptor of the tensor data can be determined according to a reference address of the data reference point of the descriptor in the data storage space and a mapping relation between the data description position and the data address of the tensor data indicated by the descriptor. The mapping relationship between the data description position and the data address may be set according to actual needs, for example, when tensor data indicated by the descriptor is three-dimensional space data, the mapping relationship between the data description position and the data address may be defined by using a function f (x, y, z).
In one possible implementation, the content of the descriptor can be represented using the following formula (3):

  descriptor content = { f(x, y, z); PA_base }   (3)
it should be understood that, a person skilled in the art may set the mapping relationship between the data description location and the data address according to practical situations, and the disclosure does not limit this.
In a possible implementation, in step S11, when the operand of the decoded first processing instruction includes the identifier of a descriptor, whether the first processing instruction is executable may be determined according to that identifier. Whether two descriptor identifiers are the same indicates whether the tensor data they indicate are the same. Because the identifier of a descriptor is simpler than the storage address of the data, using the identifier to determine whether there is a preceding instruction that also operates on the descriptor is simpler and more efficient than using the storage address of the data. Here, a preceding instruction is a processing instruction that has a dependency relationship with the first processing instruction.
In one possible implementation, whether the first processing instruction is executable may be determined according to the identifier of the descriptor and a preset execution condition. For example, where registers are used to store descriptor identifiers, with each register holding one identifier, multiple instructions accessing the same register are executed in instruction-commit order. It can therefore be determined whether all instructions in the instruction queue that access the same register before the first processing instruction have completed, and the first processing instruction is executable once they have. The preset execution condition may also be, for example, that the required descriptor has already been registered, or that a descriptor still in use cannot be deregistered. The present disclosure does not limit the preset execution condition.
In one possible implementation, step S11 may include: determining, according to the identifier of the descriptor, whether there is an uncompleted second processing instruction, where the second processing instruction is a processing instruction that precedes the first processing instruction in the instruction queue and whose operand includes the identifier of the descriptor; and determining that the first processing instruction is executable when no such second processing instruction exists.
That is to say, when the operand of the first processing instruction includes the identifier of a descriptor, it may be determined from that identifier whether the instruction queue contains a second processing instruction that precedes the first processing instruction and whose operand also includes the identifier; any such instruction found is treated as a processing instruction having a dependency relationship with the first processing instruction. When the operand of the first processing instruction includes the identifiers of multiple descriptors, the dependency corresponding to each descriptor may be determined separately; that is, any preceding instruction whose operand includes the identifier of at least one of those descriptors may be treated as a second processing instruction with a dependency.
When an uncompleted second processing instruction exists, the first processing instruction is not executable; when no second processing instruction exists, the first processing instruction is executable.
For example, when the operand of the first processing instruction includes the identifiers of one or more descriptors, determining whether there is a second processing instruction that has not finished executing involves checking the identifiers of all descriptors in the operand: when at least one descriptor identifier in the operand of the first processing instruction is the same as a descriptor identifier in the operand of the second processing instruction, the two instructions have a dependency relationship, and as long as the second processing instruction has not finished executing, the first processing instruction cannot be executed.
For example, when the first processing instruction is ADD TR10, TR11, TR12 and the second processing instruction is ADD TR10, TR11, TR12, the descriptor identifiers in the operands of the two instructions are identical, so the two instructions have a dependency relationship; the first processing instruction is not executable until the second processing instruction has finished executing.

When the first processing instruction is ADD TR10, TR11, TR13 and the second processing instruction is ADD TR10, TR11, TR12, two descriptor identifiers (TR10 and TR11) in their operands are the same, so the two instructions have a dependency relationship; the first processing instruction is not executable until the second processing instruction has finished executing.

When the first processing instruction is ADD TR10, TR12, TR13 and the second processing instruction is ADD TR10, TR14, TR15, their operands share one descriptor identifier (TR10), so the two instructions have a dependency relationship; the first processing instruction is not executable until the second processing instruction has finished executing.

When the first processing instruction is ADD TR10, TR11, TR12 and the second processing instruction is ADD TR13, TR14, TR15, the descriptor identifiers in their operands are completely different, so there is no dependency relationship; the first processing instruction may be executed even when the second processing instruction has not finished executing.

When the first processing instruction is SUM TR10 and the second processing instruction is SUM TR10, the descriptor identifiers in their operands are identical, so the two instructions have a dependency relationship; the first processing instruction is not executable until the second processing instruction has finished executing.
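The checks in the examples above amount to intersecting the descriptor-identifier sets of the two instructions. A sketch follows; encoding an instruction's operands as a list of identifier strings is an assumption for illustration:

```python
def has_dependency(first_operands, second_operands):
    """Two instructions have a dependency relationship if their operands
    share at least one descriptor identifier."""
    return bool(set(first_operands) & set(second_operands))

def is_executable(first_operands, pending_second_operands):
    """The first instruction is executable only when no unfinished preceding
    instruction shares a descriptor identifier with it."""
    return not any(has_dependency(first_operands, ops)
                   for ops in pending_second_operands)

# ADD TR10, TR11, TR13 vs pending ADD TR10, TR11, TR12: TR10 and TR11 shared
blocked = has_dependency(["TR10", "TR11", "TR13"], ["TR10", "TR11", "TR12"])
```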
In this way, whether an instruction is executable can be determined directly from the identifiers of the descriptors, without repeatedly fetching the base address and operation range of each operand involved in the instruction and computing the operand's data address and operation range. This reduces the complexity of the processor's executability check, simplifies the resolution of operand data addresses, and improves the execution efficiency of the processor.
In one possible implementation, at least one of the first processing instruction and the second processing instruction comprises a write operation to the descriptor.
For example, if the first processing instruction is a read instruction for descriptor TR2 and the second processing instruction is also a read instruction for descriptor TR2, i.e., neither instruction includes a write operation on TR2, then the first processing instruction may be executed. If the second processing instruction is a write instruction for TR2, the first processing instruction may not be executed until the second processing instruction has finished processing.
In this way, multiple instructions are allowed to operate on one descriptor at the same time, which improves the concurrent execution efficiency of instructions and thus the processing efficiency of the processor.
In one possible implementation, the operand of the first processing instruction may include an identification of at least one descriptor, and step S11 may include: respectively determining a first state of each descriptor according to the identifier of the at least one descriptor, wherein the first state comprises a registered state or an unregistered state; determining that the first processing instruction is executable when the first state of each descriptor is a registered state. That is, the first processing instruction may execute when the status of all descriptors included in the operand is registered.
For example, suppose the operands of the first processing instruction include the identifiers TR3 and TR4 of two descriptors. From the identifiers TR3 and TR4, the state (registered or unregistered) of each descriptor can be determined. When the state of at least one of TR3 and TR4 is unregistered, the first processing instruction is not executable; at this point, a descriptor registration instruction may be invoked to register TR3 and/or TR4, and after registration succeeds, the state of TR3 and/or TR4 changes to registered. When the states of both TR3 and TR4 are registered, the first processing instruction is executable.
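The first-state check can be sketched as below. The dictionary-based registry and the function names are illustrative assumptions, not the patented structure:

```python
REGISTERED, UNREGISTERED = "registered", "unregistered"

def executable(operand_ids, states):
    """The instruction is executable only when every descriptor in its
    operands is in the registered state."""
    return all(states.get(tr, UNREGISTERED) == REGISTERED for tr in operand_ids)

first_state = {"TR3": UNREGISTERED, "TR4": REGISTERED}
print(executable(["TR3", "TR4"], first_state))  # False: TR3 is unregistered
first_state["TR3"] = REGISTERED                 # descriptor registration instruction succeeds
print(executable(["TR3", "TR4"], first_state))  # True
```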
In one possible implementation, the first state of the descriptor may be represented in a variety of ways. For example, a first flag bit may be set in the descriptor to indicate the first state: the identifier of the descriptor may be saved in a register, the highest-order bit of the register used as the first flag bit, and the information of the descriptor stored starting from the second-highest-order bit. Alternatively, a state correspondence table may be provided, and the first state of each descriptor written into the table. Those skilled in the art can set the representation of the first state according to actual needs, and the present disclosure does not limit this.
In this way, whether an instruction is executable can be judged from the first state of the descriptor, reducing the complexity of this judgment for the processor. Taking the deregistration operation as an example, when a descriptor is to be deregistered, the operation can be completed simply by changing its state, without clearing the storage area associated with the descriptor; when another descriptor later uses that space, the area is directly overwritten. Taking an operation instruction as an example, the state of the operand's descriptor can be checked first; when the descriptor is invalid, the first state shows that the instruction is not executable, so the instruction can be blocked without further checks.
In one possible implementation, the operand of the first processing instruction may include an identification of at least one descriptor, and step S11 may include: respectively determining a second state of each descriptor according to the identifier of the at least one descriptor, wherein the second state comprises an operable state or an inoperable state; and when the second state of each descriptor is an operable state, determining that the first processing instruction is executable.
For example, when a preceding instruction of the first processing instruction is currently operating on (e.g., writing to or reading from) the descriptor, the current state of the descriptor is inoperable. In this state the first processing instruction cannot be executed, and may be blocked or cached. Conversely, when no preceding instruction is currently operating on the descriptor, its current state may be set to operable, and in this state the first processing instruction can be executed.
In a possible implementation manner, when two or more preceding instructions operate on the descriptor, the operable state may be represented by "0" and the inoperable state by "1": the flag bit of the second state is "0" only after all preceding instructions have finished operating, and "1" otherwise. Alternatively, the operable state may be represented by "0" and the inoperable state by "N", where N is the number of preceding instructions operating on the descriptor; the value is decremented by 1 each time a preceding instruction finishes, and when the flag bit reaches 0, the second state of the descriptor is operable. The present disclosure is not limited to the particular manner of representation of the states.
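The counting variant of the second state can be sketched as follows. The class and attribute names are illustrative assumptions; the flag holds N, the number of preceding instructions still operating on the descriptor, and 0 means operable:

```python
class DescriptorState:
    """Second state of a descriptor, tracked as a count of in-flight
    preceding instructions."""
    def __init__(self):
        self.pending = 0            # N preceding instructions currently operating

    def begin_op(self):
        self.pending += 1           # a preceding instruction starts operating

    def end_op(self):
        self.pending -= 1           # decrement by 1 when a preceding instruction finishes

    @property
    def operable(self):
        return self.pending == 0    # second state is operable only when the count is 0

tr = DescriptorState()
tr.begin_op(); tr.begin_op()        # two preceding instructions operate on the descriptor
print(tr.operable)                  # False: inoperable while N > 0
tr.end_op(); tr.end_op()
print(tr.operable)                  # True: all preceding instructions finished
```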
In one possible implementation, the second state of the descriptor may include an operable state or an inoperable state, wherein the second state may be represented in a variety of ways. For example, a second flag bit may be set in the descriptor to indicate the second state, or the second state of the descriptor may be written into the state correspondence table. The state correspondence table can be stored in a register, and the judgment of the first state and the second state in the table can be implemented in hardware. Those skilled in the art can set the representation of the second state according to actual needs, and the disclosure does not limit this.
In this way, whether an instruction is operable can be judged from the second state of the descriptor, reducing the complexity of this judgment for the processor. For example, it suffices to check whether the second state of the descriptor involved in the instruction is operable; there is no need to obtain the base address and operation range of each operand, derive the actual operation region, and then test whether the regions overlap in order to conclude whether the instruction is operable.
In a possible implementation manner, when it is determined that the first processing instruction is executable through step S11, in step S12, data processing corresponding to the first processing instruction may be executed according to the identifier of the descriptor. That is, when the first processing instruction is executable, the data address of the tensor data indicated by the descriptor can be obtained by calculation according to the identifier of the descriptor, and then the tensor data is read from the data address and the data processing corresponding to the first processing instruction is executed.
In one possible implementation manner, the data address of the tensor data indicated by the descriptor can be obtained directly from the identifier of the descriptor. For example, when the content of the descriptor is itself the data address of the tensor data, the address can be read from the descriptor storage space without calculation, the tensor data read from that address, and the data processing corresponding to the first processing instruction executed.
In one possible implementation, step S12 may include: acquiring the content of the descriptor from a descriptor storage space according to the identifier of the descriptor; determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor; and executing data processing corresponding to the first processing instruction according to the data address.
In this embodiment, when the first processing instruction is executable, the content of the descriptor may be retrieved from the descriptor storage space according to the identifier of the descriptor in the operand. That is, the identifier determines the location of the descriptor in the descriptor storage space, from which its content can be obtained. This reduces the complexity of software programming: the software side need not know the data storage layout of the hardware side or calculate actual hardware storage addresses. It also reduces instruction complexity, since parameters used many times (such as the content of a descriptor) need not be written into the instruction on every use.
After the content of the descriptor is obtained, the data address in the data storage space of the data corresponding to the operand may be determined from that content. The address calculation can be completed automatically in hardware or implemented in software, and the calculation may differ depending on the content of the descriptor. The present disclosure does not limit the manner in which the data address is calculated.
For example, in the case where the content of the descriptor is expressed by formula (1), for any data point in the tensor data whose data description position is (x_q, y_q), the data address PA2_(x,y) of the data point in the data storage space may be determined by the following formula (4):
PA2_(x,y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)    (4)
After the data address in the data storage space of the data corresponding to the operand is obtained, the data processing corresponding to the first processing instruction may be performed according to that address.
For example, for the first processing instruction ADD; A; B, if the operands A and B include the descriptor identifiers TR5 and TR6 respectively, the contents (such as shape parameters and address parameters) of descriptors TR5 and TR6 can be obtained from the descriptor storage space according to TR5 and TR6. Then, based on those contents, the data addresses of data A and B are calculated: address 1 of data A in the data storage space is ADDR64-ADDR127, and address 2 of data B is ADDR1023-ADDR1087. Data is then read from address 1 and address 2, and the addition (ADD) operation is executed to obtain the operation result (A + B).
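The steps of step S12 can be sketched as follows, with the address computation following formula (4). The descriptor-table layout, the shape-parameter values, and the flat memory model are all invented for illustration (the starting addresses loosely echo the ADDR64/ADDR1023 example above); none of them are from the patent:

```python
descriptor_table = {   # identifier -> descriptor content (shape and address parameters)
    "TR5": {"PA_start": 64,   "ori_x": 16, "offset_x": 0, "offset_y": 1},
    "TR6": {"PA_start": 1023, "ori_x": 16, "offset_x": 0, "offset_y": 1},
}

def data_address(tr_id, x_q, y_q):
    """Formula (4): PA2 = PA_start + (offset_y + y_q - 1)*ori_x + (offset_x + x_q)."""
    d = descriptor_table[tr_id]
    return d["PA_start"] + (d["offset_y"] + y_q - 1) * d["ori_x"] + (d["offset_x"] + x_q)

memory = [0] * 2048
memory[data_address("TR5", 2, 1)] = 7      # a data point of operand A
memory[data_address("TR6", 2, 1)] = 5      # a data point of operand B
# ADD; A; B at data description position (2, 1): read both addresses, then add
result = memory[data_address("TR5", 2, 1)] + memory[data_address("TR6", 2, 1)]
print(result)  # 12
```

The instruction itself carries only the identifiers TR5 and TR6; the addresses are resolved from the descriptor contents at execution time.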
In one possible implementation, the method may further include: when the first processing instruction is a descriptor registration instruction, acquiring registration parameters of a descriptor in the first processing instruction, wherein the registration parameters comprise at least one of an identifier of the descriptor, a tensor shape and content of tensor data indicated by the descriptor; judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors; executing the first processing instruction when the first processing instruction is executable.
In this embodiment, when the first processing instruction is a descriptor registration instruction, registration parameters of the descriptor may be acquired from the first processing instruction, where the registration parameters may include at least one of an identifier of the descriptor, a tensor shape, and content of tensor data indicated by the descriptor; then, according to the registration parameter of the descriptor, whether the first processing instruction is executable or not is judged, that is, whether the descriptor is registrable or not is judged according to the registration parameter of the descriptor. For example, in the case where the identifier of the descriptor is occupied or the descriptor storage space is insufficient, the descriptor cannot be successfully registered. It should be understood that the judgment of the registration parameters can be set by those skilled in the art according to practical situations, and the disclosure does not limit this.
In a possible implementation manner, determining whether the first processing instruction is executable according to the registration parameter of the descriptor may include: determining that the first processing instruction is executable upon satisfying at least one of an identification of the descriptor being unoccupied, a first storage region storing content of the descriptor being unoccupied, and a second storage region storing tensor data indicated by the descriptor being unoccupied. That is, the first processing instruction may be executable when the registration parameter satisfies at least one of the descriptor's identity being unoccupied, the first memory region being unoccupied, or the second memory region being unoccupied.
The first processing instruction (descriptor registration instruction) is executed when the first processing instruction is executable. For example, a first storage area of the content of the descriptor in the descriptor storage space and a second storage area of the content of the tensor data indicated by the descriptor in the data storage area may be determined first; then, determining the content of the descriptor according to the registration parameters and the second storage area, namely establishing the corresponding relation between the descriptor and the second storage area; then, the content of the descriptor is stored in the first storage area, and the registration of the descriptor is completed.
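The registration-instruction check can be sketched as below, following the narrative reading above in which an occupied identifier or insufficient storage prevents registration. The bookkeeping structures and names are illustrative assumptions:

```python
def can_register(tr_id, registry, free_descriptor_slots, free_data_bytes, needed_bytes):
    """Judge whether a descriptor registration instruction is executable."""
    if tr_id in registry:               # identifier of the descriptor is occupied
        return False
    if free_descriptor_slots == 0:      # no first storage region for the descriptor content
        return False
    if free_data_bytes < needed_bytes:  # no second storage region for the tensor data
        return False
    return True

registry = {"TR1": "..."}
print(can_register("TR1", registry, 4, 1024, 256))  # False: identifier already occupied
print(can_register("TR2", registry, 4, 1024, 256))  # True: registration may proceed
```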
In one possible implementation, the method further includes: when the first processing instruction is a descriptor deregistration instruction, judging, according to the identifier of the descriptor in the first processing instruction, whether there is a fourth processing instruction that has not finished processing, where the fourth processing instruction is a processing instruction in the instruction queue whose operand includes the identifier of the descriptor; and executing the first processing instruction when there is no fourth processing instruction that has not finished processing.
In this embodiment, when the first processing instruction is a descriptor deregistration instruction, whether the instruction queue contains a fourth processing instruction whose operand includes the identifier of the descriptor may be determined according to that identifier. When a fourth processing instruction has not finished processing, the first processing instruction may not be executed; when no such instruction remains, the first processing instruction, i.e., the descriptor deregistration instruction, is executed, at which point the storage area of the descriptor in the descriptor storage space and the storage area of the data indicated by the descriptor in the data storage space can be released respectively.
For example, suppose the first processing instruction is a descriptor deregistration instruction and the descriptor to be deregistered is identified as TR7. The instruction queue may first be searched for fourth processing instructions whose operands include TR7; for example, there are two such instructions in the queue: an operation instruction and a read instruction for TR7. It is then judged whether these two fourth processing instructions have finished executing. As long as either of them has not finished, the first processing instruction (the descriptor deregistration instruction) is not executable; when both have finished, the first processing instruction may execute. The first processing instruction is then executed to release the storage area of TR7 in the descriptor storage space and the storage area of the data indicated by TR7 in the data storage space, respectively.
In one possible implementation, when the first processing instruction is a descriptor deregistration instruction, whether the first state of the descriptor is registered may be determined according to the identifier of the descriptor. The first processing instruction (descriptor deregistration instruction) is executable when the first state of the descriptor is registered; otherwise, it is not executable. That is, the descriptor deregistration instruction is executable only when the descriptor in the operand is in the registered state.
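The two deregistration conditions discussed above (the descriptor must be registered, and no fourth processing instruction referencing it may remain unfinished) can be sketched together. The queue representation and names are illustrative assumptions:

```python
def can_deregister(tr_id, instruction_queue, first_state):
    """Judge whether a descriptor deregistration instruction is executable."""
    if first_state.get(tr_id) != "registered":
        return False                                  # only registered descriptors can be deregistered
    pending = [ins for ins in instruction_queue
               if tr_id in ins["operands"] and not ins["done"]]
    return not pending                                # executable only with no unfinished users

queue = [{"op": "OPERATE", "operands": ["TR7"], "done": False},
         {"op": "READ",    "operands": ["TR7"], "done": True}]
state = {"TR7": "registered"}
print(can_deregister("TR7", queue, state))  # False: the operation instruction is unfinished
queue[0]["done"] = True
print(can_deregister("TR7", queue, state))  # True: both fourth processing instructions completed
```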
In one possible implementation, the method may further include: blocking or caching the first processing instruction when the first processing instruction is not executable. That is, when the first processing instruction is not executable, it may be blocked: execution of the first processing instruction and the instructions after it is suspended until the second processing instruction has finished, after which they are executed. Alternatively, the first processing instruction may be cached: it is stored in a preset cache space without affecting the execution of other instructions, and the cached instruction is executed once the second processing instruction completes. The present disclosure does not limit how a non-executable first processing instruction is handled.
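The difference between blocking and caching can be sketched with a toy scheduler. The function and its single-cycle model are illustrative assumptions, not the patented mechanism:

```python
from collections import deque

def schedule(queue, is_executable, mode="cache"):
    """Issue what it can this cycle; return (issued, cached) instructions."""
    issued, cached = [], []
    while queue:
        ins = queue[0]
        if is_executable(ins):
            issued.append(queue.popleft())
        elif mode == "block":
            break                           # stall: nothing after the blocked instruction issues
        else:
            cached.append(queue.popleft())  # park it; later instructions keep flowing
    return issued, cached

not_blocked = lambda ins: ins != "I1(not executable)"
print(schedule(deque(["I1(not executable)", "I2", "I3"]), not_blocked, mode="block"))
# ([], []): the whole queue stalls behind I1
print(schedule(deque(["I1(not executable)", "I2", "I3"]), not_blocked, mode="cache"))
# (['I2', 'I3'], ['I1(not executable)']): I2 and I3 proceed while I1 waits in the cache space
```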
According to the data processing method of the embodiment of the disclosure, when the operand of the decoded processing instruction comprises the identifier of the descriptor, whether the instruction is executable or not can be judged through the identifier of the descriptor, and when the instruction is executable, the data processing corresponding to the instruction is executed according to the identifier of the descriptor, so that the complexity of judging whether the instruction is executable or not by the processor can be reduced, and the processing efficiency of the processor can be improved.
It should be noted that, although the data processing method is described above by taking the above-mentioned embodiment as an example, those skilled in the art can understand that the present disclosure should not be limited thereto. In fact, the user can flexibly set each step according to personal preference and/or actual application scene, as long as the technical scheme of the disclosure is met.
Fig. 3 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the data processing apparatus includes:
a determining module 31, configured to determine, when an operand of a decoded first processing instruction includes an identifier of a descriptor, whether the first processing instruction is executable according to the identifier of the descriptor, where the descriptor is used to indicate a shape of a tensor;
and the execution module 32 is configured to execute data processing corresponding to the first processing instruction according to the identifier of the descriptor when the first processing instruction is executable.
In one possible implementation, the execution module 32 includes: the content acquisition submodule is used for acquiring the content of the descriptor from the descriptor storage space according to the identifier of the descriptor; the address determination submodule is used for determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor; and the first execution submodule is used for executing data processing corresponding to the first processing instruction according to the data address.
In a possible implementation manner, the determining module 31 includes: an instruction judging submodule, configured to judge, according to the identifier of the descriptor, whether there is a second processing instruction that has not finished processing, where the second processing instruction includes a processing instruction that precedes the first processing instruction in the instruction queue and has the identifier of the descriptor in an operand; and a first execution determination submodule, configured to determine that the first processing instruction is executable when the second processing instruction does not exist.
In one possible implementation, at least one of the first processing instruction and the second processing instruction comprises a write operation to the descriptor.
In a possible implementation manner, the operand of the first processing instruction includes an identifier of at least one descriptor, where the determining module 31 includes: a first state determining submodule, configured to determine a first state of each descriptor according to an identifier of the at least one descriptor, where the first state includes a registered state or an unregistered state; and the second execution determining submodule is used for determining that the first processing instruction can be executed when the first state of each descriptor is a registered state.
In a possible implementation manner, the operand of the first processing instruction includes an identifier of at least one descriptor, where the determining module 31 includes: a second state determining submodule, configured to determine a second state of each descriptor according to an identifier of the at least one descriptor, where the second state includes an operable state or an inoperable state; and the third execution determination submodule is used for determining that the first processing instruction is executable when the second state of each descriptor is an operable state.
In one possible implementation, the apparatus further includes: a logout judging module, configured to, when the first processing instruction is a descriptor logout instruction, judge whether a fourth processing instruction that does not complete processing exists according to an identifier of a descriptor in the first processing instruction, where the fourth processing instruction is a processing instruction in an instruction queue, where an operand of the fourth processing instruction includes the identifier of the descriptor; and the logout execution module is used for executing the first processing instruction when the fourth processing instruction which does not finish processing does not exist.
In one possible implementation, the apparatus further includes: a parameter obtaining module, configured to, when the first processing instruction is a descriptor registration instruction, obtain registration parameters of a descriptor in the first processing instruction, where the registration parameters include at least one of an identifier of the descriptor, a tensor shape, and content of tensor data indicated by the descriptor; the registration judging module is used for judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors; and the register execution module is used for executing the first processing instruction when the first processing instruction can be executed.
In a possible implementation manner, the registration determining module includes: the condition judgment sub-module is used for determining that the first processing instruction is executable when at least one of the identifier of the descriptor is unoccupied, a first storage area for storing the content of the descriptor is unoccupied and a second storage area for storing tensor data indicated by the descriptor is unoccupied is met.
In one possible implementation, the apparatus further includes: an execution control module to block or cache the first processing instruction when the first processing instruction is not executable.
In one possible implementation, the descriptor is for indicating a shape of tensor data of dimension N, N being an integer greater than or equal to zero, wherein the content of the descriptor includes at least one shape parameter representing the shape of the tensor data.
In one possible implementation, the descriptor is further configured to indicate an address of the N-dimensional tensor data, wherein the content of the descriptor further includes at least one address parameter representing the address of the tensor data.
In one possible implementation, the address parameter of the tensor data comprises a reference address of a data reference point of the descriptor in a data storage space of the tensor data; wherein the shape parameters of the tensor data comprise at least one of: a size of the data storage space in at least one of N dimensional directions, a size of a storage region of the tensor data in at least one of the N dimensional directions, an offset amount of the storage region in at least one of the N dimensional directions, positions of at least two vertices located at diagonal positions of the N dimensional directions with respect to the data reference point, and a mapping relationship between a data description position of the tensor data indicated by the descriptor and a data address.
In a possible implementation manner, an artificial intelligence chip is also disclosed, which comprises the data processing device.
In a possible implementation manner, a board card is further disclosed, which comprises a storage device, an interface device, a control device and the artificial intelligence chip; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
Fig. 4 shows a block diagram of a board according to an embodiment of the present disclosure, and referring to fig. 4, the board may include other kit components besides the artificial intelligence chip 389, where the kit components include, but are not limited to: memory device 390, interface device 391 and control device 392;
the memory device 390 is connected to the artificial intelligence chip through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the artificial intelligence chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising and falling edges of the clock pulse, making DDR twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include a plurality of DDR4 chips. In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, in which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are adopted in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
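The quoted 25600 MB/s figure follows from simple arithmetic: DDR4-3200 performs 3200 million transfers per second, and the 64 data bits of each 72-bit interface carry 8 bytes per transfer. A quick check:

```python
transfers_per_second_millions = 3200   # DDR4-3200 data rate (MT/s)
data_bits = 64                         # 72-bit controller minus 8 ECC bits
bandwidth_mb_s = transfers_per_second_millions * data_bits // 8
print(bandwidth_mb_s)  # 25600
```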
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the artificial intelligence chip, for controlling data transmission to and data storage in each storage unit.
The interface device is electrically connected with the artificial intelligence chip and is used for data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface: the data to be processed is transmitted from the server to the artificial intelligence chip through the standard PCIE interface. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present application does not limit the specific form of such other interfaces, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., the server) by the interface device.
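The quoted 16000 MB/s figure for PCIE 3.0 x16 matches the nominal arithmetic of 8 GT/s per lane across 16 lanes, ignoring the 128b/130b line-encoding overhead (as the text does):

```python
gt_per_s_per_lane = 8      # PCIe 3.0 nominal signalling rate per lane
lanes = 16
bandwidth_mb_s = gt_per_s_per_lane * lanes * 1000 // 8   # 8 bits per byte
print(bandwidth_mb_s)  # 16000
```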
The control device is electrically connected with the artificial intelligence chip and is used for monitoring its state. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). Since the artificial intelligence chip may comprise a plurality of processing chips, processing cores, or processing circuits, it can drive a plurality of loads and may therefore be in different working states such as heavy load and light load. The control device can regulate the working states of the processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In one possible implementation manner, an electronic device is disclosed, which comprises the artificial intelligence chip. The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance instrument, a B ultrasonic instrument and/or an electrocardiograph.
A1, a data processing method, the method comprising:
when the operand of the decoded first processing instruction comprises the identifier of a descriptor, judging whether the first processing instruction can be executed or not according to the identifier of the descriptor, wherein the descriptor is used for indicating the shape of a tensor;
and when the first processing instruction is executable, executing data processing corresponding to the first processing instruction according to the identifier of the descriptor.
A2, the method of claim A1, wherein performing data processing corresponding to the first processing instruction according to the identifier of the descriptor comprises:
acquiring the content of the descriptor from a descriptor storage space according to the identifier of the descriptor;
determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor;
and executing data processing corresponding to the first processing instruction according to the data address.
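Clauses A1 and A2 together describe an indirection: the operand carries only a descriptor identifier, and execution resolves that identifier to descriptor content and then to a concrete data address. The following Python sketch illustrates one possible reading; the descriptor table layout, stride-based row-major addressing, and all names (`DescriptorTable`, `base_addr`, `strides`) are assumptions for illustration, not the patent's actual interface.

```python
class DescriptorTable:
    """Descriptor storage space: maps a descriptor identifier to its content."""
    def __init__(self):
        self._table = {}

    def register(self, desc_id, base_addr, shape, strides):
        # Content of the descriptor: base address plus shape/address parameters.
        self._table[desc_id] = {"base": base_addr, "shape": shape,
                                "strides": strides}

    def content(self, desc_id):
        return self._table[desc_id]


def data_address(table, desc_id, index):
    """Resolve one tensor element's address from the descriptor content."""
    d = table.content(desc_id)
    return d["base"] + sum(i * s for i, s in zip(index, d["strides"]))


table = DescriptorTable()
# 3x4 tensor of 4-byte elements, stored row-major at address 0x1000
table.register(desc_id=7, base_addr=0x1000, shape=(3, 4), strides=(16, 4))
addr = data_address(table, 7, (2, 1))   # 0x1000 + 2*16 + 1*4 = 0x1024
```

An instruction whose operand names descriptor 7 would then operate on `addr` rather than on an address encoded in the instruction itself.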
A3, the method of claim A1, wherein judging whether the first processing instruction is executable according to the identifier of the descriptor comprises:
judging, according to the identifier of the descriptor, whether an unprocessed second processing instruction exists, wherein the second processing instruction comprises a processing instruction which is before the first processing instruction in an instruction queue and whose operand comprises the identifier of the descriptor;
determining that the first processing instruction is executable when the second processing instruction is not present.
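The executability check in clause A3 is essentially a hazard check keyed on descriptor identifiers rather than register numbers: the first instruction waits while any earlier, unfinished instruction in the queue references the same descriptor. A minimal sketch, assuming an in-order queue; the `Instruction` record is a hypothetical stand-in for whatever the hardware tracks:

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    opcode: str
    descriptor_ids: set      # descriptor identifiers appearing in the operands
    done: bool = False       # whether the instruction has completed processing

def is_executable(queue, first_idx):
    """True if no unfinished earlier instruction shares a descriptor id."""
    first = queue[first_idx]
    for prior in queue[:first_idx]:
        if not prior.done and prior.descriptor_ids & first.descriptor_ids:
            return False     # an unprocessed "second processing instruction"
    return True

queue = [Instruction("store", {1}),             # pending write via descriptor 1
         Instruction("load",  {2}, done=True),
         Instruction("add",   {1, 3})]          # the "first" instruction
assert is_executable(queue, 2) is False         # blocked on the pending store
queue[0].done = True
assert is_executable(queue, 2) is True
```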
A4, the method of claim A3, at least one of the first processing instruction and the second processing instruction comprising a write operation to the descriptor.
A5, the method of claim A1, the operand of the first processing instruction comprising an identification of at least one descriptor,
wherein, according to the identifier of the descriptor, determining whether the first processing instruction is executable includes:
respectively determining a first state of each descriptor according to the identifier of the at least one descriptor, wherein the first state comprises a registered state or an unregistered state;
determining that the first processing instruction is executable when the first state of each descriptor is a registered state.
A6, the method of claim A1, the operand of the first processing instruction comprising an identification of at least one descriptor,
wherein, according to the identifier of the descriptor, determining whether the first processing instruction is executable includes:
respectively determining a second state of each descriptor according to the identifier of the at least one descriptor, wherein the second state comprises an operable state or an inoperable state;
determining that the first processing instruction is executable when the second state of each descriptor is an operational state.
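Clauses A5 and A6 gate dispatch on per-descriptor state: every descriptor named in the operands must be registered, and separately must be operable. A hedged sketch of one way to track both states; the enum values and registry layout are assumptions for illustration.

```python
from enum import Enum, auto

class RegState(Enum):
    REGISTERED = auto()
    UNREGISTERED = auto()

class OpState(Enum):
    OPERABLE = auto()
    INOPERABLE = auto()   # e.g. mid-update by another in-flight instruction

registry = {
    1: (RegState.REGISTERED, OpState.OPERABLE),
    2: (RegState.REGISTERED, OpState.INOPERABLE),
}

def executable(desc_ids):
    """True only if every named descriptor is registered and operable."""
    return all(
        registry.get(d, (RegState.UNREGISTERED, OpState.INOPERABLE))
        == (RegState.REGISTERED, OpState.OPERABLE)
        for d in desc_ids
    )

assert executable({1}) is True
assert executable({1, 2}) is False   # descriptor 2 is not operable
assert executable({3}) is False      # descriptor 3 was never registered
```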
A7, the method of any one of claims A1-A6, further comprising:
when the first processing instruction is a descriptor cancelling instruction, judging whether a fourth processing instruction which does not finish processing exists according to the identifier of the descriptor in the first processing instruction, wherein the fourth processing instruction is a processing instruction which is in an instruction queue and the operand of which comprises the identifier of the descriptor;
executing the first processing instruction when there is no fourth processing instruction that does not complete processing.
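Clause A7 describes the release side: a descriptor-cancelling instruction may only execute once every queued instruction whose operands name that descriptor has finished. A minimal sketch, with the queue representation chosen purely for illustration:

```python
def can_release(queue, desc_id):
    """queue: list of (descriptor_ids, done) pairs for in-flight instructions.

    The cancelling instruction is executable only if every instruction that
    references desc_id has completed processing (no "fourth processing
    instruction" remains).
    """
    return all(done for desc_ids, done in queue if desc_id in desc_ids)

queue = [({1, 2}, True), ({2}, False), ({3}, True)]
assert can_release(queue, 1) is True    # all users of descriptor 1 finished
assert can_release(queue, 2) is False   # a pending instruction still uses 2
```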
A8, the method of any one of claims A1-A7, further comprising:
when the first processing instruction is a descriptor registration instruction, acquiring registration parameters of a descriptor in the first processing instruction, wherein the registration parameters comprise at least one of an identifier of the descriptor, a tensor shape and content of tensor data indicated by the descriptor;
judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors;
executing the first processing instruction when the first processing instruction is executable.
A9, the method of claim A8, wherein judging whether the first processing instruction is executable according to the registration parameters of the descriptor comprises:
determining that the first processing instruction is executable upon satisfying at least one of an identification of the descriptor being unoccupied, a first storage region storing content of the descriptor being unoccupied, and a second storage region storing tensor data indicated by the descriptor being unoccupied.
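Clauses A8 and A9 cover registration: a descriptor-registration instruction is checked against resource occupancy before it runs. Note that the claim wording requires only that at least one of the three conditions holds; a conservative allocator would likely require all three (shown as a commented variant). The boolean-flag interface below is an assumption for illustration.

```python
def can_register(id_free, content_region_free, data_region_free):
    """Registration check per clause A9.

    id_free             -- the requested descriptor identifier is unoccupied
    content_region_free -- the first storage region (descriptor content) is free
    data_region_free    -- the second storage region (tensor data) is free
    """
    conditions = (id_free, content_region_free, data_region_free)
    return any(conditions)       # claim wording: "at least one of"
    # return all(conditions)     # stricter variant: every resource must be free

assert can_register(True, False, False) is True
assert can_register(False, False, False) is False
```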
A10, the method of any one of claims A1-A9, the method further comprising:
blocking or caching the first processing instruction when the first processing instruction is not executable.
A11, the method of claim A1, the descriptor indicating a shape of tensor data of N dimensions, N being an integer greater than or equal to zero,
wherein the content of the descriptor comprises at least one shape parameter representing a shape of tensor data.
A12, the method of claim A11, the descriptor further for indicating an address of the N-dimensional tensor data, wherein the content of the descriptor further comprises at least one address parameter representing the address of the tensor data.
A13, the method according to claim A12, wherein the address parameters of the tensor data include a reference address of a data reference point of the descriptor in a data storage space of the tensor data;
wherein the shape parameters of the tensor data comprise at least one of:
a size of the data storage space in at least one of N dimensional directions, a size of a storage region of the tensor data in at least one of the N dimensional directions, an offset amount of the storage region in at least one of the N dimensional directions, positions of at least two vertices located at diagonal positions of the N dimensional directions with respect to the data reference point, and a mapping relationship between a data description position of the tensor data indicated by the descriptor and a data address.
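One common pattern behind the shape parameters in clause A13 is a tensor region embedded in a larger data storage space: given a reference address, the per-dimension size of the storage space, and the region's offset within it, a description position maps to a flat data address. The sketch below is an illustrative reading only; row-major layout and 1-byte elements are assumptions, and the function name is hypothetical.

```python
def region_address(base, space_sizes, region_offset, position):
    """Map an N-dim description position inside the region to a flat address.

    base          -- reference address of the data reference point
    space_sizes   -- size of the data storage space in each dimension
    region_offset -- offset of the tensor's storage region in each dimension
    position      -- data description position within the region
    """
    addr = base
    stride = 1
    # Accumulate row-major strides from the innermost dimension outward.
    for size, off, pos in reversed(list(zip(space_sizes, region_offset,
                                            position))):
        addr += (off + pos) * stride
        stride *= size
    return addr

# 2-D storage space of 6x8 elements; the region starts at offset (1, 2).
assert region_address(base=0x2000, space_sizes=(6, 8),
                      region_offset=(1, 2), position=(0, 0)) == 0x2000 + 1*8 + 2
assert region_address(0x2000, (6, 8), (1, 2), (2, 3)) == 0x2000 + 3*8 + 5
```

With this layout, changing only the offsets or region sizes in the descriptor content redirects every instruction that names the descriptor, without re-encoding any instruction.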
A14, a data processing apparatus, the apparatus comprising:
the judging module is used for judging whether the first processing instruction can be executed or not according to the identifier of the descriptor when the operand of the decoded first processing instruction comprises the identifier of the descriptor, and the descriptor is used for indicating the shape of the tensor;
and the execution module is used for executing data processing corresponding to the first processing instruction according to the identifier of the descriptor when the first processing instruction can be executed.
A15, the apparatus of claim A14, wherein the execution module comprises:
the content acquisition submodule is used for acquiring the content of the descriptor from the descriptor storage space according to the identifier of the descriptor;
the address determination submodule is used for determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor;
and the first execution submodule is used for executing data processing corresponding to the first processing instruction according to the data address.
A16, the apparatus of claim A14, the judging module comprising:
the instruction judgment submodule, used for judging, according to the identifier of the descriptor, whether an unprocessed second processing instruction exists, wherein the second processing instruction comprises a processing instruction which is before the first processing instruction in an instruction queue and whose operand comprises the identifier of the descriptor;
and the first execution determining submodule is used for determining that the first processing instruction can be executed when the second processing instruction does not exist.
A17, the apparatus of claim A16, at least one of the first processing instruction and the second processing instruction comprising a write operation to the descriptor.
A18, the apparatus of claim A14, an operand of said first processing instruction comprising an identification of at least one descriptor,
wherein, the judging module comprises:
a first state determining submodule, configured to determine a first state of each descriptor according to an identifier of the at least one descriptor, where the first state includes a registered state or an unregistered state;
and the second execution determining submodule is used for determining that the first processing instruction can be executed when the first state of each descriptor is a registered state.
A19, the apparatus of claim A14, the operand of the first processing instruction comprising an identification of at least one descriptor,
wherein, the judging module comprises:
a second state determining submodule, configured to determine a second state of each descriptor according to an identifier of the at least one descriptor, where the second state includes an operable state or an inoperable state;
and the third execution determination submodule is used for determining that the first processing instruction is executable when the second state of each descriptor is an operable state.
A20, the apparatus of any one of claims a14-a19, further comprising:
a logout judging module, configured to, when the first processing instruction is a descriptor logout instruction, judge whether a fourth processing instruction that does not complete processing exists according to an identifier of a descriptor in the first processing instruction, where the fourth processing instruction is a processing instruction in an instruction queue, and an operand of the fourth processing instruction includes the identifier of the descriptor;
and the logout execution module is used for executing the first processing instruction when no fourth processing instruction which does not finish processing exists.
A21, the apparatus of any one of claims a14 to a20, further comprising:
a parameter obtaining module, configured to, when the first processing instruction is a descriptor registration instruction, obtain registration parameters of a descriptor in the first processing instruction, where the registration parameters include at least one of an identifier of the descriptor, a tensor shape, and content of tensor data indicated by the descriptor;
the registration judging module is used for judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors;
and the register execution module is used for executing the first processing instruction when the first processing instruction can be executed.
A22, the apparatus of claim A21, wherein the registration judging module comprises:
and the condition judgment sub-module is used for determining that the first processing instruction can be executed when at least one of the condition that the identifier of the descriptor is not occupied, the condition that a first storage area for storing the content of the descriptor is not occupied and the condition that a second storage area for storing tensor data indicated by the descriptor is not occupied is met.
A23, the apparatus of any one of claims a14-a22, further comprising:
an execution control module to block or cache the first processing instruction when the first processing instruction is not executable.
A24, the apparatus of claim A14, the descriptor indicating a shape of tensor data of N dimensions, N being an integer greater than or equal to zero,
wherein the content of the descriptor comprises at least one shape parameter representing a shape of tensor data.
A25, the apparatus of claim A24, the descriptor further for indicating an address of the N-dimensional tensor data, wherein the content of the descriptor further comprises at least one address parameter representing the address of the tensor data.
A26, the apparatus according to claim A25, wherein the address parameters of the tensor data include a reference address of a data reference point of the descriptor in a data storage space of the tensor data;
wherein the shape parameters of the tensor data comprise at least one of:
a size of the data storage space in at least one of N dimensional directions, a size of a storage region of the tensor data in at least one of the N dimensional directions, an offset amount of the storage region in at least one of the N dimensional directions, positions of at least two vertices located at diagonal positions of the N dimensional directions with respect to the data reference point, and a mapping relationship between a data description position of the tensor data indicated by the descriptor and a data address.
A27, an artificial intelligence chip, said chip comprising a data processing apparatus according to any one of claims A14-A26.
A28, an electronic device comprising the artificial intelligence chip of claim A27.
A29, a board card, wherein the board card comprises: a memory device, an interface device, a control device, and an artificial intelligence chip according to claim A27;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;
and the control device is used for monitoring the state of the artificial intelligence chip.
A30, the card of claim A29, the memory device comprising a plurality of groups of storage units, wherein each group of storage units is connected with the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller for controlling data transmission to and data storage in each storage unit;
the interface device is a standard PCIe interface.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen to best explain the principles of the embodiments, their practical application, or their improvement over technologies found in the marketplace, and to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (28)
1. A method of data processing, the method comprising:
when the operand of the decoded first processing instruction comprises the identifier of a descriptor, judging whether the first processing instruction can be executed or not according to the identifier of the descriptor, wherein the descriptor is used for indicating the shape of a tensor;
when the first processing instruction is executable, executing data processing corresponding to the first processing instruction according to the identifier of the descriptor;
wherein, according to the identifier of the descriptor, determining whether the first processing instruction is executable includes:
judging, according to the identifier of the descriptor, whether an unprocessed second processing instruction exists, wherein the second processing instruction comprises a processing instruction which is before the first processing instruction in an instruction queue and whose operand comprises the identifier of the descriptor;
determining that the first processing instruction is executable when there is no second processing instruction.
2. The method of claim 1, wherein performing data processing corresponding to the first processing instruction based on the identifier of the descriptor comprises:
acquiring the content of the descriptor from a descriptor storage space according to the identifier of the descriptor;
determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor;
and executing data processing corresponding to the first processing instruction according to the data address.
3. The method of claim 1, wherein at least one of the first processing instruction and the second processing instruction comprises a write operation to the descriptor.
4. The method of claim 1, wherein an operand of the first processing instruction comprises an identification of at least one descriptor,
wherein, according to the identifier of the descriptor, determining whether the first processing instruction is executable includes:
respectively determining a first state of each descriptor according to the identifier of the at least one descriptor, wherein the first state comprises a registered state or an unregistered state;
determining that the first processing instruction is executable when the first state of each descriptor is a registered state.
5. The method of claim 1, wherein an operand of the first processing instruction comprises an identification of at least one descriptor,
wherein, according to the identifier of the descriptor, determining whether the first processing instruction is executable includes:
respectively determining a second state of each descriptor according to the identifier of the at least one descriptor, wherein the second state comprises an operable state or an inoperable state;
and when the second state of each descriptor is an operable state, determining that the first processing instruction is executable.
6. The method of claim 1, further comprising:
when the first processing instruction is a descriptor cancelling instruction, judging whether a fourth processing instruction which does not finish processing exists according to the identifier of the descriptor in the first processing instruction, wherein the fourth processing instruction is a processing instruction which is in an instruction queue and the operand of which comprises the identifier of the descriptor;
executing the first processing instruction when there is no fourth processing instruction that does not complete processing.
7. The method of claim 1, further comprising:
when the first processing instruction is a descriptor registration instruction, acquiring registration parameters of a descriptor in the first processing instruction, wherein the registration parameters comprise at least one of an identifier of the descriptor, a tensor shape and the content of tensor data indicated by the descriptor;
judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors;
executing the first processing instruction when the first processing instruction is executable.
8. The method of claim 7, wherein determining whether the first processing instruction is executable according to the registration parameter of the descriptor comprises:
determining that the first processing instruction is executable upon satisfying at least one of an identification of the descriptor being unoccupied, a first storage region storing content of the descriptor being unoccupied, and a second storage region storing tensor data indicated by the descriptor being unoccupied.
9. The method according to any one of claims 1-8, further comprising:
blocking or caching the first processing instruction when the first processing instruction is not executable.
10. The method of claim 1, wherein the descriptor is used to indicate a shape of tensor data of N dimensions, N being an integer greater than or equal to zero,
wherein the content of the descriptor comprises at least one shape parameter representing a shape of tensor data.
11. The method of claim 10, wherein the descriptor is further for indicating an address of tensor data of the N-dimension, wherein the content of the descriptor further comprises at least one address parameter representing the address of the tensor data.
12. The method of claim 11, wherein the address parameters of the tensor data include a reference address of a data reference point of the descriptor in a data storage space of the tensor data;
wherein the shape parameters of the tensor data comprise at least one of:
the size of the data storage space in at least one of the N dimensional directions, the size of the storage region of the tensor data in at least one of the N dimensional directions, the offset of the storage region in at least one of the N dimensional directions, the positions of at least two vertices located at diagonal positions of the N dimensional directions relative to the data reference point, and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address.
13. A data processing apparatus, characterized in that the apparatus comprises:
the judging module is used for judging whether the first processing instruction can be executed or not according to the identifier of the descriptor when the operand of the decoded first processing instruction comprises the identifier of the descriptor, and the descriptor is used for indicating the shape of the tensor;
the execution module is used for executing data processing corresponding to the first processing instruction according to the identifier of the descriptor when the first processing instruction is executable;
wherein, the judging module comprises:
the instruction judgment submodule, used for judging, according to the identifier of the descriptor, whether an unprocessed second processing instruction exists, wherein the second processing instruction comprises a processing instruction which is before the first processing instruction in an instruction queue and whose operand comprises the identifier of the descriptor;
and the first execution determining submodule is used for determining that the first processing instruction can be executed when the second processing instruction does not exist.
14. The apparatus of claim 13, wherein the execution module comprises:
the content acquisition submodule is used for acquiring the content of the descriptor from the descriptor storage space according to the identifier of the descriptor;
the address determination submodule is used for determining the data address of the data corresponding to the operand in the data storage space according to the content of the descriptor;
and the first execution submodule is used for executing data processing corresponding to the first processing instruction according to the data address.
15. The apparatus of claim 13, wherein at least one of the first processing instruction and the second processing instruction comprises a write operation to the descriptor.
16. The apparatus of claim 13, wherein an operand of the first processing instruction comprises an identification of at least one descriptor,
wherein, the judging module comprises:
a first state determining submodule, configured to determine a first state of each descriptor according to an identifier of the at least one descriptor, where the first state includes a registered state or an unregistered state;
and the second execution determining submodule is used for determining that the first processing instruction can be executed when the first state of each descriptor is a registered state.
17. The apparatus of claim 13, wherein an operand for the first processing instruction comprises an identification of at least one descriptor,
wherein, the judging module comprises:
a second state determining sub-module, configured to determine a second state of each descriptor according to the identifier of the at least one descriptor, where the second state includes an operable state or an inoperable state;
and the third execution determination submodule is used for determining that the first processing instruction is executable when the second state of each descriptor is an operable state.
18. The apparatus of claim 13, further comprising:
a logout judging module, configured to, when the first processing instruction is a descriptor logout instruction, judge whether a fourth processing instruction that does not complete processing exists according to an identifier of a descriptor in the first processing instruction, where the fourth processing instruction is a processing instruction in an instruction queue, where an operand of the fourth processing instruction includes the identifier of the descriptor;
and the logout execution module is used for executing the first processing instruction when no fourth processing instruction which does not finish processing exists.
19. The apparatus of claim 13, further comprising:
a parameter obtaining module, configured to, when the first processing instruction is a descriptor registration instruction, obtain registration parameters of a descriptor in the first processing instruction, where the registration parameters include at least one of an identifier of the descriptor, a tensor shape, and content of tensor data indicated by the descriptor;
the registration judging module is used for judging whether the first processing instruction can be executed or not according to the registration parameters of the descriptors;
and the register execution module is used for executing the first processing instruction when the first processing instruction can be executed.
20. The apparatus of claim 19, wherein the registration determination module comprises:
and the condition judgment sub-module is used for determining that the first processing instruction can be executed when at least one of the condition that the identifier of the descriptor is not occupied, the condition that a first storage area for storing the content of the descriptor is not occupied and the condition that a second storage area for storing tensor data indicated by the descriptor is not occupied is met.
21. The apparatus of any one of claims 13-20, further comprising:
an execution control module to block or cache the first processing instruction when the first processing instruction is not executable.
22. The apparatus of claim 13, wherein the descriptor is to indicate a shape of tensor data for N dimensions, N being an integer greater than or equal to zero,
wherein the content of the descriptor comprises at least one shape parameter representing a shape of tensor data.
23. The apparatus of claim 22, wherein the descriptor is further configured to indicate an address of tensor data for the N-dimension, and wherein the content of the descriptor further comprises at least one address parameter indicative of the address of the tensor data.
24. The apparatus of claim 23, wherein the address parameters of the tensor data comprise a reference address of a data reference point of the descriptor in a data storage space of the tensor data;
wherein the shape parameters of the tensor data comprise at least one of:
a size of the data storage space in at least one of N dimensional directions, a size of a storage region of the tensor data in at least one of the N dimensional directions, an offset amount of the storage region in at least one of the N dimensional directions, positions of at least two vertices located at diagonal positions of the N dimensional directions with respect to the data reference point, and a mapping relationship between a data description position of the tensor data indicated by the descriptor and a data address.
25. An artificial intelligence chip, wherein the chip comprises a data processing apparatus according to any one of claims 13 to 24.
26. An electronic device, characterized in that it comprises an artificial intelligence chip according to claim 25.
27. A board, the board comprising: a memory device, an interface device and a control device and an artificial intelligence chip according to claim 25;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;
and the control device is used for monitoring the state of the artificial intelligence chip.
28. The card of claim 27,
the memory device comprises a plurality of groups of storage units, wherein each group of storage units is connected with the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller for controlling data transmission to and data storage in each storage unit;
the interface device is a standard PCIe interface.
Priority Applications (23)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910272660.8A CN111782267B (en) | 2019-04-04 | 2019-04-04 | Data processing method and device and related product |
EP20783678.4A EP3800547A4 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
EP20217332.4A EP3828698B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207032006A KR20210002518A (en) | 2019-04-04 | 2020-04-01 | Data processing methods and devices and related products |
KR1020207036505A KR102550451B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
EP20217331.6A EP3825847B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
EP20217330.8A EP3825846A1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207036492A KR102519470B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
JP2021510522A JP7073580B2 (en) | 2019-04-04 | 2020-04-01 | Data processing methods, equipment, and related products |
EP20217328.2A EP3825842B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207036496A KR102522416B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
PCT/CN2020/082775 WO2020200244A1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207036508A KR102379406B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207036494A KR102569336B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
EP20217329.0A EP3825843B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
EP20217333.2A EP3825848A1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
KR1020207036500A KR102579192B1 (en) | 2019-04-04 | 2020-04-01 | Data processing method and apparatus, and related product |
JP2020198158A JP7121101B2 (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related products |
JP2020198041A JP2021170312A (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related product |
JP2020198102A JP7150802B2 (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related products |
JP2020198079A JP7121100B2 (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related products |
JP2020198021A JP7239547B2 (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related products |
JP2020198177A JP7121102B2 (en) | 2019-04-04 | 2020-11-30 | Data processing method, apparatus, and related products |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910272660.8A CN111782267B (en) | 2019-04-04 | 2019-04-04 | Data processing method and device and related product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111782267A CN111782267A (en) | 2020-10-16 |
CN111782267B true CN111782267B (en) | 2022-12-09 |
Family
ID=72755018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910272660.8A Active CN111782267B (en) | 2019-04-04 | 2019-04-04 | Data processing method and device and related product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111782267B (en) |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1282067C (en) * | 2004-08-09 | 2006-10-25 | 威盛电子股份有限公司 | Device and relative method for hardware array appositive operation |
EP3191984B1 (en) * | 2014-09-10 | 2021-03-10 | Amazon Technologies Inc. | Scalable log-based transaction management |
US9898292B2 (en) * | 2015-02-25 | 2018-02-20 | Mireplica Technology, Llc | Hardware instruction generation unit for specialized processors |
KR102395541B1 (en) * | 2015-07-09 | 2022-05-11 | 에스케이하이닉스 주식회사 | Memory control unit and data storage device including the same |
US9977619B2 (en) * | 2015-11-06 | 2018-05-22 | Vivante Corporation | Transfer descriptor for memory access commands |
US10817293B2 (en) * | 2017-04-28 | 2020-10-27 | Tenstorrent Inc. | Processing core with metadata actuated conditional graph execution |
US10338925B2 (en) * | 2017-05-24 | 2019-07-02 | Microsoft Technology Licensing, Llc | Tensor register files |
CN109543832B (en) * | 2018-11-27 | 2020-03-20 | 中科寒武纪科技股份有限公司 | Computing device and board card |
- 2019-04-04 CN CN201910272660.8A patent/CN111782267B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN111782267A (en) | 2020-10-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111782133A (en) | Data processing method and device and related product | |
JP7239547B2 (en) | Data processing method, apparatus, and related products | |
US7415576B2 (en) | Data processor with block transfer control | |
CN111857828B (en) | Processor operation method and device and related product | |
CN111782577B (en) | Data processing device and method and related product | |
CN111831337B (en) | Data synchronization method and device and related product | |
CN112347186B (en) | Data synchronization method and device and related product | |
CN111782267B (en) | Data processing method and device and related product | |
CN113807507B (en) | Data processing method and device and related products | |
CN112306945B (en) | Data synchronization method and device and related products | |
CN111782274B (en) | Data processing device and related product | |
CN111813449A (en) | Operation method, device and related product | |
CN111831329B (en) | Data processing method and device and related product | |
CN111783992A (en) | Data processing device and related product | |
CN111831722A (en) | Data synchronization method and device and related product | |
CN113806246A (en) | Data processing device and method and related product | |
CN112347027A (en) | Data synchronization method and device and related product | |
CN111857829B (en) | Processor operation method and device and related products | |
CN112347026B (en) | Data synchronization method and device and related product | |
CN111325331B (en) | Operation method, device and related product | |
CN113867686A (en) | Operation method, device and related product | |
CN114282159A (en) | Data processing device, integrated circuit chip, equipment and method for realizing the same | |
CN111857829A (en) | Processor operation method and device and related product | |
CN114489790A (en) | Data processing device, data processing method and related product | |
CN112347185A (en) | Data synchronization method and device and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||