WO2021018313A1 - Data synchronization method and device and related products - Google Patents

Data synchronization method and device and related products

Info

Publication number
WO2021018313A1
Authority
WO
WIPO (PCT)
Prior art keywords
synchronized
tensor data
data
descriptor
synchronization
Prior art date
Application number
PCT/CN2020/111259
Other languages
English (en)
French (fr)
Inventor
曾洪博
王秉睿
Original Assignee
中科寒武纪科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021018313A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 - Interprocessor communication

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a data synchronization method and device and related products.
  • the present disclosure proposes a data synchronization technical solution.
  • a data synchronization method which is applied to a first processor and includes: determining synchronization information of tensor data according to a descriptor of the tensor data to be synchronized, where the descriptor is used to indicate the shape of the tensor data to be synchronized; generating a synchronization instruction according to the synchronization information of the tensor data; and sending the synchronization instruction to a second processor, where the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  • a data synchronization method is provided.
  • the method is applied to a second processor and includes: parsing a synchronization instruction from the first processor to obtain synchronization information of tensor data to be synchronized; determining a descriptor of the tensor data to be synchronized according to the synchronization information, where the descriptor is used to indicate the shape of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor.
  • a data synchronization method which is applied to a second processor and includes: when there is tensor data to be synchronized, generating a synchronization request instruction, the synchronization request instruction being used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the synchronization request instruction, where the descriptor is used to indicate the shape of the tensor data to be synchronized; and sending the synchronization request instruction to the first processor.
  • a data synchronization device which is applied to a first processor and includes: a first information determination module, configured to determine synchronization information of tensor data according to a descriptor of the tensor data to be synchronized, where the descriptor is used to indicate the shape of the tensor data to be synchronized; a first instruction generation module, configured to generate a synchronization instruction according to the synchronization information of the tensor data; and a first instruction sending module, configured to send the synchronization instruction to a second processor, where the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  • a data synchronization device which is applied to a second processor and includes: a second information determination module, configured to parse a synchronization instruction from the first processor to obtain synchronization information of tensor data to be synchronized; a second descriptor determination module, configured to determine a descriptor of the tensor data to be synchronized according to the synchronization information, where the descriptor is used to indicate the shape of the tensor data to be synchronized; and a first data obtaining module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • a data synchronization device which is applied to a second processor and includes: a second instruction generation module, configured to generate a synchronization request instruction when there is tensor data to be synchronized, where the synchronization request instruction is used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to indicate the shape of the tensor data to be synchronized; and a second instruction sending module, configured to send the synchronization request instruction to the first processor.
  • an artificial intelligence chip including the data synchronization device as described above.
  • an electronic device including the artificial intelligence chip as described above.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip described above, where the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • the synchronization information of the tensor data is determined according to the descriptor, the synchronization instruction is generated according to the synchronization information, and the synchronization instruction is sent to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
  • Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 4 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 5 shows a schematic diagram of data storage space of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • Fig. 7 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • Fig. 8 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • Fig. 9 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted, depending on the context, as “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • the data synchronization method according to the embodiment of the present disclosure can be applied to any processor of a processing system (for example, an artificial intelligence chip) including multiple processors (multi-core).
  • the processor may be a general-purpose processor, such as a CPU (Central Processing Unit, central processing unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations may include machine learning operations, brain-like operations, etc. Among them, machine learning operations include neural network operations, k-means operations, and support vector machine operations.
  • the artificial intelligence processor may include, for example, one of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a field-programmable gate array (FPGA) chip, or a combination thereof.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as: convolution operation tasks, pooling tasks Or fully connected tasks, etc.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure.
  • the processing system 100 includes multiple processors 101 and a memory 102.
  • the multiple processors 101 are used to execute instruction sequences.
  • the memory 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processors 101 in the processing system 100 can not only share part of the storage space, for example, share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. As shown in Figure 2, the method is applied to the first processor (any processor in the processing system), and the method includes:
  • step S11: determine synchronization information of the tensor data according to the descriptor of the tensor data to be synchronized, where the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • step S12: generate a synchronization instruction according to the synchronization information of the tensor data;
  • step S13: send the synchronization instruction to a second processor, where the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
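The three steps above can be sketched in code. This is an illustrative model only: the patent does not prescribe a concrete API, so the names `Descriptor`, `SyncInstruction`, and `first_processor_sync`, and the choice of fields, are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Descriptor:
    ident: int                # descriptor identifier (e.g. a number)
    shape: Tuple[int, ...]    # size of each dimension of the tensor
    base_address: int         # reference address of the tensor data

@dataclass
class SyncInstruction:
    tensor_id: int
    shape: Tuple[int, ...]
    storage_address: int

def first_processor_sync(desc: Descriptor) -> SyncInstruction:
    # S11: determine synchronization information from the descriptor.
    sync_info = (desc.ident, desc.shape, desc.base_address)
    # S12: generate the synchronization instruction from that information.
    instr = SyncInstruction(*sync_info)
    # S13: in a real system the instruction would now be sent to the
    # second processor; here we simply return it.
    return instr

instr = first_processor_sync(Descriptor(ident=7, shape=(2, 4), base_address=0x1000))
```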
  • Tensors can be of different dimensions.
  • a scalar can be regarded as a 0-dimensional tensor
  • a vector can be regarded as a 1-dimensional tensor
  • a matrix can be regarded as a tensor of 2 or more dimensions.
  • the shape of a tensor includes information such as the dimensions of the tensor and the size of each dimension. For example, the shape of a tensor can be described by the descriptor as (2, 4); that is, these two parameters indicate that the tensor is two-dimensional, the size of its first dimension (columns) is 2, and the size of its second dimension (rows) is 4. It should be noted that the present disclosure does not limit the manner in which the descriptor indicates the shape of the tensor. When tensor data is stored in memory, its shape cannot be determined from its data address (or storage area) alone, nor can the relationship between multiple pieces of tensor data be determined; as a result, access efficiency is low and the complexity of data synchronization is high.
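To illustrate why a flat data address alone does not determine shape, here is a minimal sketch (not from the patent) of how a descriptor that records the shape lets element addresses be recovered from a flat row-major storage region; the function name and the 4-byte element size are assumptions.

```python
def element_address(base: int, shape: tuple, index: tuple, itemsize: int = 4) -> int:
    """Row-major flat address of the element at `index` in a tensor of `shape`."""
    offset = 0
    for dim_size, i in zip(shape, index):
        assert 0 <= i < dim_size, "index out of range for this shape"
        # Horner-style accumulation of the row-major flat offset.
        offset = offset * dim_size + i
    return base + offset * itemsize

# With shape (2, 4): element (1, 2) lies at flat offset 1*4 + 2 = 6.
addr = element_address(base=0x1000, shape=(2, 4), index=(1, 2))
```

Without the shape recorded in the descriptor, the base address alone cannot distinguish a (2, 4) tensor from, say, an (8,) vector occupying the same bytes.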
  • a descriptor (tensor descriptor) can be set to indicate the shape of tensor data (N-dimensional tensor data).
  • the value of N can be determined according to the dimensionality (order) of the tensor data, or can be set according to the needs of the tensor data.
  • when the tensor data is three-dimensional tensor data, the descriptor can be used to indicate the shape of the tensor data in each of its three dimensions (such as offset, size, etc.). It should be understood that those skilled in the art can set the value of N according to actual needs, which is not limited in the present disclosure.
  • the descriptor may include identification and content, etc.
  • the identifier of the descriptor may be used to distinguish descriptors, for example, it may be a number; the content of the descriptor may include at least one shape parameter representing the shape of the tensor data (for example, the size of the tensor in each dimension), and may also include at least one address parameter representing the address of the tensor data (for example, the reference address of a data reference point).
  • the present disclosure does not limit the specific parameters included in the content of the descriptor.
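A hypothetical concrete layout for a descriptor, following the description above: an identifier to distinguish descriptors, plus content holding shape parameters and address parameters. All names and the registry structure are illustrative assumptions, not the patent's specification.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DescriptorContent:
    shape: Tuple[int, ...]   # shape parameters: size in each dimension
    base_address: int        # address parameter: reference address of the data reference point

@dataclass
class TensorDescriptor:
    ident: int               # identifier used to distinguish descriptors, e.g. a number
    content: DescriptorContent

registry = {}  # descriptors indexed by identifier

def register_descriptor(ident: int, shape, base_address: int) -> TensorDescriptor:
    """Register (create) a descriptor and index it by its identifier."""
    desc = TensorDescriptor(ident, DescriptorContent(tuple(shape), base_address))
    registry[ident] = desc
    return desc

desc = register_descriptor(1, (2, 4), 0x2000)
```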
  • by means of the descriptor, the shape of tensor data can be expressed, and related information such as the relationship between multiple pieces of tensor data can be determined, which improves the efficiency of access to tensor data and thereby reduces the complexity of data synchronization.
  • data synchronization between multiple processors may be required, for example, when the calculation result of processor A1 is synchronized to processor A2 to be used as input data for another operation.
  • a descriptor-based data synchronization mechanism can be used to achieve data synchronization.
  • the first processor is the sender of data synchronization
  • the second processor is the receiver of data synchronization.
  • the first processor may, in step S11, determine synchronization information of the tensor data (such as the identifier, shape, source, and storage address of the tensor data) according to the descriptor of the tensor data;
  • in step S12, a synchronization instruction is generated according to the synchronization information, and in step S13 the synchronization instruction is sent to the second processor.
  • the second processor may include a general-purpose processor (such as a central processing unit CPU, a graphics processor GPU) and a dedicated processor (such as an artificial intelligence processor, a scientific computing processor, or a digital signal processor, etc.).
  • the type of the second processor may be the same as or different from the type of the first processor, and the present disclosure does not limit the type of the second processor.
  • the first processor can actively initiate data synchronization with the second processor, for example, when the first processor completes an operation and obtains the operation result (tensor data), it actively initiates data synchronization with the second processor that needs to use the operation result.
  • the first processor may also initiate data synchronization with the second processor in response to a synchronization request from the second processor, for example, upon receiving a synchronization request instruction from the second processor. The present disclosure does not limit the timing of initiating data synchronization.
  • the descriptor of the tensor data may be acquired.
  • the descriptor may be a registered (created) descriptor used to indicate the shape of the tensor data, or a new descriptor may be registered (created) according to the shape parameter of the tensor data, which is not limited in the present disclosure.
  • the synchronization information of the tensor data can be determined according to the descriptor of the tensor data.
  • the synchronization information may include at least one of the identification (for example, data number), shape, source, and storage address of the tensor data.
  • a synchronization instruction can be generated.
  • when the second processor already has information about the tensor data to be synchronized, the synchronization instruction may include only part of the synchronization information, such as the identifier of the tensor data, to instruct the second processor to synchronize the tensor data according to that identifier; if the second processor has no information about the tensor data, the synchronization instruction may include more synchronization information, such as the identifier and storage address of the tensor data, to instruct the second processor to synchronize the tensor data according to the corresponding information. The present disclosure does not limit the specific content included in the synchronization instruction.
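The two instruction variants described above can be sketched as follows; whether the receiver already holds information about the tensor decides how much synchronization information the instruction carries. Field and function names are assumptions, not the patent's wording.

```python
def make_sync_instruction(tensor_id: int, storage_address: int,
                          receiver_knows_tensor: bool) -> dict:
    """Build a synchronization instruction with partial or full synchronization info."""
    if receiver_knows_tensor:
        # Partial synchronization information: the identifier alone suffices.
        return {"op": "SYNC", "tensor_id": tensor_id}
    # Full synchronization information: identifier plus storage address.
    return {"op": "SYNC", "tensor_id": tensor_id, "storage_address": storage_address}

short_instr = make_sync_instruction(3, 0x4000, receiver_knows_tensor=True)
long_instr = make_sync_instruction(3, 0x4000, receiver_knows_tensor=False)
```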
  • the synchronization instruction may be sent to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  • the second processor can determine the tensor data to be synchronized according to the identifier, register or obtain a descriptor indicating the tensor data to be synchronized, and then obtain the tensor data indicated by the descriptor according to the content of the descriptor, thereby achieving synchronization of the tensor data.
  • the second processor can register a descriptor indicating the tensor data to be synchronized according to the synchronization information in the instruction, and directly obtain the tensor data indicated by the descriptor according to the content of the descriptor, thereby achieving synchronization of the tensor data.
  • the synchronization information of the tensor data is determined according to the descriptor, the synchronization instruction is generated according to the synchronization information, and the synchronization instruction is sent to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction, so as to reduce synchronization overhead and improve the efficiency of data synchronization without changing the structure of the synchronization instruction.
  • the synchronization information may include the storage address of the tensor data to be synchronized.
  • Step S12 may include: when the storage address of the tensor data to be synchronized is in the shared storage space, generating the synchronization instruction according to the storage address of the tensor data to be synchronized, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • multiple processors may have a shared storage space, such as an off-chip memory that can be accessed by both the first processor and the second processor.
  • the shared storage space can be a storage space in which multiple cores (multiple processors) can access data, or a storage space in which some of the cores (some of the processors) can access data; an inter-core shared storage space may be preset, and the present disclosure does not limit the setting method of the shared storage space.
  • the storage address of the tensor data to be synchronized can be determined according to the content of the descriptor of the tensor data to be synchronized. If the storage address of the tensor data to be synchronized is in the shared storage space, then, since the second processor can also access data in the shared storage space, the second processor can directly read the tensor data according to its storage address to achieve synchronization.
  • the synchronization instruction may include the storage address of the tensor data to be synchronized, that is, the synchronization instruction may be generated according to the storage address of the tensor data to be synchronized.
  • the second processor can parse the instruction to obtain the storage address of the tensor data; according to the storage address, the second processor can register (create) the descriptor of the tensor data to be synchronized so that the content of the descriptor corresponds to the data address of the tensor data, and obtain the tensor data to be synchronized from the shared storage space, thereby realizing the entire synchronization process.
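A sketch of this shared-storage path, with the shared storage space modelled as a plain dictionary from address to values; all names are illustrative and the instruction format is an assumption, not taken from the patent.

```python
# The shared storage space, visible to both processors, modelled as a dict.
shared_storage = {0x3000: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]}

def receiver_handle_sync(instruction: dict) -> list:
    # Parse the instruction to obtain the storage address of the tensor data.
    addr = instruction["storage_address"]
    # Register (create) a descriptor whose content corresponds to the data address.
    descriptor = {"ident": instruction["tensor_id"],
                  "shape": instruction["shape"],
                  "base_address": addr}
    # Obtain the tensor data to be synchronized from the shared storage space.
    return shared_storage[descriptor["base_address"]]

data = receiver_handle_sync({"tensor_id": 5, "shape": (2, 4), "storage_address": 0x3000})
```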
  • the synchronization information includes the storage address of the tensor data to be synchronized.
  • Step S12 may include: when the storage address of the tensor data to be synchronized is in a non-shared storage space, storing the tensor data to be synchronized into the shared storage space, and generating the synchronization instruction according to the address of the tensor data in the shared storage space, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • the first processor may have a non-shared storage space; the first processor can access data in its non-shared storage space, but the second processor cannot. If the storage address of the tensor data to be synchronized is in the non-shared storage space, the second processor cannot obtain the tensor data directly. In this case, the first processor may dump the tensor data to be synchronized into the shared storage space so that the second processor can access it.
  • the first processor may generate a descriptor of the tensor data to be synchronized, that is, register a new descriptor to indicate the tensor data in the shared storage space.
  • the first processor may generate a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space.
  • the second processor can parse the instruction to obtain the storage address of the tensor data to be synchronized; according to the storage address, the second processor can register (create) the descriptor of the tensor data to be synchronized so that the content of the descriptor corresponds to the data address of the tensor data, and obtain the tensor data to be synchronized from the shared storage space, thereby realizing the entire synchronization process.
  • the tensor data to be synchronized in the non-shared storage space can be actively dumped into the shared storage space so that the second processor can obtain it, thereby reducing data transmission between processors during synchronization and improving the processing efficiency of synchronization.
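The dump-then-synchronize flow can be sketched as follows, again with storage spaces modelled as dictionaries; the address-allocation scheme and all names are illustrative assumptions.

```python
private_storage = {0x100: [9, 8, 7, 6]}  # only the first processor sees this
shared_storage = {}                       # both processors see this
_next_shared_addr = [0x5000]              # naive bump allocator for shared addresses

def dump_and_sync(private_addr: int, tensor_id: int) -> dict:
    """Dump tensor data from private space to shared space and build the sync instruction."""
    shared_addr = _next_shared_addr[0]
    _next_shared_addr[0] += 0x100
    # Dump the tensor data to be synchronized into the shared storage space.
    shared_storage[shared_addr] = list(private_storage[private_addr])
    # Generate the synchronization instruction from the shared-space address.
    return {"op": "SYNC", "tensor_id": tensor_id, "storage_address": shared_addr}

instr = dump_and_sync(0x100, tensor_id=2)
```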
  • the method further includes: determining the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.
  • the first processor may initiate data synchronization with the second processor in response to the synchronization request of the second processor.
  • the synchronization request instruction from the second processor may include information of the tensor data to be synchronized, for example, the data characteristics of the tensor data to be synchronized.
  • the data characteristics of the tensor data may include information such as the identification, shape, source, address of the tensor data, and the present disclosure does not limit the specific content of the synchronization request instruction.
  • the first processor may determine the descriptor of the tensor data to be synchronized, determine the synchronization information of the tensor data according to the descriptor, and then generate the synchronization instruction.
  • the descriptor of the tensor data to be synchronized can be determined according to the synchronization request of the second processor, so as to generate synchronization instructions, thereby avoiding unnecessary data synchronization and improving the efficiency of data synchronization.
  • the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, and the step of determining the descriptor of the tensor data to be synchronized according to the synchronization request instruction from the second processor may include: determining the descriptor of the tensor data to be synchronized according to the data characteristics in the synchronization request instruction.
  • the synchronization request instruction may include data characteristics, such as the identifier of the tensor data.
  • the first processor may parse the synchronization request instruction from the second processor to obtain the data characteristics of the tensor data to be synchronized.
  • the data characteristics of the tensor data to be synchronized may include information such as the identification, shape, source, and address of the tensor data.
  • for example, the data source of the tensor data may be the Kth sender (the Kth processor), or the result of the convolution operation numbered 200; the address of the tensor data may be a specific address area (for example, addresses ADDR0-ADDR127); and the shape of the tensor data may be a specified shape (for example, a two-dimensional 20*10 tensor), etc.
  • Those skilled in the art can set the data characteristics of the tensor data to be synchronized according to the actual situation, which is not limited in the present disclosure.
  • the first processor can find the tensor data to be synchronized and determine its descriptor, for example, by directly obtaining an existing descriptor or newly registering a corresponding one. According to the descriptor of the tensor data to be synchronized, the synchronization information of the tensor data can be determined, so that a synchronization instruction can be generated and sent to instruct the second processor to synchronize the tensor data.
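A hedged sketch of matching a synchronization request against locally known tensor data by its data characteristics (source, shape, etc.); the matching fields and the flat list of known tensors are illustrative assumptions.

```python
known_tensors = [
    {"ident": 1, "source": "processor_K", "shape": (20, 10), "descriptor": "desc_1"},
    {"ident": 2, "source": "conv_200",    "shape": (2, 4),   "descriptor": "desc_2"},
]

def find_descriptor(characteristics: dict) -> str:
    """Return the descriptor of the first tensor matching every requested characteristic."""
    for tensor in known_tensors:
        if all(tensor.get(key) == value for key, value in characteristics.items()):
            return tensor["descriptor"]
    raise KeyError("no tensor matches the requested characteristics")

desc = find_descriptor({"source": "conv_200", "shape": (2, 4)})
```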
  • the descriptor of the tensor data to be synchronized can be determined according to the data characteristics in the request instruction, so as to realize synchronization of the tensor data; the tensor data itself does not need to be transmitted during synchronization, which reduces the amount of data transmitted and the synchronization overhead, and improves processing efficiency.
  • Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • the data synchronization method can be applied to the second processor.
  • the data synchronization method includes:
  • step S21: parse the synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;
  • step S22: determine a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, where the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • step S23: obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • the first processor can actively initiate data synchronization with the second processor (the receiver), for example, when the first processor completes an operation and obtains the result (tensor data), it actively initiates data synchronization with the second processor that needs to use the operation result.
  • when the second processor receives the synchronization instruction from the first processor, it can parse the synchronization instruction to obtain the synchronization information of the tensor data to be synchronized (for example, the identifier, shape, and storage address of the tensor data).
  • the second processor may internally search for the tensor data corresponding to the identifier and/or the descriptor corresponding to the tensor data, and obtain the tensor data to be synchronized according to the content of the descriptor, thereby realizing synchronization of the tensor data.
  • the second processor may register a descriptor indicating the tensor data to be synchronized according to the shape and storage address of the tensor data, and obtain the tensor data to be synchronized according to the content of the descriptor, thereby realizing synchronization of the tensor data.
  • according to the data synchronization method of the embodiment of the present disclosure, by setting a descriptor indicating the shape of tensor data, the descriptor of the tensor data can be determined according to the synchronization information of the tensor data to be synchronized in the synchronization instruction, and the tensor data can then be obtained to realize synchronization, thereby reducing synchronization overhead, reducing the complexity of data synchronization, and improving the efficiency of data synchronization.
  • the synchronization information includes the storage address of the tensor data to be synchronized
  • Step S22 includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;
  • Step S23 includes: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • the second processor can access the data from the shared storage space.
  • the synchronization instruction may include the storage address of the tensor data to be synchronized.
  • the second processor may parse the instruction to obtain the storage address of the tensor data to be synchronized; according to the storage address of the tensor data, create or modify the descriptor corresponding to the tensor data. According to the content of the descriptor, the second processor can obtain the tensor data to be synchronized from the shared storage space, thereby realizing the entire synchronization process.
  • Fig. 4 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • the data synchronization method can be applied to the second processor.
  • the data synchronization method includes:
  • step S31: when there is tensor data to be synchronized, generate a synchronization request instruction, where the synchronization request instruction is used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • step S32: send the synchronization request instruction to the first processor.
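Steps S31 and S32 on the second processor can be sketched as follows; the transport (a plain list standing in for a channel to the first processor) and all names are assumptions for illustration.

```python
channel_to_first_processor = []  # stand-in for the path to the first processor

def request_sync(characteristics: dict) -> dict:
    # S31: generate the synchronization request instruction from the data
    # characteristics (e.g. identifier, shape, source) of the tensor data.
    request = {"op": "SYNC_REQ", "characteristics": characteristics}
    # S32: send the synchronization request instruction to the first processor.
    channel_to_first_processor.append(request)
    return request

req = request_sync({"ident": 7, "shape": (20, 10)})
```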
  • when there is tensor data to be synchronized, the second processor can actively send a synchronization request instruction to the first processor to obtain the tensor data to be synchronized.
  • the second processor may generate a synchronization request instruction according to the information of the tensor data to be synchronized, for example, the data characteristics of the tensor data to be synchronized.
  • the present disclosure does not limit the specific content of the synchronization request instruction.
  • the first processor may determine the descriptor of the tensor data to be synchronized, and then generate the synchronization instruction.
  • the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, so that the first processor can determine the tensor data to be synchronized.
  • the data characteristics of the tensor data may include information such as the identification, shape, source, and address of the tensor data.
  • Those skilled in the art can set the data characteristics of the tensor data to be synchronized according to the actual situation, which is not limited in the present disclosure.
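As an illustrative sketch only (the present disclosure does not define any particular encoding, and all names below are hypothetical), the data characteristics named above — identification, shape, source, and address — could be grouped into a synchronization request as follows:

```python
from dataclasses import dataclass

# Hypothetical container for the data characteristics of the tensor data
# to be synchronized; the field names are illustrative, not from the patent.
@dataclass
class TensorDataCharacteristics:
    tensor_id: int   # identification of the tensor data
    shape: tuple     # shape of the tensor data
    source: str      # source of the tensor data
    address: int     # storage address of the tensor data

# a synchronization request carrying these characteristics
request = {
    "kind": "sync_request",
    "characteristics": TensorDataCharacteristics(7, (4, 16), "second_processor", 0x4000),
}
```

The first processor could then determine the tensor data to be synchronized from the `characteristics` field of such a request.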
  • the method further includes:
• when the second processor receives a synchronization instruction from the first processor, it can parse the synchronization instruction to obtain the synchronization information of the tensor data to be synchronized (such as the identifier, shape, and storage address of the tensor data).
• the second processor may internally search for the tensor data corresponding to the identifier of the tensor data and/or the descriptor corresponding to the tensor data, and obtain the tensor data to be synchronized according to the content of the descriptor, so as to realize the synchronization of the tensor data.
• the second processor may create the descriptor indicating the tensor data to be synchronized according to the shape and storage address of the tensor data, and obtain the tensor data to be synchronized according to the content of the descriptor, thereby achieving synchronization of the tensor data.
  • the synchronization information includes the storage address of the tensor data to be synchronized
• the step of determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized may include: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;
• the step of obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized may include: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • the second processor can access the data from the shared storage space.
  • the synchronization instruction may include the storage address of the tensor data to be synchronized.
  • the second processor may parse the instruction to obtain the storage address of the tensor data to be synchronized; according to the storage address of the tensor data, create or modify the descriptor corresponding to the tensor data. According to the content of the descriptor, the second processor can obtain the tensor data to be synchronized from the shared storage space, thereby realizing the entire synchronization process.
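The parse-then-fetch sequence described above can be sketched as follows. The instruction encoding, the descriptor representation, and the modeling of the shared storage space as a dictionary are all assumptions for illustration, not part of the disclosure:

```python
# Sketch of the second processor's handling of a synchronization instruction:
# parse the storage address, create or modify the corresponding descriptor,
# then obtain the tensor data from the shared storage space (a dict here).
def handle_sync_instruction(instruction, descriptor_table, shared_storage):
    address = instruction["storage_address"]      # parse the instruction
    descriptor = descriptor_table.get(address)
    if descriptor is None:                        # create the descriptor
        descriptor = {"address": address, "shape": instruction.get("shape")}
        descriptor_table[address] = descriptor
    else:                                         # or modify the existing one
        descriptor["shape"] = instruction.get("shape", descriptor.get("shape"))
    # obtain the tensor data according to the content of the descriptor
    return shared_storage[descriptor["address"]]
```

In this sketch the descriptor table plays the role of the descriptor storage space discussed below.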
  • the identifier and content of the descriptor can be stored in the descriptor storage space, which can be the internal memory of the processor (such as registers, on-chip SRAM or other media cache, etc.) Storage space.
  • the data storage space of the tensor data indicated by the descriptor may be a storage space in the internal memory of the processor (for example, on-chip cache) or an external memory (off-chip memory) connected to the processor.
  • the data address in the data storage space may be an actual physical address or a virtual address.
  • the present disclosure does not limit the location of the descriptor storage space and the data storage space, and the type of data address.
  • the identifier and content of the descriptor and the tensor data indicated by the descriptor can be located in the same area.
  • a continuous area of the on-chip cache can be used to store the relevant content of the descriptor
  • the address is ADDR0-ADDR1023, where the address ADDR0-ADDR31 can be used to store the identifier of the descriptor, the address ADDR32-ADDR63 can be used to store the content of the descriptor, and the address ADDR64-ADDR1023 can be used to store the tensor data indicated by the descriptor.
• it should be noted that ADDR here does not denote one bit or one byte; it is used to indicate an address and is one address unit. Those skilled in the art can determine the storage area and its address according to the actual situation, which is not limited in the present disclosure.
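The example layout above (ADDR0-ADDR1023, in address units) can be written down as ranges; the constants are taken directly from the example, while the variable names are illustrative:

```python
# Partition of the contiguous on-chip region from the example above,
# expressed in address units rather than bytes.
DESCRIPTOR_ID_REGION = range(0, 32)        # ADDR0-ADDR31: descriptor identifiers
DESCRIPTOR_CONTENT_REGION = range(32, 64)  # ADDR32-ADDR63: descriptor content
TENSOR_DATA_REGION = range(64, 1024)       # ADDR64-ADDR1023: tensor data

# the three regions together tile the whole ADDR0-ADDR1023 space
total_units = (len(DESCRIPTOR_ID_REGION)
               + len(DESCRIPTOR_CONTENT_REGION)
               + len(TENSOR_DATA_REGION))
```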
  • the identifier and content of the descriptor and the tensor data indicated by the descriptor can be stored separately in different areas of the internal memory.
• a register can be used as the descriptor storage space to store the identifier and content of the descriptor, and the on-chip cache can be used as the data storage space to store the tensor data indicated by the descriptor.
  • a special register (SR) dedicated to the descriptor can also be set, and the data in the descriptor can be an immediate value or can be obtained from a special register.
  • the number of the register can be used to represent the identifier of the descriptor. For example, when the number of the register is 0, the identifier of the stored descriptor is 0.
• an area can be allocated in the cache space according to the size of the tensor data indicated by the descriptor (for example, a tensor cache unit can be created in the cache for each tensor data) for storing the tensor data. It should be understood that a preset cache space may also be used to store the tensor data, which is not limited in the present disclosure.
  • the identifier and content of the descriptor can be stored in the internal memory, and the tensor data indicated by the descriptor can be stored in the external memory.
  • a method of storing the identifier and content of the descriptor on the chip, and storing the tensor data indicated by the descriptor off the chip may be adopted.
  • the data address of the data storage space corresponding to the descriptor may be a fixed address.
  • a separate data storage space can be divided for tensor data, and the starting address of each tensor data in the data storage space corresponds to the identifier of the descriptor in a one-to-one correspondence.
  • the processor can determine the data address of the tensor data based on the content of the descriptor.
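A minimal sketch of the fixed-address scheme just described, in which each descriptor identifier corresponds one-to-one to a start address in a dedicated tensor data storage space. The base address and slot size below are invented for illustration; the disclosure fixes neither:

```python
TENSOR_SPACE_BASE = 0x1000   # hypothetical start of the dedicated tensor space
TENSOR_SLOT_SIZE = 0x100     # hypothetical fixed size reserved per tensor

# one-to-one correspondence between descriptor identifier and start address
def tensor_start_address(descriptor_id: int) -> int:
    return TENSOR_SPACE_BASE + descriptor_id * TENSOR_SLOT_SIZE
```

Under such a scheme the processor can determine the data address directly from the descriptor identifier, without a lookup table.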
• the descriptor may also be used to indicate the address of N-dimensional tensor data, where the content of the descriptor may also include at least one address parameter representing the address of the tensor data.
• for example, when the tensor data is three-dimensional data, the content of the descriptor may include one address parameter indicating the address of the tensor data, such as the starting address of the tensor data, or may include multiple address parameters of the address of the tensor data, such as the starting address of the tensor data plus an address offset, or address parameters of the tensor data based on each dimension.
  • the address parameter of the tensor data includes a reference address of the data reference point of the descriptor in the data storage space of the tensor data.
• the reference address may differ as the data reference point changes.
  • the present disclosure does not limit the selection of data reference points.
  • the reference address may include the start address of the data storage space.
  • the reference address of the descriptor is the starting address of the data storage space.
  • the reference address of the descriptor is the physical address of the data block in the data storage space.
• the shape parameter of the tensor data includes at least one of the following: the size of the data storage space of the tensor data in at least one of the N dimensional directions, the size of the storage area in at least one of the N dimensional directions, the offset of the storage area in at least one of the N dimensional directions, the positions of at least two vertices at diagonal positions in the N dimensional directions relative to the data reference point, and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address. Here, the data description position is the mapping position of a point or region in the tensor data indicated by the descriptor.
• for example, when the tensor data is three-dimensional spatial data, the descriptor can represent the shape of the tensor data by three-dimensional space coordinates (x, y, z), and the data description position of the tensor data may be the position, expressed in the coordinates (x, y, z), of a point or region in the three-dimensional space to which the tensor data is mapped.
  • Fig. 5 shows a schematic diagram of data storage space of a data synchronization method according to an embodiment of the present disclosure.
• the data storage space 21 stores two-dimensional data in a row-first manner, which can be represented by (x, y) (where the X axis extends horizontally to the right and the Y axis extends vertically downward). The size in the X axis direction (the size of each row) is ori_x (not shown in the figure), the size in the Y axis direction (the total number of rows) is ori_y (not shown in the figure), and the start address of the data storage space 21 is PA_start (the reference address), which is the physical address of the first data block 22.
• the data block 23 is part of the data in the data storage space 21; its offset 25 in the X axis direction is represented as offset_x, its offset 24 in the Y axis direction is represented as offset_y, its size in the X axis direction is represented as size_x, and its size in the Y axis direction is represented as size_y.
• the first data block of the data storage space 21 can be taken as the data reference point of the descriptor, so that the reference address of the descriptor is the start address PA_start of the data storage space 21. The content of the descriptor of the data block 23 can then be determined from PA_start combined with the size ori_x of the data storage space 21 in the X axis direction, the size ori_y in the Y axis direction, the offset offset_y of the data block 23 in the Y axis direction, the offset offset_x in the X axis direction, the size size_x in the X axis direction, and the size size_y in the Y axis direction.
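Gathering the parameters just listed, the descriptor content for a block such as data block 23 in Fig. 5 could be represented as follows. The dict encoding and the concrete numbers are illustrative; the disclosure does not fix any layout:

```python
# Descriptor content for a two-dimensional block: reference address plus
# per-axis sizes of the storage space, and offsets/sizes of the block.
def make_block_descriptor(pa_start, ori_x, ori_y, offset_x, offset_y, size_x, size_y):
    return {
        "PA_start": pa_start,  # reference address: start of the data storage space
        "ori_x": ori_x,        # storage-space size in the X direction (row size)
        "ori_y": ori_y,        # storage-space size in the Y direction (row count)
        "offset_x": offset_x,  # offset of the block in the X direction
        "offset_y": offset_y,  # offset of the block in the Y direction
        "size_x": size_x,      # size of the block in the X direction
        "size_y": size_y,      # size of the block in the Y direction
    }

# hypothetical values for a block like data block 23
block_23 = make_block_descriptor(pa_start=0x0, ori_x=64, ori_y=16,
                                 offset_x=4, offset_y=2, size_x=8, size_y=4)
```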
• in the above example the descriptor describes a two-dimensional space; those skilled in the art can set the dimension represented by the content of the descriptor according to the actual situation, which is not limited in the present disclosure.
• In a possible implementation, the content of the descriptor of the tensor data may be determined according to the reference address of the data reference point of the descriptor in the data storage space and the positions, relative to the data reference point, of at least two vertices at diagonal positions in the N dimensional directions.
• for example, the reference address PA_base of the data reference point of the descriptor in the data storage space and the positions of two diagonal vertices relative to the data reference point can be used to determine the content of the descriptor of the data block 23 in Fig. 2.
• specifically, one piece of data (for example, the data at position (2, 2)) can be selected in the data storage space 21 as the data reference point, and the physical address of this data in the data storage space is used as the reference address PA_base. Then, the positions of at least two diagonal vertices of the data block 23 relative to the data reference point are determined, for example, the vertices taken in the upper-left-to-lower-right direction: the content of the descriptor of the data block 23 can be determined from PA_base, the relative position (x_min, y_min) of the upper left vertex, and the relative position (x_max, y_max) of the lower right vertex.
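The diagonal-vertex form described above can be sketched the same way; the dict encoding is again illustrative, not prescribed by the disclosure:

```python
# Descriptor content from a reference address PA_base and the positions of
# the upper-left and lower-right vertices relative to the data reference point.
def make_diagonal_descriptor(pa_base, x_min, y_min, x_max, y_max):
    return {
        "PA_base": pa_base,
        "upper_left": (x_min, y_min),
        "lower_right": (x_max, y_max),
        # the block extent is recoverable from the two vertices
        "extent": (x_max - x_min + 1, y_max - y_min + 1),
    }

desc = make_diagonal_descriptor(0x100, x_min=0, y_min=0, x_max=7, y_max=3)
```

This shows why two diagonal vertices suffice: the block's size in each dimension follows from their coordinate differences.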
• in a possible implementation, the content of the descriptor of the tensor data may be determined according to the reference address of the data reference point of the descriptor in the data storage space and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address.
• the mapping relationship between the data description position and the data address can be set according to actual needs. For example, when the tensor data indicated by the descriptor is three-dimensional spatial data, the function f(x, y, z) can be used to define the mapping relationship between the data description position and the data address.
  • mapping relationship between the data description location and the data address can be set according to the actual situation, which is not limited in the present disclosure.
• PA2(x, y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)    (4)
  • the processor can calculate the data address of the tensor data indicated by the descriptor in the data storage space according to the content of the descriptor, and then perform corresponding processing (such as data operation, data synchronization, etc.) according to the address, Therefore, the complexity of data access can be reduced, and the processing efficiency of the processor can be improved.
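The address computation of formula (4) can be checked with a small helper. The parameter names follow the Fig. 5 example above; the function itself is merely an illustration of the calculation, not an implementation from the disclosure:

```python
# Address of the data at description position (x_q, y_q) inside the block,
# per formula (4):
#   PA2(x, y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)
def pa2(pa_start, ori_x, offset_x, offset_y, x_q, y_q):
    row = offset_y + y_q - 1   # row index within the whole storage space
    col = offset_x + x_q       # column index within the whole storage space
    return pa_start + row * ori_x + col
```

For example, with pa_start = 0, ori_x = 10, offset_x = 2, offset_y = 3, the data at description position (1, 1) lies at address 0 + 3 * 10 + 3 = 33.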
• although the steps in the flowchart are displayed in sequence according to the direction of the arrows, these steps are not necessarily executed in that order. Unless specifically stated herein, the execution of these steps is not strictly limited in order, and they can be executed in other orders. Moreover, at least part of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily executed at the same time but can be executed at different times; their execution order is also not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of other steps.
  • Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • the data synchronization device is applied to the first processor.
  • the data synchronization device includes:
  • the first information determining module 51 is configured to determine synchronization information of the tensor data according to the descriptor of the tensor data to be synchronized, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the first instruction generation module 52 is configured to generate synchronization instructions according to the synchronization information of the tensor data
  • the first instruction sending module 53 is configured to send the synchronization instruction to the second processor, where the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  • the synchronization information includes the storage address of the tensor data to be synchronized
• the first instruction generation module includes: a first generation sub-module configured to generate a synchronization instruction according to the storage address of the tensor data to be synchronized, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • the synchronization information includes the storage address of the tensor data to be synchronized
• the first instruction generation module includes: a dump sub-module configured to, when the storage address of the tensor data to be synchronized is in a non-shared storage space, store the tensor data to be synchronized into a shared storage space; and a second generation sub-module configured to generate a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • the device further includes: a first descriptor determining module, configured to determine a descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.
  • the synchronization request instruction includes the data characteristics of the tensor data to be synchronized
• the first descriptor determining module includes: an instruction parsing sub-module configured to parse the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized; and a first descriptor determining sub-module configured to determine the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.
  • Fig. 7 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • the data synchronization device is applied to the second processor.
  • the data synchronization device includes:
  • the second information determining module 61 is configured to parse the synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;
• the second descriptor determining module 62 is configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the first data obtaining module 63 is configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • the synchronization information includes the storage address of the tensor data to be synchronized
• the second descriptor determining module includes: a first determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;
  • the first data acquisition module includes: a first data acquisition sub-module configured to acquire the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • Fig. 8 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • the data synchronization device is applied to the second processor.
  • the data synchronization device includes:
  • the second instruction generating module 71 is configured to generate a synchronization request instruction when there is tensor data to be synchronized.
• the synchronization request instruction is used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the second instruction sending module 72 is configured to send the synchronization request instruction to the first processor.
  • the synchronization request instruction includes the data characteristics of the tensor data to be synchronized.
• the device further includes: a third information determining module, configured to parse the synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized; a third descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized; and a second data acquisition module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • the synchronization information includes the storage address of the tensor data to be synchronized
• the third descriptor determining module includes: a second determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;
  • the second data acquisition module includes: a second data acquisition sub-module configured to acquire the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of the units/modules in the foregoing embodiment is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
• the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated units/modules can be implemented in the form of hardware or software program modules.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
• the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.
• if the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
• based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other various media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned data synchronization device.
• a board card is provided, which includes a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip, wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 9 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • the board card may include other supporting components in addition to the chip 389 described above.
• the supporting components include, but are not limited to: a storage device 390, an interface device 391, and a control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
• the storage device may include multiple groups of storage units 393, each group being connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate synchronous dynamic random access memory).
• the storage device may include 4 groups of the storage units, and each group of storage units may include a plurality of DDR4 chips. The artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
• the interface device may also be other interfaces. The present disclosure does not limit the specific forms of the other interfaces mentioned above, as long as the interface unit can realize the transfer function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
• the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as heavy load and light load.
• the control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
• Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
• the medical equipment includes nuclear magnetic resonance apparatuses, B-mode ultrasound scanners, and/or electrocardiographs.
  • a data synchronization method which is applied to a first processor, includes:
  • the method according to clause A1 wherein the synchronization information includes a storage address of the tensor data to be synchronized, and generating a synchronization instruction according to the synchronization information of the tensor data includes:
• a synchronization instruction is generated according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • a synchronization instruction is generated to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • the descriptor of the tensor data to be synchronized is determined.
• Clause A5. The method according to clause A4, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized, and determining the descriptor of the tensor data to be synchronized according to the synchronization request instruction from the second processor includes: parsing the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized; and determining the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.
  • a data synchronization method applied to a second processor including:
• Clause A7. The method according to clause A6, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized includes: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • a data synchronization method applied to a second processor including:
  • a synchronization request instruction is generated.
  • the synchronization request instruction is used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to Indicating the shape of the tensor data to be synchronized; sending the synchronization request instruction to the first processor.
• Clause A11. The method according to clause A10, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized includes: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • a data synchronization device applied to a first processor including:
  • the first information determining module is configured to determine synchronization information of the tensor data according to the descriptor of the tensor data to be synchronized, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the first instruction generation module is configured to generate a synchronization instruction according to the synchronization information of the tensor data;
  • the first instruction sending module is configured to send the synchronization instruction to a second processor, where the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  • the synchronization information includes a storage address of the tensor data to be synchronized;
  • the first instruction generation module includes: a first generation sub-module configured to, when the storage address of the tensor data to be synchronized is in the shared storage space, generate a synchronization instruction according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • Clause A14 The device according to clause A12 or clause A13, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the first instruction generation module includes: a dump sub-module configured to, when the storage address of the tensor data to be synchronized is in the non-shared storage space, store the tensor data to be synchronized into the shared storage space;
  • the second generation sub-module is configured to generate a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  • Clause A15 The device according to any one of clauses A12 to A14, the device further comprising: a first descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.
  • Clause A16 The device according to clause A15, wherein the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, and the first descriptor determining module includes: an instruction parsing sub-module for parsing the synchronization The request instruction obtains the data characteristics of the tensor data to be synchronized; the first descriptor determining sub-module is used to determine the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.
  • a data synchronization device applied to a second processor including:
  • the second information determining module is used to parse the synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;
  • the second descriptor determining module is used to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, and the descriptor is used to indicate the shape of the tensor data to be synchronized ;
  • the first data acquisition module is configured to acquire the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • the synchronization information includes the storage address of the tensor data to be synchronized;
  • the second descriptor determining module includes: a first determining sub-module for determining, according to the storage address of the tensor data to be synchronized, the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor;
  • the first data acquisition module includes: a first data acquisition sub-module configured to acquire the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • a data synchronization device applied to a second processor including:
  • the second instruction generation module is configured to generate a synchronization request instruction when there is tensor data to be synchronized, and the synchronization request instruction is used to instruct the first processor to determine the description of the tensor data to be synchronized according to the synchronization request instruction Symbol, the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the second instruction sending module is configured to send the synchronization request instruction to the first processor.
  • Clause A20 The device according to clause A19, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized.
  • Clause A21 The device according to clause A19 or clause A20, wherein the device further includes: a third information determining module, configured to parse the synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized; a third descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized; and a second data acquisition module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  • Clause A22 The device according to clause A21, wherein the synchronization information includes the storage address of the tensor data to be synchronized;
  • the third descriptor determining module includes: a second determining sub-module for determining, according to the storage address of the tensor data to be synchronized, the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor;
  • the second data acquisition module includes: a second data acquisition sub-module configured to acquire the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  • Clause A24 An electronic device comprising the artificial intelligence chip as described in Clause A23.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A23; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • the storage device includes multiple sets of storage units, each of which is connected to the artificial intelligence chip through a bus, and the storage unit is: DDR SDRAM;
  • the chip includes a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is a standard PCIE interface.


Abstract

The present disclosure relates to a data synchronization method and device and related products. The products include a control module, the control module including: an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is configured to store computation instructions associated with artificial neural network operations; the instruction processing unit is configured to parse the computation instructions to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue, the instruction queue including a plurality of operation instructions or computation instructions to be executed in the order of the queue. Through the above method, the present disclosure can improve the operation efficiency of the related products when performing operations of a neural network model.

Description

Data Synchronization Method and Device and Related Products

Technical Field

The present disclosure relates to the field of computer technology, and in particular to a data synchronization method and device and related products.

Background

With the continuous development of artificial intelligence technology, its application fields have become increasingly broad, with good applications in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of artificial intelligence algorithms increases, the amount and dimensionality of the data to be processed keep growing, and multiple cores and/or multiple chips are usually needed for data processing. When performing inter-core or inter-chip data synchronization, the synchronization approaches of the related art incur large synchronization overhead and low processing efficiency.
Summary

In view of this, the present disclosure proposes a technical solution for data synchronization.

According to an aspect of the present disclosure, a data synchronization method is provided, the method being applied to a first processor and including: determining synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; generating a synchronization instruction according to the synchronization information of the tensor data; and sending the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

According to another aspect of the present disclosure, a data synchronization method is provided, the method being applied to a second processor and including: parsing a synchronization instruction from a first processor to obtain synchronization information of the tensor data to be synchronized; determining a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

According to another aspect of the present disclosure, a data synchronization method is provided, the method being applied to a second processor and including: when there is tensor data to be synchronized, generating a synchronization request instruction, the synchronization request instruction being used to instruct a first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and sending the synchronization request instruction to the first processor.

According to another aspect of the present disclosure, a data synchronization device is provided, the device being applied to a first processor and including: a first information determining module, configured to determine synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; a first instruction generation module, configured to generate a synchronization instruction according to the synchronization information of the tensor data; and a first instruction sending module, configured to send the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

According to another aspect of the present disclosure, a data synchronization device is provided, the device being applied to a second processor and including: a second information determining module, configured to parse a synchronization instruction from a first processor to obtain synchronization information of the tensor data to be synchronized; a second descriptor determining module, configured to determine a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and a first data obtaining module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

According to another aspect of the present disclosure, a data synchronization device is provided, the device being applied to a second processor and including: a second instruction generation module, configured to generate, when there is tensor data to be synchronized, a synchronization request instruction, the synchronization request instruction being used to instruct a first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and a second instruction sending module, configured to send the synchronization request instruction to the first processor.

According to another aspect of the present disclosure, an artificial intelligence chip is provided, the chip including the data synchronization device described above.

According to another aspect of the present disclosure, an electronic device is provided, the electronic device including the artificial intelligence chip described above.

According to another aspect of the present disclosure, a board card is provided, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip described above; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.

According to embodiments of the present disclosure, by setting a descriptor indicating the shape of tensor data, determining the synchronization information of the tensor data according to the descriptor, generating a synchronization instruction according to the synchronization information, and sending the synchronization instruction to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction, synchronization overhead can be reduced and the efficiency of data synchronization improved.

By derivation from the technical features in the claims, beneficial effects corresponding to the technical problems in the background can be achieved. Other features and aspects of the present disclosure will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and serve to explain the principles of the present disclosure.

Fig. 1 shows a schematic diagram of a processing system for a data synchronization method according to an embodiment of the present disclosure.

Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.

Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.

Fig. 4 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.

Fig. 5 shows a schematic diagram of a data storage space of a data synchronization method according to an embodiment of the present disclosure.

Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.

Fig. 7 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.

Fig. 8 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.

Fig. 9 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description

The technical solutions in the embodiments of the present disclosure will be described below clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only part of the embodiments of the present disclosure, not all of them. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present disclosure.

It should be understood that the terms "first", "second", and "third" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects rather than to describe a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as meaning "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

The data synchronization method according to embodiments of the present disclosure can be applied to any processor of a processing system (e.g., an artificial intelligence chip) that includes a plurality of processors (multiple cores). The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, etc., where machine learning operations include neural network operations, k-means operations, support vector machine operations, etc. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and an FPGA (Field-Programmable Gate Array) chip. The present disclosure does not limit the specific type of processor. In addition, the types of the plural processors in the processing system may be the same or different, which is not limited in the present disclosure.

In a possible implementation, the processor mentioned in the present disclosure may include a plurality of processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution operation tasks, pooling tasks, or fully-connected tasks. The present disclosure does not limit the processing units or the tasks run by the processing units.

Fig. 1 shows a schematic diagram of a processing system for a data synchronization method according to an embodiment of the present disclosure. As shown in Fig. 1, the processing system 100 includes a plurality of processors 101 and a memory 102. The plurality of processors 101 are configured to execute instruction sequences, and the memory 102 is configured to store data and may include random access memory (RAM) and a register file. The plurality of processors 101 in the processing system 100 may share part of the storage space, e.g., part of the RAM storage space and the register file, and may also have their own storage spaces at the same time.
Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. As shown in Fig. 2, the method is applied to a first processor (any processor in the processing system) and includes:

Step S11: determining synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

Step S12: generating a synchronization instruction according to the synchronization information of the tensor data;

Step S13: sending the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

For example, the data to be synchronized may include N-dimensional tensor data (N being an integer greater than or equal to zero, e.g., N = 1, 2, or 3), where a tensor can have various forms of data composition and various dimensions: a scalar can be regarded as a 0-dimensional tensor, a vector as a 1-dimensional tensor, and a matrix as a tensor of 2 or more dimensions. The shape of a tensor includes information such as the number of dimensions of the tensor and the size of each dimension. For example, for a tensor:

[a 2 × 4 matrix, rendered as an image in the original]

the shape of the tensor can be described by a descriptor as (2, 4), i.e., two parameters indicate that the tensor is a two-dimensional tensor, with the size of its first dimension (columns) being 2 and the size of its second dimension (rows) being 4. It should be noted that the present disclosure does not limit the manner in which a descriptor indicates the tensor shape. When tensor data is stored in a memory, its shape cannot be determined from its data address (or storage region), and related information such as the interrelation among multiple pieces of tensor data cannot be determined either; as a result, the processor accesses tensor data with low efficiency, and data synchronization is also complex.

In this case, a descriptor (tensor descriptor) can be set to indicate the shape of the tensor data (N-dimensional tensor data). The value of N can be determined according to the number of dimensions (order) of the tensor data, and can also be set according to the usage needs of the tensor data. For example, when the value of N is 3, the tensor data is three-dimensional tensor data, and the descriptor can be used to indicate the shape (e.g., offset, size, etc.) of the three-dimensional tensor data in the three dimension directions. It should be understood that those skilled in the art can set the value of N according to actual needs, which is not limited in the present disclosure.

In a possible implementation, the descriptor may include an identifier, content, and the like. The identifier of the descriptor is used to distinguish descriptors, for example, a number; the content of the descriptor may include at least one shape parameter representing the shape of the tensor data (e.g., the size in each dimension direction of the tensor), and may further include at least one address parameter representing the address of the tensor data (e.g., a base address of a data reference point). The present disclosure does not limit the specific parameters included in the content of the descriptor.

By using descriptors to indicate tensor data, the shape of the tensor data can be expressed, and related information such as the interrelation among multiple pieces of tensor data can also be determined, improving the efficiency of access to tensor data and thereby reducing the complexity of data synchronization.
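The identifier-plus-content organization described above can be sketched as a small data structure. This is an illustrative sketch only; the class and field names are assumptions, since the patent does not fix any concrete encoding:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TensorDescriptor:
    # Identifier: distinguishes descriptors, e.g. a number.
    ident: int
    # Shape parameters: size of each dimension, e.g. (2, 4) for a 2-D tensor.
    shape: Tuple[int, ...]
    # Optional address parameter: base address of the data reference point.
    base_address: int = 0

    @property
    def ndim(self) -> int:
        # The number of dimensions is recoverable from the shape alone.
        return len(self.shape)

# The (2, 4) example from the text: a two-dimensional tensor with
# first-dimension size 2 and second-dimension size 4.
desc = TensorDescriptor(ident=0, shape=(2, 4), base_address=0x1000)
```

With such a structure, the shape (and hence relations between tensors) is available without inspecting the raw data region.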
In a possible implementation, during data processing, data synchronization between multiple processors (e.g., multiple cores of an artificial intelligence chip) may be needed, for example synchronizing an operation result of processor A1 to processor A2 as the input data of another operation. In this case, a descriptor-based data synchronization mechanism can be used to achieve data synchronization.

In a possible implementation, the first processor is the sender of the data synchronization and the second processor is the receiver of the data synchronization. When there is tensor data to be synchronized, the first processor may, in step S11, determine the synchronization information of the tensor data (e.g., its identifier, shape, source, storage address, etc.) according to the descriptor of the tensor data; generate a synchronization instruction according to the synchronization information in step S12; and send the synchronization instruction to the second processor to be synchronized in step S13. The second processor may include a general-purpose processor (e.g., a central processing unit CPU or a graphics processing unit GPU) or a dedicated processor (e.g., an artificial intelligence processor, a scientific computing processor, or a digital signal processor). The type of the second processor may be the same as or different from that of the first processor, and the present disclosure does not limit the type of the second processor.

In a possible implementation, the first processor may actively initiate data synchronization with the second processor; for example, when the first processor completes an operation and obtains the operation result (tensor data), it actively initiates data synchronization with the second processor that needs to use the operation result. In another example, the first processor may also initiate data synchronization with the second processor in response to a synchronization request from the second processor, e.g., upon receiving a synchronization request instruction from the second processor. The present disclosure does not limit the timing of initiating data synchronization.

In a possible implementation, when the first processor determines that there is tensor data to be synchronized, it may obtain the descriptor of the tensor data. The descriptor may be an already registered (created) descriptor indicating the shape of the tensor data, or a new descriptor may be registered (created) according to the shape parameters of the tensor data; the present disclosure does not limit this.

In a possible implementation, the synchronization information of the tensor data can be determined according to its descriptor. The synchronization information may include at least one of the identifier (e.g., a data number), shape, source, and storage address of the tensor data. A synchronization instruction can be generated according to the synchronization information of the tensor data. If the second processor already has the information of the tensor data (e.g., a descriptor indicating the tensor data to be synchronized has been registered), the synchronization instruction may include only part of the synchronization information, such as the identifier of the tensor data, to instruct the second processor to synchronize the tensor data according to the identifier; if the second processor does not have the information of the tensor data, the synchronization instruction may include more synchronization information, such as the identifier and the storage address of the tensor data, to instruct the second processor to synchronize the tensor data according to the corresponding information. The present disclosure does not limit the specific content included in the synchronization instruction.
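The choice between a minimal and a richer synchronization instruction described above can be sketched as follows. The function and field names are hypothetical; the patent does not prescribe a payload encoding:

```python
def build_sync_instruction(tensor_id, receiver_known_ids,
                           storage_address=None, shape=None):
    """If the receiver already holds a descriptor for this tensor,
    the identifier alone is enough; otherwise carry richer sync info
    (identifier plus storage address and shape)."""
    if tensor_id in receiver_known_ids:
        return {"id": tensor_id}
    return {"id": tensor_id, "address": storage_address, "shape": shape}
```

A minimal payload keeps the instruction short when both sides already agree on the tensor's description.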
In a possible implementation, after the synchronization instruction is generated, it can be sent to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction. If the synchronization instruction includes the identifier of the tensor data, the second processor may determine the tensor data to be synchronized according to the identifier, register or obtain a descriptor indicating the tensor data to be synchronized, and then obtain the tensor data indicated by the descriptor according to the content of the descriptor, thereby achieving synchronization of the tensor data. If the synchronization instruction includes more synchronization information (identifier, storage address, etc.), the second processor may register a descriptor indicating the tensor data to be synchronized according to the synchronization information in the instruction, and directly obtain the tensor data indicated by the descriptor according to the content of the descriptor, thereby achieving synchronization of the tensor data.

According to the data synchronization method of embodiments of the present disclosure, by setting a descriptor indicating the shape of tensor data, determining the synchronization information of the tensor data according to the descriptor, generating a synchronization instruction according to the synchronization information, and sending the synchronization instruction to the second processor to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction, synchronization overhead is reduced and the efficiency of data synchronization is improved without changing the structure of the synchronization instruction.
In a possible implementation, the synchronization information may include the storage address of the tensor data to be synchronized. Step S12 may include: when the storage address of the tensor data to be synchronized is in a shared storage space, generating a synchronization instruction according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

For example, multiple processors (multiple cores) may have a shared storage space, e.g., an off-chip memory that both the first processor and the second processor can access. The shared storage space may be a storage space from which multiple cores (multiple processors) can all access data, or a storage space from which some cores (some processors) can access data; an inter-core shared storage space may be preset, and the present disclosure does not limit the manner in which the shared storage space is set.

In a possible implementation, the storage address of the tensor data can be determined from the content of the descriptor of the tensor data to be synchronized. If the storage address of the tensor data to be synchronized is in the shared storage space, the second processor can also access data from the shared storage space and can read the tensor data directly according to its storage address to achieve synchronization. In this case, the synchronization instruction may include the storage address of the tensor data to be synchronized, i.e., the synchronization instruction can be generated according to the storage address of the tensor data to be synchronized. After receiving the synchronization instruction, the second processor may parse the instruction to obtain the storage address of the tensor data; according to the storage address, the second processor may register (create) a descriptor of the tensor data to be synchronized, so that the content of the descriptor corresponds to the data address of the tensor data, and obtain the tensor data to be synchronized from the shared storage space, thereby completing the whole synchronization process.

In this way, unnecessary data transfer can be avoided, the number of accesses to the tensor data can be reduced, and the processing efficiency of synchronization is improved.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized. Step S12 may include: when the storage address of the tensor data to be synchronized is in a non-shared storage space, storing the tensor data to be synchronized into the shared storage space; and generating a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

For example, the first processor may have a non-shared storage space; the first processor can access the data in the non-shared storage space, while the second processor cannot access the non-shared storage space of the first processor and cannot access data therein. If the storage address of the tensor data to be synchronized is in the non-shared storage space, the second processor cannot obtain the tensor data directly. In this case, the first processor may dump the tensor data to be synchronized into the shared storage space so that the second processor can access the tensor data. After the dump is completed, if no descriptor indicating the tensor data to be synchronized has been registered in the first processor, or a descriptor indicating the tensor data in the non-shared storage space has been registered but the descriptor cannot be modified (e.g., it is being operated on), the first processor may generate a descriptor of the tensor data to be synchronized, i.e., register a new descriptor, to indicate the tensor data in the shared storage space.

In a possible implementation, the first processor may generate the synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space. After receiving the synchronization instruction, the second processor may parse the instruction to obtain the storage address of the tensor data to be synchronized; according to the storage address, the second processor may register (create) a descriptor of the tensor data to be synchronized so that the content of the descriptor corresponds to the data address of the tensor data, and obtain the tensor data to be synchronized from the shared storage space, thereby completing the whole synchronization process.

In this way, the tensor data to be synchronized in the non-shared storage space can be actively dumped to the shared storage space so that the second processor can obtain the tensor data to be synchronized, reducing inter-processor data transfer during synchronization and improving the processing efficiency of synchronization.
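The two sender-side cases above (address already in shared storage vs. dump from non-shared to shared storage first) can be sketched with dict-based stand-ins for the storage spaces. All names here are assumptions for illustration:

```python
def prepare_sync_address(addr, shared, non_shared, next_free_addr):
    """Return the address the sync instruction should carry.

    If the tensor already lives in shared storage, reuse its address;
    otherwise dump it into shared storage and return the new address.
    `shared` and `non_shared` model the two storage spaces as dicts."""
    if addr in shared:
        return addr
    shared[next_free_addr] = non_shared[addr]  # dump to shared space
    return next_free_addr

shared = {0x10: "tensor_a"}
non_shared = {0x99: "tensor_b"}
# tensor_a is already shared: no copy needed.
a = prepare_sync_address(0x10, shared, non_shared, 0x20)
# tensor_b is private: dumped, and the shared address is used instead.
b = prepare_sync_address(0x99, shared, non_shared, 0x20)
```

Only the second case moves data, which matches the goal of avoiding unnecessary transfers.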
In a possible implementation, the method further includes: determining the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.

For example, the first processor may initiate data synchronization with the second processor in response to a synchronization request from the second processor. The synchronization request instruction from the second processor may include information about the tensor data to be synchronized, e.g., the data characteristics of the tensor data to be synchronized. The data characteristics of tensor data may include information such as the identifier, shape, source, and address of the tensor data; the present disclosure does not limit the specific content of the synchronization request instruction. According to the information in the synchronization request instruction, the first processor can determine the descriptor of the tensor data to be synchronized, determine the synchronization information of the tensor data according to the descriptor, and then generate the synchronization instruction.

In this way, the descriptor of the tensor data to be synchronized can be determined according to the synchronization request of the second processor so as to generate the synchronization instruction, avoiding unnecessary data synchronization and improving the efficiency of data synchronization.

In a possible implementation, the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, and the step of determining the descriptor of the tensor data to be synchronized according to the synchronization request instruction from the second processor may include:

parsing the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized;

determining the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.

For example, if both the first processor and the second processor have the information (data characteristics) of the tensor data to be synchronized, and the information is identical or has a correspondence, the synchronization request instruction may include the data characteristics, e.g., the identifier of the tensor data. The first processor may parse the synchronization request instruction from the second processor to obtain the data characteristics of the tensor data to be synchronized.

In a possible implementation, the data characteristics of the tensor data to be synchronized may include information such as the identifier, shape, source, and address of the tensor data. For example, the data source of the tensor data is the K-th sender (the K-th processor); the data source of the tensor data is the operation result of the convolution operation numbered 200; the address of the tensor data is a specific address region (e.g., addresses ADDR0-ADDR127); the shape of the tensor data is a specified shape (e.g., a 20 × 10 two-dimensional tensor); and so on. Those skilled in the art can set the data characteristics of the tensor data to be synchronized according to the actual situation, which is not limited in the present disclosure.

In a possible implementation, according to the data characteristics, the first processor can find the tensor data to be synchronized and determine the descriptor of the tensor data to be synchronized, e.g., by directly obtaining or newly registering the corresponding descriptor. According to the descriptor of the tensor data to be synchronized, the synchronization information of the tensor data can be determined, so that the synchronization instruction can be generated and sent to instruct the second processor to achieve synchronization of the tensor data.

In this way, the descriptor of the tensor data to be synchronized can be determined according to the data characteristics in the request instruction so as to achieve synchronization of the tensor data; the tensor data itself need not be transferred during synchronization, reducing the amount of transferred data and the synchronization overhead and improving processing efficiency.
Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. The data synchronization method can be applied to a second processor. As shown in Fig. 3, the data synchronization method includes:

Step S21: parsing a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;

Step S22: determining a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

Step S23: obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

For example, the first processor (sender) may actively initiate data synchronization with the second processor (receiver); e.g., when the first processor completes an operation and obtains the operation result (tensor data), it actively initiates data synchronization with the second processor that needs to use the operation result.

In a possible implementation, upon receiving the synchronization instruction from the first processor, the second processor may parse the synchronization instruction to obtain the synchronization information of the tensor data to be synchronized (e.g., the identifier, shape, storage address, etc. of the tensor data).

In a possible implementation, if the synchronization instruction includes only the identifier of the tensor data, the second processor may internally look up the tensor data corresponding to the identifier and/or the descriptor corresponding to the tensor data, and then obtain the tensor data to be synchronized according to the descriptor content, thereby achieving synchronization of the tensor data.

In a possible implementation, if the synchronization instruction includes the shape and storage address of the tensor data, the second processor may register a descriptor indicating the tensor data to be synchronized according to the shape and storage address, and obtain the tensor data to be synchronized according to the content of the descriptor, thereby achieving synchronization of the tensor data.

According to the data synchronization method of embodiments of the present disclosure, by setting a descriptor indicating the shape of tensor data, the descriptor of the tensor data can be determined according to the synchronization information of the tensor data to be synchronized in the synchronization instruction, and the tensor data can then be obtained to achieve synchronization, reducing synchronization overhead, lowering the complexity of data synchronization, and improving its efficiency.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized;

step S22 includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;

step S23 includes: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.

For example, if the storage address of the tensor data to be synchronized is in the shared storage space, the second processor can access data from the shared storage space. In this case, the synchronization instruction may include the storage address of the tensor data to be synchronized. After receiving the synchronization instruction, the second processor may parse the instruction to obtain the storage address of the tensor data to be synchronized, and create or modify the descriptor corresponding to the tensor data according to that storage address. According to the content of the descriptor, the second processor can obtain the tensor data to be synchronized from the shared storage space, thereby completing the whole synchronization process.

In this way, unnecessary data transfer can be avoided, the number of accesses to tensor data can be reduced, the processing efficiency of synchronization is improved, and instruction compatibility during instruction transfer and processing is achieved.
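The receiver-side flow above (parse the instruction, register a descriptor from the synchronization information, then fetch the tensor through the descriptor's content) can be sketched as follows, with assumed field names:

```python
def handle_sync_instruction(instr, shared, descriptor_table):
    """Receiver side: register a descriptor whose content corresponds to
    the tensor's data address, then read the tensor from shared storage
    through that content. `shared` models the shared storage space."""
    descriptor_table[instr["id"]] = {"shape": instr["shape"],
                                     "address": instr["address"]}
    return shared[instr["address"]]

shared = {0x20: "tensor_b"}
table = {}
tensor = handle_sync_instruction(
    {"id": 4, "shape": (2, 4), "address": 0x20}, shared, table)
```

After this step the receiver holds both the tensor and a descriptor it can reuse for later identifier-only synchronizations.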
Fig. 4 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. The data synchronization method can be applied to a second processor. As shown in Fig. 4, the data synchronization method includes:

Step S31: when there is tensor data to be synchronized, generating a synchronization request instruction, the synchronization request instruction being used to instruct the first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

Step S32: sending the synchronization request instruction to the first processor.

For example, when there is tensor data to be synchronized in the second processor, the second processor may actively send a synchronization request instruction to the first processor in order to obtain the tensor data to be synchronized. The second processor may generate the synchronization request instruction according to information about the tensor data to be synchronized, e.g., the data characteristics of the tensor data to be synchronized. The present disclosure does not limit the specific content of the synchronization request instruction. According to the information in the synchronization request instruction, the first processor can determine the descriptor of the tensor data to be synchronized and then generate the synchronization instruction.

In this way, a synchronization request can be actively initiated when synchronization is needed, improving the efficiency of data synchronization.

In a possible implementation, the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, so that the first processor can determine the tensor data to be synchronized. The data characteristics of tensor data may include information such as the identifier, shape, source, and address of the tensor data. Those skilled in the art can set the data characteristics of the tensor data to be synchronized according to the actual situation, which is not limited in the present disclosure.

In a possible implementation, the method further includes:

parsing a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;

determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized;

obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

For example, upon receiving the synchronization instruction from the first processor, the second processor may parse the synchronization instruction to obtain the synchronization information of the tensor data to be synchronized (e.g., the identifier, shape, storage address, etc. of the tensor data).

In a possible implementation, if the synchronization instruction includes only the identifier of the tensor data, the second processor may internally look up the tensor data corresponding to the identifier and/or the descriptor corresponding to the tensor data, and then obtain the tensor data to be synchronized according to the descriptor content, thereby achieving synchronization of the tensor data.

In a possible implementation, if the synchronization instruction includes the shape and storage address of the tensor data, the second processor may create a descriptor indicating the tensor data to be synchronized according to the shape and storage address, and obtain the tensor data to be synchronized according to the content of the descriptor, thereby achieving synchronization of the tensor data.

In this way, the complexity of data synchronization can be reduced and the efficiency of data synchronization improved.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized;

the step of determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized may include: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;

the step of obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized may include: obtaining the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.

For example, if the storage address of the tensor data to be synchronized is in the shared storage space, the second processor can access data from the shared storage space. In this case, the synchronization instruction may include the storage address of the tensor data to be synchronized. After receiving the synchronization instruction, the second processor may parse the instruction to obtain the storage address and create or modify the descriptor corresponding to the tensor data according to that address. According to the content of the descriptor, the second processor can obtain the tensor data to be synchronized from the shared storage space, thereby completing the whole synchronization process.

In this way, unnecessary data transfer can be avoided, the number of accesses to tensor data can be reduced, the processing efficiency of synchronization is improved, and instruction compatibility during instruction transfer and processing is achieved.
In a possible implementation, the identifier and content of a descriptor can be stored in a descriptor storage space, which may be a storage space in the processor's internal memory (e.g., registers, on-chip SRAM, or another medium cache). The data storage space of the tensor data indicated by the descriptor may be a storage space in the processor's internal memory (e.g., an on-chip cache) or in an external memory connected to the processor (off-chip memory). A data address in the data storage space may be an actual physical address or a virtual address. The present disclosure does not limit the locations of the descriptor storage space and the data storage space, or the type of data address.

In a possible implementation, the identifier and content of a descriptor and the tensor data indicated by the descriptor may be located in the same region; for example, a continuous region of the on-chip cache with addresses ADDR0-ADDR1023 may be used to store the related content of descriptors, where addresses ADDR0-ADDR31 can be used to store the descriptor identifiers, addresses ADDR32-ADDR63 to store the descriptor contents, and addresses ADDR64-ADDR1023 to store the tensor data indicated by the descriptors. Here, an address ADDR is not limited to one bit or one byte; it is used here to represent one address, i.e., an address unit. Those skilled in the art can determine the storage regions and their addresses according to the actual situation, which is not limited in the present disclosure.

In a possible implementation, the identifier and content of a descriptor and the tensor data indicated by the descriptor may be stored separately in different regions of the internal memory. For example, registers may serve as the descriptor storage space, storing the identifier and content of the descriptor, while an on-chip cache serves as the data storage space, storing the tensor data indicated by the descriptor.

In a possible implementation, a special register (SR) dedicated to descriptors may also be provided; the data in a descriptor may be an immediate value or may be obtained from the special register. When registers are used to store the identifier and content of a descriptor, the register number may be used to represent the descriptor identifier; e.g., when the register number is 0, the identifier of the stored descriptor is 0. When the descriptor in the register is valid, a region can be allocated in the cache space according to the size of the tensor data indicated by the descriptor (e.g., a tensor cache unit is created in the cache for each piece of tensor data) to store the tensor data. It should be understood that a preset cache space may also be used to store the tensor data, which is not limited in the present disclosure.

In a possible implementation, the identifier and content of a descriptor may be stored in the internal memory, and the tensor data indicated by the descriptor may be stored in the external memory. For example, the identifier and content of the descriptor may be stored on-chip and the tensor data indicated by the descriptor stored off-chip.

In a possible implementation, the data address of the data storage space corresponding to a descriptor may be a fixed address. For example, a separate data storage space may be allocated for tensor data, where the start address of each piece of tensor data in the data storage space corresponds one-to-one with a descriptor identifier. In this case, the processor can determine the data address of the tensor data directly from the content of the descriptor.

In a possible implementation, when the data address of the data storage space corresponding to the descriptor is a variable address, the descriptor may also be used to indicate the address of the N-dimensional tensor data, and the content of the descriptor may further include at least one address parameter representing the address of the tensor data. For example, if the tensor data is 3-dimensional and the descriptor points to the address of the tensor data, the content of the descriptor may include one address parameter representing the address of the tensor data, such as its start address, or may include multiple address parameters of the address of the tensor data, such as start address + address offset, or address parameters of the tensor data per dimension. Those skilled in the art can set the address parameters according to actual needs, which is not limited in the present disclosure.

In a possible implementation, the address parameter of the tensor data includes a base address of the data reference point of the descriptor in the data storage space of the tensor data, where the base address varies with the choice of data reference point. The present disclosure does not limit the selection of the data reference point.

In a possible implementation, the base address may include the start address of the data storage space. When the data reference point of the descriptor is the first data block of the data storage space, the base address of the descriptor is the start address of the data storage space. When the data reference point of the descriptor is data other than the first data block in the data storage space, the base address of the descriptor is the physical address of that data block in the data storage space.

In a possible implementation, the shape parameters of the tensor data include at least one of the following: the size of the data storage space of the tensor data in at least one of the N dimension directions, the size of the storage region in at least one of the N dimension directions, the offset of the storage region in at least one of the N dimension directions, the positions, relative to the data reference point, of at least two vertices at diagonal positions of the N dimension directions, and a mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address. The data description position is the mapped position of a point or region in the tensor data indicated by the descriptor; e.g., when the tensor data is 3-dimensional, the descriptor may use three-dimensional spatial coordinates (x, y, z) to represent the shape of the tensor data, and the data description position of the tensor data may be the position, expressed in three-dimensional spatial coordinates (x, y, z), of a point or region to which the tensor data is mapped in three-dimensional space.

It should be understood that those skilled in the art can choose the shape parameters representing the tensor data according to the actual situation, which is not limited in the present disclosure.

Fig. 5 shows a schematic diagram of a data storage space of a data synchronization method according to an embodiment of the present disclosure. As shown in Fig. 5, the data storage space 21 stores a piece of two-dimensional data in a row-major manner, addressable by (x, y) (with the X axis extending horizontally to the right and the Y axis vertically downward). The size in the X-axis direction (the size of each row) is ori_x (not shown in the figure), the size in the Y-axis direction (the total number of rows) is ori_y (not shown in the figure), and the start address PA_start (base address) of the data storage space 21 is the physical address of the first data block 22. The data block 23 is part of the data in the data storage space 21; its offset 25 in the X-axis direction is denoted offset_x, its offset 24 in the Y-axis direction is denoted offset_y, its size in the X-axis direction is denoted size_x, and its size in the Y-axis direction is denoted size_y.

In a possible implementation, when a descriptor is used to define the data block 23, the data reference point of the descriptor may be the first data block of the data storage space 21, and the base address of the descriptor is the start address PA_start of the data storage space 21. The content of the descriptor of the data block 23 can then be determined by combining the size ori_x of the data storage space 21 on the X axis and its size ori_y on the Y axis with the offset offset_y of the data block 23 in the Y-axis direction, the offset offset_x in the X-axis direction, the size size_x in the X-axis direction, and the size size_y in the Y-axis direction.
In a possible implementation, the content of the descriptor can be represented by the following formula (1):

$$\text{content} = \begin{cases} \mathrm{ori\_x},\ \mathrm{offset\_x},\ \mathrm{size\_x} & (X\ \text{direction}) \\ \mathrm{ori\_y},\ \mathrm{offset\_y},\ \mathrm{size\_y} & (Y\ \text{direction}) \\ \mathrm{PA\_start} \end{cases} \tag{1}$$

It should be understood that although the descriptor describes a two-dimensional space in the above example, those skilled in the art can set the number of dimensions represented by the content of the descriptor according to the actual situation, which is not limited in the present disclosure.

In a possible implementation, the content of the descriptor of the tensor data may be determined according to the base address of the data reference point of the descriptor in the data storage space and the positions, relative to the data reference point, of at least two vertices at diagonal positions of the N dimension directions.

For example, the content of the descriptor of the data block 23 in Fig. 5 can be determined using the base address PA_base of the data reference point of the descriptor in the data storage space and the positions of two diagonal vertices relative to the data reference point. First, the data reference point of the descriptor and its base address PA_base in the data storage space are determined; e.g., a datum (for example, the datum at position (2, 2)) may be selected in the data storage space 21 as the data reference point, and the physical address of that datum in the data storage space is taken as the base address PA_base. Then, the positions of at least two diagonal vertices of the data block 23 relative to the data reference point are determined, e.g., the diagonal vertices in the top-left-to-bottom-right direction, where the relative position of the top-left vertex is (x_min, y_min) and the relative position of the bottom-right vertex is (x_max, y_max). The content of the descriptor of the data block 23 can then be determined according to the base address PA_base, the relative position (x_min, y_min) of the top-left vertex, and the relative position (x_max, y_max) of the bottom-right vertex.

In a possible implementation, the content of the descriptor can be represented by the following formula (2):

$$\text{content} = \begin{cases} (\mathrm{x\_min},\ \mathrm{y\_min}) \\ (\mathrm{x\_max},\ \mathrm{y\_max}) \\ \mathrm{PA\_base} \end{cases} \tag{2}$$

It should be understood that although the top-left and bottom-right vertices are used in the above example to determine the content of the descriptor, those skilled in the art can set the specific vertices of the at least two vertices according to actual needs, which is not limited in the present disclosure.

In a possible implementation, the content of the descriptor of the tensor data may be determined according to the base address of the data reference point of the descriptor in the data storage space and a mapping relationship between the data description position and the data address of the tensor data indicated by the descriptor. The mapping relationship between data description position and data address can be set according to actual needs; e.g., when the tensor data indicated by the descriptor is three-dimensional spatial data, a function f(x, y, z) can be used to define the mapping relationship between the data description position and the data address.

In a possible implementation, the content of the descriptor can be represented by the following formula (3):

$$\text{content} = \begin{cases} f(x, y, z) \\ \mathrm{PA\_base} \end{cases} \tag{3}$$

It should be understood that those skilled in the art can set the mapping relationship between the data description position and the data address according to the actual situation, which is not limited in the present disclosure.

When formula (1) is used to represent the content of the descriptor, then for any data point in the tensor data with data description position (x_q, y_q), its data address PA2_(x,y) in the data storage space can be determined by the following formula (4):

$$\mathrm{PA2}_{(x,y)} = \mathrm{PA\_start} + (\mathrm{offset\_y} + y_q - 1) \times \mathrm{ori\_x} + (\mathrm{offset\_x} + x_q) \tag{4}$$

In this way, the processor can compute, from the content of the descriptor, the data address in the data storage space of the tensor data indicated by the descriptor, and then perform the corresponding processing (e.g., data operations, data synchronization, etc.) according to that address, reducing the complexity of data access and improving the processing efficiency of the processor.
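Formula (4) can be checked with a direct transcription (y_q is treated as 1-indexed, matching the "- 1" term in the formula):

```python
def data_address(pa_start, ori_x, offset_x, offset_y, x_q, y_q):
    """PA2(x, y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q),
    i.e. row-major addressing of the point (x_q, y_q) inside the data block."""
    return pa_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)
```

Moving one row down (y_q + 1) advances the address by exactly one full row, ori_x, as row-major storage requires.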
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should know that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all optional embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

It should further be noted that although the steps in the flowcharts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least part of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure. The data synchronization device is applied to a first processor. As shown in Fig. 6, the data synchronization device includes:

a first information determining module 51, configured to determine synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

a first instruction generation module 52, configured to generate a synchronization instruction according to the synchronization information of the tensor data;

a first instruction sending module 53, configured to send the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized, and the first instruction generation module includes: a first generation sub-module configured to, when the storage address of the tensor data to be synchronized is in a shared storage space, generate a synchronization instruction according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized, and the first instruction generation module includes: a dump sub-module configured to, when the storage address of the tensor data to be synchronized is in a non-shared storage space, store the tensor data to be synchronized into the shared storage space; and a second generation sub-module configured to generate a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

In a possible implementation, the device further includes: a first descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.

In a possible implementation, the synchronization request instruction includes the data characteristics of the tensor data to be synchronized, and the first descriptor determining module includes: an instruction parsing sub-module configured to parse the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized; and a first descriptor determining sub-module configured to determine the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.

Fig. 7 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure. The data synchronization device is applied to a second processor. As shown in Fig. 7, the data synchronization device includes:

a second information determining module 61, configured to parse a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;

a second descriptor determining module 62, configured to determine a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

a first data obtaining module 63, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized, and the second descriptor determining module includes: a first determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;

the first data obtaining module includes: a first data obtaining sub-module configured to obtain the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.

Fig. 8 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure. The data synchronization device is applied to a second processor. As shown in Fig. 8, the data synchronization device includes:

a second instruction generation module 71, configured to generate, when there is tensor data to be synchronized, a synchronization request instruction, the synchronization request instruction being used to instruct the first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized;

a second instruction sending module 72, configured to send the synchronization request instruction to the first processor.

In a possible implementation, the synchronization request instruction includes the data characteristics of the tensor data to be synchronized.

In a possible implementation, the device further includes: a third information determining module, configured to parse a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized; a third descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized; and a second data obtaining module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

In a possible implementation, the synchronization information includes the storage address of the tensor data to be synchronized, and the third descriptor determining module includes: a second determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized;

the second data obtaining module includes: a second data obtaining sub-module configured to obtain the tensor data to be synchronized from the shared storage space according to the content of the descriptor of the tensor data to be synchronized.
It should be understood that the above device embodiments are merely illustrative, and the device of the present disclosure can also be implemented in other ways. For example, the division of units/modules in the above embodiments is only a logical functional division, and there may be other division manners in actual implementation. For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be ignored or not executed.

In addition, unless otherwise specified, the functional units/modules in the embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist physically alone, or two or more units/modules may be integrated together. The above integrated units/modules can be implemented either in the form of hardware or in the form of software program modules.

When the integrated units/modules are implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, etc. Physical implementations of hardware structures include but are not limited to transistors, memristors, etc. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, etc. Unless otherwise specified, the storage unit may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random-access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), etc.

If the integrated units/modules are implemented in the form of software program modules and sold or used as independent products, they can be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions to cause a computer device (which may be a personal computer, server, network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, or optical disk.

In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above data synchronization device.

In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip; the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
Fig. 9 shows a structural block diagram of a board card according to an embodiment of the present disclosure. Referring to Fig. 9, in addition to the above chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392.

The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data. The storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).

DDR can double the speed of SDRAM without increasing the clock frequency. DDR allows data to be read out on the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units. Each group of storage units may include multiple DDR4 granules (chips). In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers; of a 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 granules are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
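The 25600 MB/s figure follows directly from the transfer rate times the data-path width; a quick check:

```python
def ddr_bandwidth_mb_s(transfers_mt_s, data_bits):
    """Theoretical bandwidth = million transfers per second x bytes per
    transfer. DDR4-3200 on a 64-bit data path: 3200 * 8 B = 25600 MB/s
    (the 8 ECC bits of a 72-bit controller do not carry payload data)."""
    return transfers_mt_s * (data_bits // 8)
```

The same arithmetic gives the per-group bandwidth for other DDR4 speed grades by substituting the transfer rate.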
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.

The interface device is electrically connected to the artificial intelligence chip. The interface device is used to implement data transmission between the artificial intelligence chip and an external device (e.g., a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface; e.g., the data to be processed is transferred from the server to the chip through the standard PCIE interface, realizing data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can realize the transfer function. In addition, the computation results of the artificial intelligence chip are still transmitted back to the external device (e.g., the server) by the interface device.

The control device is electrically connected to the artificial intelligence chip. The control device is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, it can drive multiple loads; therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load. The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.

In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, driving recorder, navigator, sensor, camera, server, cloud server, still camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical equipment.

The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance instrument, a B-ultrasound instrument, and/or an electrocardiograph.

In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as the combinations of these technical features are not contradictory, they should be considered within the scope described in this specification.
The foregoing can be better understood according to the following clauses:

Clause A1. A data synchronization method, the method being applied to a first processor and including: determining synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; generating a synchronization instruction according to the synchronization information of the tensor data; and sending the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

Clause A2. The method according to clause A1, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the generating a synchronization instruction according to the synchronization information of the tensor data includes: when the storage address of the tensor data to be synchronized is in a shared storage space, generating a synchronization instruction according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

Clause A3. The method according to clause A1 or clause A2, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the generating a synchronization instruction according to the synchronization information of the tensor data includes: when the storage address of the tensor data to be synchronized is in a non-shared storage space, storing the tensor data to be synchronized into a shared storage space; and generating a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

Clause A4. The method according to any one of clauses A1 to A3, the method further including: determining the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.

Clause A5. The method according to clause A4, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized, and the determining the descriptor of the tensor data to be synchronized according to the synchronization request instruction from the second processor includes: parsing the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized; and determining the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.

Clause A6. A data synchronization method, the method being applied to a second processor and including: parsing a synchronization instruction from a first processor to obtain synchronization information of the tensor data to be synchronized; determining a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

Clause A7. The method according to clause A6, wherein the synchronization information includes a storage address of the tensor data to be synchronized; the determining a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and the obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized includes: obtaining the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.

Clause A8. A data synchronization method, the method being applied to a second processor and including: when there is tensor data to be synchronized, generating a synchronization request instruction, the synchronization request instruction being used to instruct a first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and sending the synchronization request instruction to the first processor.

Clause A9. The method according to clause A8, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized.

Clause A10. The method according to clause A8 or clause A9, the method further including: parsing a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized; determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized; and obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

Clause A11. The method according to clause A10, wherein the synchronization information includes a storage address of the tensor data to be synchronized; the determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized includes: determining the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and the obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized includes: obtaining the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.

Clause A12. A data synchronization device, the device being applied to a first processor and including: a first information determining module, configured to determine synchronization information of the tensor data according to a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; a first instruction generation module, configured to generate a synchronization instruction according to the synchronization information of the tensor data; and a first instruction sending module, configured to send the synchronization instruction to a second processor, the synchronization instruction being used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.

Clause A13. The device according to clause A12, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the first instruction generation module includes: a first generation sub-module configured to, when the storage address of the tensor data to be synchronized is in a shared storage space, generate a synchronization instruction according to the storage address of the tensor data to be synchronized, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

Clause A14. The device according to clause A12 or clause A13, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the first instruction generation module includes: a dump sub-module configured to, when the storage address of the tensor data to be synchronized is in a non-shared storage space, store the tensor data to be synchronized into a shared storage space; and a second generation sub-module configured to generate a synchronization instruction according to the address of the tensor data to be synchronized in the shared storage space, to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.

Clause A15. The device according to any one of clauses A12 to A14, the device further including: a first descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.

Clause A16. The device according to clause A15, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized, and the first descriptor determining module includes: an instruction parsing sub-module configured to parse the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized; and a first descriptor determining sub-module configured to determine the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.

Clause A17. A data synchronization device, the device being applied to a second processor and including: a second information determining module, configured to parse a synchronization instruction from a first processor to obtain synchronization information of the tensor data to be synchronized; a second descriptor determining module, configured to determine a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and a first data obtaining module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

Clause A18. The device according to clause A17, wherein the synchronization information includes a storage address of the tensor data to be synchronized, the second descriptor determining module includes: a first determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and the first data obtaining module includes: a first data obtaining sub-module configured to obtain the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.

Clause A19. A data synchronization device, the device being applied to a second processor and including: a second instruction generation module, configured to generate, when there is tensor data to be synchronized, a synchronization request instruction, the synchronization request instruction being used to instruct a first processor to determine, according to the synchronization request instruction, a descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; and a second instruction sending module, configured to send the synchronization request instruction to the first processor.

Clause A20. The device according to clause A19, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized.

Clause A21. The device according to clause A19 or clause A20, the device further including: a third information determining module, configured to parse a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized; a third descriptor determining module, configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized; and a second data obtaining module, configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.

Clause A22. The device according to clause A21, wherein the synchronization information includes a storage address of the tensor data to be synchronized, the third descriptor determining module includes: a second determining sub-module configured to determine the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor according to the storage address of the tensor data to be synchronized; and the second data obtaining module includes: a second data obtaining sub-module configured to obtain the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.

Clause A23. An artificial intelligence chip, the chip including the data synchronization device according to any one of clauses A12 to A22.

Clause A24. An electronic device, the electronic device including the artificial intelligence chip according to clause A23.

Clause A25. A board card, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip according to clause A23; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.

Clause A26. The board card according to clause A25, wherein the storage device includes multiple groups of storage units, each group of storage units being connected to the artificial intelligence chip through a bus, the storage unit being DDR SDRAM; the chip includes a DDR controller for controlling the data transmission and data storage of each storage unit; and the interface device is a standard PCIE interface.

The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present disclosure. The description of the above embodiments is only intended to help understand the method of the present disclosure and its core ideas. Meanwhile, changes or modifications made by those skilled in the art based on the ideas of the present disclosure, on the specific implementations, and on the scope of application all fall within the protection scope of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (26)

  1. 一种数据同步方法,其特征在于,所述方法应用于第一处理器,包括:
    根据待同步的张量数据的描述符,确定所述张量数据的同步信息,所述描述符用于指示待同步的张量数据的形状;
    根据所述张量数据的同步信息,生成同步指令;
    向第二处理器发送所述同步指令,所述同步指令用于指示所述第二处理器根据所述同步指令获取所述待同步的张量数据。
  2. 根据权利要求1所述的方法,其特征在于,所述同步信息包括待同步的张量数据的存储地址,
    所述根据所述张量数据的同步信息,生成同步指令,包括:
    在待同步的张量数据的存储地址处于共用存储空间中时,根据所述待同步的张量数据的存储地址,生成同步指令,以指示所述第二处理器从所述共用存储空间获取所述待同步的张量数据。
  3. 根据权利要求1或2所述的方法,其特征在于,所述同步信息包括待同步的张量数据的存储地址,
    所述根据所述张量数据的同步信息,生成同步指令,包括:
    在待同步的张量数据的存储地址处于非共用存储空间中时,将所述待同步的张量数据存储到共用存储空间;
    根据所述待同步的张量数据在共用存储空间中的地址,生成同步指令,以指示所述第二处理器从所述共用存储空间获取所述待同步的张量数据。
  4. 根据权利要求1-3中任意一项所述的方法,其特征在于,所述方法还包括:
    根据来自第二处理器的同步请求指令,确定所述待同步的张量数据的描述符。
  5. 根据权利要求4所述的方法,其特征在于,所述同步请求指令包括所述待同步的张量数据的数据特征,
    所述根据来自第二处理器的同步请求指令,确定所述待同步的张量数据的描述符,包括:
    解析所述同步请求指令,获得待同步的张量数据的数据特征;
    根据待同步的张量数据的数据特征,确定所述待同步的张量数据的描述符。
  6. A data synchronization method, wherein the method is applied to a second processor and comprises:
    parsing a synchronization instruction from a first processor to obtain synchronization information of tensor data to be synchronized;
    determining a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, wherein the descriptor is used to indicate a shape of the tensor data to be synchronized;
    obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  7. The method according to claim 6, wherein the synchronization information includes a storage address of the tensor data to be synchronized,
    the determining a descriptor of the tensor data to be synchronized according to the synchronization information includes:
    determining an identifier of the descriptor of the tensor data to be synchronized and/or content of the descriptor according to the storage address of the tensor data to be synchronized; and
    the obtaining the tensor data to be synchronized according to the descriptor includes:
    obtaining the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  8. A data synchronization method, wherein the method is applied to a second processor and comprises:
    when there is tensor data to be synchronized, generating a synchronization request instruction, wherein the synchronization request instruction is used to instruct a first processor to determine a descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to indicate a shape of the tensor data to be synchronized;
    sending the synchronization request instruction to the first processor.
  9. The method according to claim 8, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized.
  10. The method according to claim 8 or 9, further comprising:
    parsing a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;
    determining the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized;
    obtaining the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  11. The method according to claim 10, wherein the synchronization information includes a storage address of the tensor data to be synchronized,
    the determining the descriptor of the tensor data to be synchronized according to the synchronization information includes:
    determining an identifier of the descriptor of the tensor data to be synchronized and/or content of the descriptor according to the storage address of the tensor data to be synchronized; and
    the obtaining the tensor data to be synchronized according to the descriptor includes:
    obtaining the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  12. A data synchronization device, wherein the device is applied to a first processor and comprises:
    a first information determination module configured to determine synchronization information of tensor data to be synchronized according to a descriptor of the tensor data, wherein the descriptor is used to indicate a shape of the tensor data to be synchronized;
    a first instruction generation module configured to generate a synchronization instruction according to the synchronization information of the tensor data;
    a first instruction sending module configured to send the synchronization instruction to a second processor, wherein the synchronization instruction is used to instruct the second processor to obtain the tensor data to be synchronized according to the synchronization instruction.
  13. The device according to claim 12, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the first instruction generation module includes:
    a first generation sub-module configured to, when the storage address of the tensor data to be synchronized is in a shared storage space, generate the synchronization instruction according to the storage address of the tensor data to be synchronized, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  14. The device according to claim 12 or 13, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the first instruction generation module includes:
    a transfer sub-module configured to, when the storage address of the tensor data to be synchronized is in a non-shared storage space, store the tensor data to be synchronized into a shared storage space;
    a second generation sub-module configured to generate the synchronization instruction according to an address of the tensor data to be synchronized in the shared storage space, so as to instruct the second processor to obtain the tensor data to be synchronized from the shared storage space.
  15. The device according to any one of claims 12 to 14, further comprising:
    a first descriptor determination module configured to determine the descriptor of the tensor data to be synchronized according to a synchronization request instruction from the second processor.
  16. The device according to claim 15, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized, and the first descriptor determination module includes:
    an instruction parsing sub-module configured to parse the synchronization request instruction to obtain the data characteristics of the tensor data to be synchronized;
    a first descriptor determination sub-module configured to determine the descriptor of the tensor data to be synchronized according to the data characteristics of the tensor data to be synchronized.
  17. A data synchronization device, wherein the device is applied to a second processor and comprises:
    a second information determination module configured to parse a synchronization instruction from a first processor to obtain synchronization information of tensor data to be synchronized;
    a second descriptor determination module configured to determine a descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized, wherein the descriptor is used to indicate a shape of the tensor data to be synchronized;
    a first data obtaining module configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  18. The device according to claim 17, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the second descriptor determination module includes:
    a first determination sub-module configured to determine an identifier of the descriptor of the tensor data to be synchronized and/or content of the descriptor according to the storage address of the tensor data to be synchronized; and
    the first data obtaining module includes:
    a first data obtaining sub-module configured to obtain the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  19. A data synchronization device, wherein the device is applied to a second processor and comprises:
    a second instruction generation module configured to generate a synchronization request instruction when there is tensor data to be synchronized, wherein the synchronization request instruction is used to instruct a first processor to determine a descriptor of the tensor data to be synchronized according to the synchronization request instruction, and the descriptor is used to indicate a shape of the tensor data to be synchronized;
    a second instruction sending module configured to send the synchronization request instruction to the first processor.
  20. The device according to claim 19, wherein the synchronization request instruction includes data characteristics of the tensor data to be synchronized.
  21. The device according to claim 19 or 20, further comprising:
    a third information determination module configured to parse a synchronization instruction from the first processor to obtain synchronization information of the tensor data to be synchronized;
    a third descriptor determination module configured to determine the descriptor of the tensor data to be synchronized according to the synchronization information of the tensor data to be synchronized;
    a second data obtaining module configured to obtain the tensor data to be synchronized according to the descriptor of the tensor data to be synchronized.
  22. The device according to claim 21, wherein the synchronization information includes a storage address of the tensor data to be synchronized, and the third descriptor determination module includes:
    a second determination sub-module configured to determine an identifier of the descriptor of the tensor data to be synchronized and/or content of the descriptor according to the storage address of the tensor data to be synchronized; and
    the second data obtaining module includes:
    a second data obtaining sub-module configured to obtain the tensor data to be synchronized from a shared storage space according to the content of the descriptor of the tensor data to be synchronized.
  23. An artificial intelligence chip, comprising the data synchronization device according to any one of claims 12 to 22.
  24. An electronic device, comprising the artificial intelligence chip according to claim 23.
  25. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip according to claim 23;
    wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
    the storage device is configured to store data;
    the interface device is configured to implement data transmission between the artificial intelligence chip and an external device; and
    the control device is configured to monitor a state of the artificial intelligence chip.
  26. The board card according to claim 25, wherein
    the storage device includes multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip via a bus, and each storage unit is a DDR SDRAM;
    the chip includes a DDR controller configured to control data transmission to and data storage in each storage unit; and
    the interface device is a standard PCIE interface.
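Again as a purely illustrative aid (not part of the claims), the claim set covers both a push flow, where the first processor initiates synchronization, and a pull flow, where the second processor first requests it by data characteristics and the first processor moves the tensor into shared storage if its address lies in non-shared space. The sketch below shows that pull flow with entirely hypothetical names (NON_SHARED, SHARED, DESCRIPTORS):

```python
# Illustrative pull-style flow: the second processor requests synchronization
# by data characteristics; the first processor locates the descriptor, copies
# the data into shared storage if needed, and answers with a sync instruction.

NON_SHARED = {"grad_w": [0.1, 0.2, 0.3]}   # first processor's private storage
SHARED = {}                                 # shared storage space
DESCRIPTORS = {"grad_w": {"shape": (3,), "address": "grad_w", "shared": False}}

def make_sync_request(data_feature: str) -> dict:
    # Second processor: request synchronization, identifying the tensor
    # by its data characteristics (here simply a name).
    return {"feature": data_feature}

def handle_sync_request(request: dict) -> dict:
    # First processor: determine the descriptor from the data characteristics.
    desc = DESCRIPTORS[request["feature"]]
    if not desc["shared"]:
        # Storage address is in non-shared space: copy to shared space first.
        SHARED[desc["address"]] = NON_SHARED[desc["address"]]
        desc["shared"] = True
    # Generate the synchronization instruction from the shared-space address.
    return {"address": desc["address"], "shape": desc["shape"]}

def obtain_synchronized(instr: dict) -> list:
    # Second processor: obtain the tensor data from the shared storage space.
    return SHARED[instr["address"]]

instr = handle_sync_request(make_sync_request("grad_w"))
print(obtain_synchronized(instr))  # -> [0.1, 0.2, 0.3]
```

The conditional copy in handle_sync_request corresponds to the non-shared-to-shared transfer of claims 3 and 14; when the tensor is already in shared space, the instruction is generated directly from its existing address.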
PCT/CN2020/111259 2019-07-30 2020-08-26 Data synchronization method and device, and related product WO2021018313A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910694672.X 2019-07-30
CN201910694672.XA CN112306945B (zh) 2019-07-30 Data synchronization method and device, and related product

Publications (1)

Publication Number Publication Date
WO2021018313A1 true WO2021018313A1 (zh) 2021-02-04

Family

ID=74229236

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111259 WO2021018313A1 (zh) 2019-07-30 2020-08-26 Data synchronization method and device, and related product

Country Status (2)

Country Link
CN (1) CN112306945B (zh)
WO (1) WO2021018313A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114489790A (zh) * 2020-11-13 2022-05-13 中科寒武纪科技股份有限公司 Data processing device, data processing method, and related product

Citations (4)

Publication number Priority date Publication date Assignee Title
CN104823164A (zh) * 2012-12-06 2015-08-05 相干逻辑公司 Processing system with synchronization instructions
CN104967658A (zh) * 2015-05-08 2015-10-07 成都品果科技有限公司 Data synchronization method on multiple terminal devices
CN108076126A (zh) * 2016-11-18 2018-05-25 中兴通讯股份有限公司 Data synchronization method and server
CN109977169A (zh) * 2019-03-19 2019-07-05 广州品唯软件有限公司 Data synchronization method and device, computer-readable storage medium, and system

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
US7461180B2 (en) * 2006-05-08 2008-12-02 Cisco Technology, Inc. Method and apparatus for synchronizing use of buffer descriptor entries for shared data packets in memory
US9785565B2 (en) * 2014-06-30 2017-10-10 Microunity Systems Engineering, Inc. System and methods for expandably wide processor instructions
CN106302238A * 2015-05-13 2017-01-04 深圳市中兴微电子技术有限公司 Queue management method and device
CN107103004B * 2016-02-23 2020-11-06 创新先进技术有限公司 Data processing method, device and system in a web page
GB2575294B8 (en) * 2018-07-04 2022-07-20 Graphcore Ltd Host Proxy On Gateway
CN109685201B * 2018-12-14 2020-10-30 安徽寒武纪信息科技有限公司 Operation method and device, and related product
CN109711539B * 2018-12-17 2020-05-29 中科寒武纪科技股份有限公司 Operation method and device, and related product


Also Published As

Publication number Publication date
CN112306945A (zh) 2021-02-02
CN112306945B (zh) 2023-05-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20846384

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20846384

Country of ref document: EP

Kind code of ref document: A1
