WO2021027972A1 - Data synchronization method and apparatus, and related product

Data synchronization method and apparatus, and related product

Info

Publication number: WO2021027972A1
Application number: PCT/CN2020/111270
Authority: WIPO (PCT)
Prior art keywords: data, descriptor, tensor, sub, synchronization
Other languages: English (en), Chinese (zh)
Inventors: 曾洪博, 王秉睿
Original Assignee: 中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021027972A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 — Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/27 — Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 13/00 — Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 — Information transfer, e.g. on bus
    • G06F 13/42 — Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 16/24 — Querying
    • G06F 16/245 — Query processing

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a data synchronization method and device and related products.
  • the present disclosure proposes a data synchronization technical solution.
  • a data synchronization method, applied to a first processor, which includes: generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, the state query instruction is used to instruct the second processor to determine the amount of data that can be synchronized for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor; and sending the state query instruction to the second processor.
  • a data synchronization method, applied to a second processor, which includes: when a state query instruction from the first processor is received, determining the descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; determining the amount of synchronizable data for the tensor data according to the descriptor of the tensor data; generating a synchronization state instruction according to the descriptor of the tensor data and the amount of synchronizable data, the synchronization state instruction being used to instruct the first processor to determine the first sub-data of the tensor data, the data amount of the first sub-data corresponding to the amount of synchronizable data; and sending the synchronization state instruction to the first processor.
  • a data synchronization device, applied to a first processor, which includes: a query instruction generation module, configured to generate a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, the state query instruction is used to instruct the second processor to determine the amount of data that can be synchronized for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor; and a query instruction sending module, configured to send the state query instruction to the second processor.
  • a data synchronization device, applied to a second processor, which includes: a query instruction receiving module, configured to determine, when a state query instruction from the first processor is received, the descriptor of the tensor data to be synchronized, the descriptor being used to indicate the shape of the tensor data to be synchronized; a data amount determination module, configured to determine the amount of synchronizable data for the tensor data according to the descriptor of the tensor data; a state instruction generation module, configured to generate a synchronization state instruction according to the descriptor of the tensor data and the amount of synchronizable data, the synchronization state instruction being used to instruct the first processor to determine the first sub-data of the tensor data, the data amount of the first sub-data corresponding to the amount of synchronizable data; and a state instruction sending module, configured to send the synchronization state instruction to the first processor.
  • an artificial intelligence chip including the data synchronization device as described above.
  • an electronic device including the artificial intelligence chip as described above.
  • a board card comprising a storage device, an interface device, a control device, and the artificial intelligence chip as described above, wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • By setting a descriptor indicating the shape of the tensor data, the sender of data synchronization can actively query the status of the receiver according to the descriptor, so as to achieve partial data synchronization between the sender and the receiver, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
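  • As a purely illustrative summary of the interaction described above, the following C sketch names the three instructions exchanged by the two processors. The structure layouts and field names are assumptions made for readability; they are not an instruction encoding defined by the present disclosure.

        /* Illustrative message types for descriptor-based partial synchronization.
           Only the roles of the three instructions come from the text above;
           the field names and layouts are assumptions. */
        #include <stddef.h>

        typedef struct {
            unsigned descriptor_id;            /* identifier of the descriptor (e.g. TR1) */
            /* ... optionally the content of the descriptor ... */
        } StateQueryInstruction;               /* sender -> receiver: query the receiver's state */

        typedef struct {
            unsigned descriptor_id;
            size_t   synchronizable_amount;    /* amount of data the receiver can accept this time */
        } SyncStateInstruction;                /* receiver -> sender: reply with its state */

        typedef struct {
            unsigned    descriptor_id;
            size_t      sub_data_amount;       /* amount of the first sub-data */
            const void *sub_data;              /* the first sub-data itself */
        } DescriptorSyncInstruction;           /* sender -> receiver: one partial synchronization */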
  • Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 4 shows a schematic diagram of data storage space of a data synchronization method according to an embodiment of the present disclosure.
  • Fig. 5 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • Fig. 7 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase "if determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • the data synchronization method according to the embodiment of the present disclosure can be applied to any processor of a processing system (for example, an artificial intelligence chip) including multiple processors (multi-core).
  • the processor may be a general-purpose processor, such as a CPU (Central Processing Unit, central processing unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations may include machine learning operations, brain-like operations, etc. Among them, machine learning operations include neural network operations, k-means operations, and support vector machine operations.
  • the artificial intelligence processor may, for example, include one of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processor), and a field-programmable gate array (FPGA) chip, or a combination thereof.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks executed by the processing unit.
  • Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure.
  • the processing system 100 includes multiple processors 101 and a memory 102.
  • the multiple processors 101 are used to execute instruction sequences.
  • the memory 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processors 101 in the processing system 100 can not only share part of the storage space, for example, share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • Fig. 2 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. As shown in Figure 2, the method is applied to the first processor (any processor in the processing system), and the method includes:
  • step S11, generate a state query instruction according to the descriptor of the tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, the state query instruction is used to instruct the second processor to determine the amount of data that can be synchronized for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor;
  • step S12 send the state query instruction to the second processor.
  • Tensors can have different dimensions.
  • a scalar can be regarded as a 0-dimensional tensor
  • a vector can be regarded as a 1-dimensional tensor
  • a matrix can be regarded as a tensor of 2 or more dimensions.
  • the shape of a tensor includes information such as the number of dimensions of the tensor and the size of each dimension of the tensor.
  • For example, the shape of a tensor can be described by the descriptor as (2, 4): these two parameters indicate that the tensor is a two-dimensional tensor, the size of its first dimension (column) is 2, and the size of its second dimension (row) is 4. It should be noted that the present disclosure does not limit the manner in which the descriptor indicates the shape of the tensor. When tensor data is stored in memory, the shape of the tensor data cannot be determined from its data address (or storage area), and the relationship between multiple pieces of tensor data cannot be determined either, so access to tensor data is inefficient and data synchronization is more complex.
  • a descriptor (or called a tensor descriptor) can be set to indicate the shape of the tensor data (ie, N-dimensional tensor data).
  • the value of N can be determined according to the dimensionality (order) of the tensor data, or can be set according to the needs of the tensor data.
  • For example, when the tensor data is three-dimensional tensor data, the descriptor can be used to indicate the shape of the tensor data in its three dimensions (such as the offset and size in each dimension). It should be understood that those skilled in the art can set the value of N according to actual needs, which is not limited in the present disclosure.
  • the descriptor may include an identifier and content, etc.
  • the identifier of the descriptor may be used to distinguish descriptors, for example, a serial number; the content of the descriptor may include at least one shape parameter representing the shape of the tensor data (for example, the size of the tensor in each dimension), and may also include at least one address parameter representing the address of the tensor data (for example, the reference address of the data reference point).
  • the present disclosure does not limit the specific parameters included in the content of the descriptor.
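  • As a concrete illustration only, a descriptor carrying an identifier, shape parameters, and an address parameter could be laid out as in the following C sketch; the field names and the fixed maximum dimensionality are assumptions, not a layout specified by the present disclosure.

        #include <stddef.h>

        #define MAX_DIMS 8                       /* assumed upper bound on N, for illustration */

        typedef struct {
            unsigned           id;               /* identifier used to distinguish descriptors */
            unsigned           ndim;             /* N: dimensionality (order) of the tensor data */
            size_t             size[MAX_DIMS];   /* shape parameters: size in each dimension */
            size_t             offset[MAX_DIMS]; /* optional offset of the stored region per dimension */
            unsigned long long base_addr;        /* address parameter: reference address of the data reference point */
        } TensorDescriptor;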
  • By using such a descriptor, the shape of tensor data can be expressed, and related information such as the relationship between multiple pieces of tensor data can be determined, which improves the efficiency of access to tensor data and reduces the complexity of data synchronization.
  • data synchronization between multiple processors may be required; for example, the calculation result of an operation on processor A1 is synchronized to processor A2 to be used as input data for another operation.
  • a descriptor-based data synchronization mechanism can be used to achieve data synchronization.
  • space in the non-shared storage space of each processor may be allocated to the tensor data to be synchronized.
  • this space may be limited, so that overall synchronization of the tensor data cannot be achieved at one time.
  • partial synchronization of tensor data can be performed, and the entire tensor data synchronization process can be realized through multiple partial synchronizations.
  • the first processor of the multiple processors is the sender of data synchronization
  • the second processor is the receiver of data synchronization.
  • The first processor and the second processor may each be any one of the multiple processors.
  • the second processor may be of the same type as or a different type from the first processor; the present disclosure does not restrict the types of the first and second processors.
  • when the sender of data synchronization has tensor data to be synchronized, the sender can query the status of the receiver to determine the amount of data for which the receiver's non-shared storage space can be allocated to the tensor data, so as to perform partial synchronization of the tensor data.
  • the first processor may generate a state query instruction according to the descriptor of the tensor data to be synchronized in step S11.
  • the state query instruction may include the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor, and is used to instruct the second processor to determine and reply with its own state (that is, the amount of data that can be synchronized for the tensor data).
  • the first processor may send the state query instruction to the second processor in step S12.
  • After receiving the state query instruction, the second processor can parse the instruction to determine the identifier of the descriptor and/or the content of the descriptor. According to the identifier and/or content of the descriptor, the second processor can determine the tensor data to be synchronized, determine the space that can be allocated to the tensor data, and thereby determine the amount of synchronizable data for the tensor data.
  • According to the descriptor of the tensor data and the amount of synchronizable data, the second processor can generate and send a synchronization state instruction, so that the first processor can determine the descriptor of the tensor data to be synchronized and the amount of data that can be synchronized in this synchronization.
  • the sender of the data synchronization can actively query the status of the receiver, so as to achieve partial data synchronization between the sender and the receiver, thereby improving the efficiency of data synchronization.
  • the method further includes:
  • upon receiving the synchronization state instruction from the second processor, determining the first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the amount of data that can be synchronized,
  • the data amount of the first sub-data corresponds to the synchronizable data amount
  • a descriptor synchronization instruction is generated and sent to the second processor to instruct the second processor to obtain the first sub-data.
  • After receiving the synchronization state instruction, the first processor can parse the instruction to obtain its content (such as the identifier of the descriptor and the amount of data that can be synchronized).
  • From this content, the descriptor of the tensor data to be synchronized can be determined, so as to determine the tensor data to be synchronized; the part of the data that can be synchronized this time, that is, the first sub-data, is then determined from the tensor data according to the amount of synchronizable data.
  • the data amount of the first sub-data may correspond to the synchronizable data amount, for example, the data amount of the first sub-data is less than or equal to the synchronizable data amount.
  • If none of the tensor data has been synchronized, data of the synchronizable amount can be selected from the tensor data as the first sub-data; if part of the tensor data is unsynchronized and the amount of the unsynchronized partial data is greater than the amount of data that can be synchronized, data of the synchronizable amount can be selected from the unsynchronized partial data (that is, the second sub-data of the tensor data) as the first sub-data; if the amount of the unsynchronized partial data is less than or equal to the amount of synchronizable data, the unsynchronized partial data can be directly used as the first sub-data. It should be understood that those skilled in the art can determine the first sub-data according to the actual situation, and this disclosure does not limit this.
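  • A minimal sketch of this selection rule is given below; the byte-count bookkeeping and the helper names (remaining_amount, choose_first_sub_data) are hypothetical and only illustrate that the amount of the first sub-data never exceeds the amount of synchronizable data.

        #include <stddef.h>

        /* Amount of tensor data still in the to-be-synchronized state (the second sub-data),
           given the total size and the amount already synchronized. */
        static size_t remaining_amount(size_t total, size_t already_synchronized) {
            return total - already_synchronized;
        }

        /* Amount of the first sub-data for this partial synchronization: the unsynchronized
           amount, capped by the receiver's synchronizable amount. */
        static size_t choose_first_sub_data(size_t unsynchronized, size_t synchronizable) {
            return (unsynchronized > synchronizable) ? synchronizable : unsynchronized;
        }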
  • the synchronization state instruction may also include the range of the partial data of the tensor data to be synchronized this time, such as the descriptor content or the storage address range of the partial sub-data, so as to specify the partial data to be synchronized.
  • the first processor may directly determine the first sub-data to be synchronized according to the range of the partial data.
  • the first processor may generate a descriptor synchronization instruction according to the first sub-data and send the descriptor synchronization instruction to the second processor.
  • the instruction may include the identifier of the descriptor of the tensor data to be synchronized and the first sub-data.
  • the second processor can parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data, determine the tensor data to be synchronized according to the descriptor, and Store the first sub-data of the tensor data in its own unshared storage space.
  • In this way, the tensor data can be determined according to the descriptor in the synchronization state instruction, the sub-data of this synchronization can be determined according to the receiver's amount of synchronizable data, and a descriptor synchronization instruction can be generated and sent according to the sub-data, so that the receiver obtains the sub-data synchronized this time, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
  • the synchronization state command includes the identifier of the descriptor.
  • the step of determining the first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the amount of synchronizable data may include:
  • parsing the synchronization state instruction to obtain the identifier of the descriptor and the amount of synchronizable data; and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the synchronization status command may include the identifier of the descriptor (for example, the identifier is TR1) and the amount of data that can be synchronized.
  • the first processor may parse the synchronization state instruction to obtain the identifier of the descriptor and the amount of synchronizable data; and then determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the step of determining the first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the amount of synchronizable data may include: determining, according to the descriptor of the tensor data, the tensor data and the second sub-data of the tensor data that is in the to-be-synchronized state; and determining the first sub-data according to the second sub-data and the amount of synchronizable data in the synchronization state instruction.
  • After the first processor receives the synchronization state instruction from the second processor, it can determine the tensor data to be synchronized according to the descriptor; according to the state of the data in the tensor data, the second sub-data in the to-be-synchronized state can be determined; and according to the second sub-data and the amount of synchronizable data indicated by the synchronization state instruction, the first sub-data of this synchronization can be determined.
  • If the data amount of the second sub-data is greater than the amount of synchronizable data, the first sub-data synchronized this time can be selected from the second sub-data; if the data amount of the second sub-data is less than or equal to the amount of synchronizable data, the second sub-data can be directly used as the first sub-data.
  • part of the data synchronized this time can be determined, so as to achieve partial synchronization of tensor data and improve the efficiency of data synchronization.
  • the method further includes: changing the state of the first sub-data of the tensor data from the pending state to the synchronized state.
  • After determining the first sub-data, the first processor can change the state of the data in the tensor data, that is, change the state of the first sub-data from the to-be-synchronized state to the synchronized state.
  • In this way, the data for the next synchronization can be determined from the partial data still in the to-be-synchronized state, thereby avoiding repeated synchronization of data and improving the efficiency of data synchronization.
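  • One way to realize this bookkeeping, shown only as an assumption (the disclosure does not prescribe how the to-be-synchronized and synchronized states are recorded), is to track per descriptor how much of the tensor data has already been moved to the synchronized state:

        #include <stddef.h>

        typedef struct {
            unsigned descriptor_id;
            size_t   total_amount;             /* total amount of the tensor data */
            size_t   synchronized_amount;      /* data already in the synchronized state */
        } SyncProgress;

        /* After the first sub-data is sent, mark it as synchronized so that the next partial
           synchronization starts from the data still in the to-be-synchronized state. */
        static void mark_synchronized(SyncProgress *p, size_t first_sub_data_amount) {
            p->synchronized_amount += first_sub_data_amount;
        }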
  • Fig. 3 shows a flowchart of a data synchronization method according to an embodiment of the present disclosure. As shown in FIG. 3, the method is applied to the second processor, and the method includes:
  • step S31 when a state query instruction from the first processor is received, a descriptor of the tensor data to be synchronized is determined, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • step S32 determine the amount of synchronizable data for the tensor data according to the descriptor of the tensor data
  • step S33, a synchronization state instruction is generated according to the descriptor of the tensor data and the amount of synchronizable data, and the synchronization state instruction is used to instruct the first processor to determine the first sub-data of the tensor data,
  • where the data amount of the first sub-data corresponds to the amount of synchronizable data;
  • step S34 the synchronization state instruction is sent to the first processor.
  • when the sender of data synchronization has tensor data to be synchronized, the sender can also query the status of the receiver.
  • the first processor (sender) can generate and send a state query instruction, and when the second processor receives the state query instruction in step S31, it can parse the instruction and determine the descriptor of the tensor data to be synchronized.
  • the second processor may determine the tensor data to be synchronized according to the descriptor, and determine the amount of data that its own non-shared storage space can allocate to and accommodate for the tensor data,
  • that is, the amount of synchronizable data, so as to perform partial synchronization of the tensor data.
  • the second processor may generate a synchronization state instruction according to the determined amount of synchronizable data and the descriptor of the tensor data, and send it to the first processor, so as to instruct the first processor to determine the descriptor of the tensor data to be synchronized and the amount of data that can be synchronized in this synchronization.
  • After receiving the synchronization state instruction sent in step S34, the first processor may generate a descriptor synchronization instruction and send the descriptor synchronization instruction to the second processor.
  • the instruction may include the identifier of the descriptor of the tensor data to be synchronized and the first sub-data.
  • In this way, the sender can query the status of the receiver, the receiver can determine and reply with its own status (that is, the amount of data that can be synchronized) after receiving the status query instruction, and partial synchronization of the tensor data is achieved through this interaction, improving the efficiency of data synchronization.
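  • The receiver-side handling of the state query (steps S31 to S34) can be sketched as follows; the structure names and the hypothetical helper free_space_for(), which stands for whatever mechanism the receiver uses to determine how much of its non-shared storage space can currently be allocated to the tensor data, are assumptions used only to make the steps concrete.

        #include <stddef.h>

        typedef struct { unsigned descriptor_id; } StateQueryInstruction;
        typedef struct { unsigned descriptor_id; size_t synchronizable_amount; } SyncStateInstruction;

        /* Hypothetical query of the space currently allocatable to the tensor data. */
        extern size_t free_space_for(unsigned descriptor_id);

        SyncStateInstruction handle_state_query(const StateQueryInstruction *query) {
            SyncStateInstruction reply;
            reply.descriptor_id = query->descriptor_id;                          /* S31: identify the tensor data */
            reply.synchronizable_amount = free_space_for(query->descriptor_id);  /* S32: determine the amount */
            return reply;                                                        /* S33: generate; S34: send */
        }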
  • the method further includes:
  • when a descriptor synchronization instruction from the first processor is received, determining the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data, and storing the first sub-data of the tensor data according to the descriptor of the tensor data.
  • when the second processor receives a descriptor synchronization instruction, it can parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data synchronized this time; it then determines the tensor data to be synchronized according to the descriptor and stores the first sub-data of the tensor data in its own non-shared storage space.
  • the receiver can determine the descriptor according to the descriptor synchronization instruction and obtain the sub-data synchronized this time, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
  • the receiver of data synchronization can initiate a partial synchronization request for tensor data, that is, the receiver sends a descriptor synchronization request instruction, which can indicate the descriptor of the tensor data to be synchronized and the amount of data that can be synchronized for the tensor data, that is, the amount of data for which the receiver's non-shared storage space can be allocated to the tensor data.
  • a data synchronization method is also provided, which is applied to the first processor, and the method includes: when a descriptor synchronization request instruction from the second processor is received, determining the descriptor of the tensor data to be synchronized and the amount of synchronizable data for the tensor data, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • determining the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of synchronizable data; and generating a descriptor synchronization instruction according to the first sub-data and sending it to the second processor to instruct the second processor to obtain the first sub-data.
  • when the first processor receives the descriptor synchronization request instruction from the second processor, it can parse the instruction to obtain its content (for example, the descriptor of the tensor data to be synchronized and the amount of synchronizable data).
  • the first processor may determine the tensor data to be synchronized according to the descriptor, and determine the part of the data that can be synchronized this time from the tensor data according to the amount of synchronizable data, that is, The first sub-data.
  • the data amount of the first sub-data may correspond to the synchronizable data amount, for example, the data amount of the first sub-data is less than or equal to the synchronizable data amount.
  • If none of the tensor data has been synchronized, data of the synchronizable amount can be selected from the tensor data as the first sub-data; if part of the tensor data is unsynchronized and the amount of the unsynchronized partial data is greater than the amount of data that can be synchronized, data of the synchronizable amount can be selected from the unsynchronized partial data (that is, the second sub-data of the tensor data) as the first sub-data; if the amount of the unsynchronized partial data is less than or equal to the amount of synchronizable data, the unsynchronized partial data can be directly used as the first sub-data. It should be understood that those skilled in the art can determine the first sub-data according to the actual situation, and this disclosure does not limit this.
  • the first processor may generate a descriptor synchronization instruction according to the first sub-data and send the descriptor synchronization instruction to the second processor.
  • the instruction may include the identifier of the descriptor of the tensor data to be synchronized and the first sub-data.
  • the second processor can parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data, determine the tensor data to be synchronized according to the descriptor, and Store the first sub-data of the tensor data in its own unshared storage space.
  • In this way, the tensor data can be determined according to the descriptor in the descriptor synchronization request instruction, and the sub-data of this synchronization can be determined according to the receiver's amount of synchronizable data;
  • a descriptor synchronization instruction is generated and sent according to the sub-data, so that the receiver can obtain the sub-data synchronized this time, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
  • the descriptor synchronization request instruction may include the identifier of the descriptor, and determining, when the descriptor synchronization request instruction from the second processor is received, the descriptor of the tensor data to be synchronized and the amount of synchronizable data for the tensor data includes: parsing the descriptor synchronization request instruction to obtain the identifier of the descriptor and the amount of synchronizable data;
  • and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the descriptor synchronization request instruction may include only the identifier of the descriptor (for example, when the identifier of the descriptor is TR1, the instruction can be expressed as Send TR1) and the amount of data that can be synchronized.
  • the first processor may parse the descriptor synchronization request instruction to obtain the identifier of the descriptor and the amount of synchronizable data; and then determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the descriptor synchronization request instruction may include the data characteristics of the tensor data to be synchronized, and determining, when the descriptor synchronization request instruction from the second processor is received, the descriptor of the tensor data to be synchronized and the amount of data that can be synchronized for the tensor data includes: determining the descriptor of the tensor data according to the data characteristics of the tensor data.
  • the descriptor synchronization request instruction may include the data characteristics of the tensor data to be synchronized.
  • the data feature may include information such as the identification, shape, source, and address of the tensor data.
  • For example, the data characteristics may indicate that the data source of the tensor data is the K-th sender (the K-th processor), that the tensor data is the result of the convolution operation numbered 200, that the address of the tensor data is a specific address area (for example, addresses ADDR0-ADDR127), and that the shape of the tensor data is a specified shape (for example, a two-dimensional tensor of 20*10), etc.
  • Those skilled in the art can set the data characteristics of the tensor data to be synchronized according to the actual situation, which is not limited in the present disclosure.
  • According to the data characteristics, the first processor can find the tensor data to be synchronized and determine the descriptor of the tensor data to be synchronized, for example, by directly obtaining an existing descriptor or newly registering a corresponding descriptor. According to the descriptor of the tensor data to be synchronized, the tensor data can be determined, and the sub-data synchronized this time can then be determined according to the amount of synchronizable data.
  • In this way, the descriptor of the tensor data to be synchronized can be determined according to the data characteristics in the descriptor synchronization request instruction, so as to achieve partial synchronization of the tensor data, and the tensor data itself does not need to be transmitted during synchronization, which reduces the amount of transmitted data and the synchronization overhead and improves processing efficiency.
  • determining the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of synchronizable data includes:
  • When the first processor receives the descriptor synchronization request instruction from the second processor, it can determine the tensor data to be synchronized according to the descriptor; according to the state of the data in the tensor data, the second sub-data in the to-be-synchronized state can be determined; and according to the second sub-data and the amount of synchronizable data indicated by the descriptor synchronization request instruction, the first sub-data to be synchronized this time can be determined.
  • If the data amount of the second sub-data is greater than the amount of synchronizable data, the first sub-data synchronized this time can be selected from the second sub-data; if the data amount of the second sub-data is less than or equal to the amount of synchronizable data, the second sub-data can be directly used as the first sub-data.
  • part of the data synchronized this time can be determined, so as to achieve partial synchronization of tensor data and improve the efficiency of data synchronization.
  • the method further includes: changing the state of the first sub-data of the tensor data from the pending state to the synchronized state.
  • After determining the first sub-data, the first processor can change the state of the data in the tensor data, that is, change the state of the first sub-data from the to-be-synchronized state to the synchronized state.
  • the next synchronization data can be determined from the partial data in the to-be-synchronized state, thereby avoiding repeated data synchronization and improving the efficiency of data synchronization.
  • a data synchronization method is also provided, which is applied to the second processor, and the method includes: generating a descriptor synchronization request instruction according to the descriptor of the tensor data to be synchronized and the amount of synchronizable data for the tensor data, where the descriptor is used to indicate the shape of the tensor data to be synchronized, and the descriptor synchronization request instruction is used to instruct the first processor to determine, according to the descriptor synchronization request instruction, the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data, the data amount of the first sub-data corresponding to the amount of synchronizable data; and sending the descriptor synchronization request instruction to the first processor.
  • the second processor of the multiple processors is the receiver of data synchronization, and the second processor initiates a partial synchronization request for tensor data.
  • in step S31, when there is tensor data to be synchronized in the second processor, the descriptor of the tensor data can be determined, together with the amount of data for which the non-shared storage space of the second processor itself can be allocated to the tensor data, that is, the amount of synchronizable data.
  • the second processor may generate a descriptor synchronization request command and send the command in step S32.
  • the descriptor synchronization request instruction may include at least one of the identifier of the descriptor, the content of the descriptor, and the data characteristic of the tensor data, and is used to instruct the first processor to determine the descriptor of the tensor data to be synchronized according to the instruction. And the first sub-data of the tensor data.
  • After receiving the descriptor synchronization request instruction, the first processor may parse the instruction to determine the descriptor of the tensor data to be synchronized and the amount of synchronizable data; the tensor data to be synchronized is determined according to the descriptor, and the part of the data that can be synchronized this time, that is, the first sub-data, is determined from the tensor data according to the amount of synchronizable data.
  • the data amount of the first sub-data may correspond to the synchronizable data amount, for example, the data amount of the first sub-data is less than or equal to the synchronizable data amount.
  • If none of the tensor data has been synchronized, data of the synchronizable amount can be selected from the tensor data as the first sub-data; if part of the tensor data is unsynchronized and the amount of the unsynchronized partial data is greater than the amount of data that can be synchronized, data of the synchronizable amount can be selected from the unsynchronized partial data (that is, the second sub-data of the tensor data) as the first sub-data; if the amount of the unsynchronized partial data is less than or equal to the amount of synchronizable data, the unsynchronized partial data can be directly used as the first sub-data. It should be understood that those skilled in the art can determine the first sub-data according to the actual situation, and this disclosure does not limit this.
  • the descriptor synchronization request instruction may also include the range of the partial data of the tensor data to be synchronized, such as the descriptor content or the storage address range of the partial sub-data, so as to specify the partial data to be acquired in this synchronization.
  • the receiver can initiate a partial synchronization request of the tensor data, so that the sender can determine the sub-data to be synchronized this time, thereby improving the efficiency of data synchronization.
  • the method further includes:
  • when the descriptor synchronization instruction from the first processor is received, storing the first sub-data of the tensor data according to the descriptor of the tensor data.
  • the first processor may generate and send a descriptor synchronization instruction according to the descriptor of the tensor data and the first sub-data.
  • when the second processor receives the descriptor synchronization instruction, it can parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data synchronized this time; it then determines the tensor data to be synchronized according to the descriptor and stores the first sub-data of the tensor data in its own non-shared storage space.
  • the receiver can determine the descriptor according to the descriptor synchronization instruction and obtain the sub-data synchronized this time, thereby reducing synchronization overhead and improving the efficiency of data synchronization.
  • the identifier and content of the descriptor can be stored in a descriptor storage space, which may be a storage space in the internal memory of the processor (such as a register, an on-chip SRAM, or another medium cache).
  • the data storage space of the tensor data indicated by the descriptor may be a storage space in the internal memory of the processor (for example, on-chip cache) or an external memory (off-chip memory) connected to the processor.
  • the data address in the data storage space may be an actual physical address or a virtual address.
  • the present disclosure does not limit the location of the descriptor storage space and the data storage space, and the type of data address.
  • the identifier and content of the descriptor and the tensor data indicated by the descriptor can be located in the same area.
  • For example, a continuous area of the on-chip cache, with addresses ADDR0-ADDR1023, can be used to store the relevant content of the descriptor:
  • addresses ADDR0-ADDR31 can be used to store the identifier of the descriptor, addresses ADDR32-ADDR63 can be used to store the content of the descriptor, and addresses ADDR64-ADDR1023 can be used to store the tensor data indicated by the descriptor.
  • the address ADDR is not limited to one bit or one byte; it is used here to indicate an address and represents one address unit. Those skilled in the art can determine the storage area and its addresses according to the actual situation, and this disclosure does not limit this.
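  • The example layout can be written down as a handful of constants; treating ADDR as an abstract, uniformly sized address unit and the exact bounds below simply restate the example above and are not a required arrangement.

        /* Example on-chip layout from the text above (ADDR is an abstract address unit). */
        enum {
            DESC_ID_START      = 0,            /* ADDR0  - ADDR31  : descriptor identifiers */
            DESC_ID_END        = 31,
            DESC_CONTENT_START = 32,           /* ADDR32 - ADDR63  : descriptor content */
            DESC_CONTENT_END   = 63,
            TENSOR_DATA_START  = 64,           /* ADDR64 - ADDR1023: tensor data indicated by the descriptor */
            TENSOR_DATA_END    = 1023
        };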
  • the identifier and content of the descriptor and the tensor data indicated by the descriptor can be stored separately in different areas of the internal memory.
  • For example, a register can be used as the descriptor storage space to store the identifier and content of the descriptor, while the on-chip cache is used as the data storage space to store the tensor data indicated by the descriptor.
  • a special register (SR) dedicated to the descriptor can also be set, and the data in the descriptor can be an immediate value or can be obtained from a special register.
  • the number of the register can be used to represent the identifier of the descriptor. For example, when the number of the register is 0, the identifier of the stored descriptor is 0.
  • an area can be allocated in the cache space according to the size of the tensor data indicated by the descriptor (for example, a tensor cache unit is created for each tensor data in the cache) for storing the Tensor data. It should be understood that a preset cache space may also be used to store the tensor data, which is not limited in the present disclosure.
  • the identifier and content of the descriptor can be stored in the internal memory, and the tensor data indicated by the descriptor can be stored in the external memory.
  • a method of storing the identifier and content of the descriptor on the chip, and storing the tensor data indicated by the descriptor off the chip may be adopted.
  • the data address of the data storage space corresponding to the descriptor may be a fixed address.
  • a separate data storage space can be divided for tensor data, and the starting address of each tensor data in the data storage space corresponds to the identifier of the descriptor in a one-to-one correspondence.
  • the processor can determine the data address of the tensor data based on the content of the descriptor.
  • the descriptor may also be used to indicate the address of N-dimensional tensor data, where the content of the descriptor may also include at least one address parameter representing the address of the tensor data.
  • For example, when the tensor data is three-dimensional data, the content of the descriptor may include one address parameter indicating the address of the tensor data, such as the start address of the tensor data, or may include multiple address parameters of the address of the tensor data, such as the start address of the tensor data plus an address offset, or address parameters of the tensor data in each dimension.
  • the address parameter of the tensor data includes a reference address of the data reference point of the descriptor in the data storage space of the tensor data.
  • the reference address can be different according to the change of the data reference point.
  • the present disclosure does not limit the selection of data reference points.
  • the reference address may include the start address of the data storage space.
  • the reference address of the descriptor is the starting address of the data storage space.
  • the reference address of the descriptor is the physical address of the data block in the data storage space.
  • the shape parameters of the tensor data include at least one of the following: the size of the data storage space of the tensor data in at least one of the N dimensional directions, the size of the storage area in at least one of the N dimensional directions, the offset of the storage area in at least one of the N dimensional directions, the positions, relative to the data reference point, of at least two vertices at diagonal positions in the N dimensional directions, and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address. Here, the data description position is the mapping position of a point or region in the tensor data indicated by the descriptor.
  • For example, when the tensor data is three-dimensional data, the descriptor can represent the shape of the tensor data with three-dimensional space coordinates (x, y, z), and the data description position of the tensor data may be the position of a point or area in the three-dimensional space to which the tensor data is mapped, represented by the three-dimensional space coordinates (x, y, z).
  • Fig. 4 shows a schematic diagram of data storage space of a data synchronization method according to an embodiment of the present disclosure.
  • the data storage space 21 stores a two-dimensional data in a row-first manner, which can be represented by (x, y) (where the X axis goes horizontally to the right, and the Y axis goes vertically downwards).
  • the size of the data storage space 21 in the X-axis direction (the size of each row) is ori_x (not shown in the figure), and its size in the Y-axis direction (the total number of rows) is ori_y (not shown in the figure);
  • the start address PA_start of the data storage space 21 (the reference address) is the physical address of the first data block 22.
  • the data block 23 is part of the data in the data storage space 21; its offset 25 in the X-axis direction is denoted offset_x, its offset 24 in the Y-axis direction is denoted offset_y, its size in the X-axis direction is denoted size_x, and its size in the Y-axis direction is denoted size_y.
  • when the data reference point of the descriptor is the first data block of the data storage space 21, the reference address of the descriptor is the start address PA_start of the data storage space 21;
  • the content of the descriptor of the data block 23 can then be determined from the start address PA_start together with the size ori_x of the data storage space 21 on the X axis, the size ori_y on the Y axis, the offset offset_y of the data block 23 in the Y-axis direction, the offset offset_x in the X-axis direction, the size size_x in the X-axis direction, and the size size_y in the Y-axis direction.
  • In this example, the descriptor describes a two-dimensional space; it should be understood that those skilled in the art can set the number of dimensions represented by the content of the descriptor according to the actual situation, which is not limited in the present disclosure.
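  • For this example, the content of the descriptor of data block 23 can be pictured as the following fields; the struct and its field types are only an illustration of the parameters named in the text (PA_start, ori_x, ori_y, offset_x, offset_y, size_x, size_y), not a prescribed format.

        #include <stdint.h>

        /* Illustrative content of the descriptor of data block 23 in Fig. 4. */
        typedef struct {
            int64_t pa_start;                  /* reference address: start address of data storage space 21 */
            int64_t ori_x, ori_y;              /* sizes of the data storage space on the X and Y axes */
            int64_t offset_x, offset_y;        /* offsets of data block 23 on the X and Y axes */
            int64_t size_x, size_y;            /* sizes of data block 23 on the X and Y axes */
        } BlockDescriptorContent;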
  • In a possible implementation, the content of the descriptor of the tensor data may be determined based on the reference address of the data reference point of the descriptor in the data storage space and the positions, relative to the data reference point, of at least two vertices at diagonal positions in the N dimensional directions.
  • For example, the reference address PA_base of the data reference point of the descriptor in the data storage space and the positions of two diagonal vertices relative to the data reference point can be used to determine the content of the descriptor of the data block 23 in Fig. 4.
  • First, one piece of data (for example, the data at position (2, 2)) can be selected as the data reference point in the data storage space 21, and the physical address of this data in the data storage space is used as the reference address PA_base;
  • then the positions of at least two diagonal vertices of the data block 23 relative to the data reference point are determined, for example using the diagonal vertices in the upper-left-to-lower-right direction, where the relative position of the upper left vertex is (x_min, y_min) and the relative position of the lower right vertex is (x_max, y_max);
  • the content of the descriptor of the data block 23 can then be determined based on the reference address PA_base, the relative position (x_min, y_min) of the upper left vertex, and the relative position (x_max, y_max) of the lower right vertex.
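  • Under this representation, the descriptor content reduces to a reference address plus two relative vertex positions; the following struct is again only an illustration of the parameters named in the text.

        #include <stdint.h>

        /* Illustrative descriptor content using two diagonal vertices relative to the data reference point. */
        typedef struct {
            int64_t pa_base;                   /* reference address of the chosen data reference point */
            int64_t x_min, y_min;              /* relative position of the upper left vertex */
            int64_t x_max, y_max;              /* relative position of the lower right vertex */
        } DiagonalVertexDescriptorContent;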
  • In a possible implementation, the content of the descriptor of the tensor data may be determined based on the reference address of the data reference point of the descriptor in the data storage space and the mapping relationship between the data description position of the tensor data indicated by the descriptor and the data address.
  • the mapping relationship between the data description position and the data address can be set according to actual needs; for example, when the tensor data indicated by the descriptor is three-dimensional spatial data, a function f(x, y, z) can be used to define the mapping relationship between the data description position and the data address.
  • It should be understood that the mapping relationship between the data description position and the data address can be set according to the actual situation, which is not limited in the present disclosure.
  • PA2(x,y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)    (4)
  • the processor can calculate the data address, in the data storage space, of the tensor data indicated by the descriptor according to the content of the descriptor, and then perform corresponding processing (such as data operations or data synchronization) according to the address, so that the complexity of data access is reduced and the processing efficiency of the processor is improved.
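  • A direct transcription of equation (4) into C is shown below; interpreting (x_q, y_q) as the data description position inside the data block indicated by the descriptor, and the use of signed 64-bit arithmetic, are assumptions.

        #include <stdint.h>

        /* Equation (4): PA2(x,y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q).
           ori_x is the row size of the data storage space, offset_x / offset_y locate the data
           block, and (x_q, y_q) is assumed to be the data description position being addressed. */
        int64_t descriptor_data_address(int64_t pa_start, int64_t ori_x,
                                        int64_t offset_x, int64_t offset_y,
                                        int64_t x_q, int64_t y_q) {
            return pa_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q);
        }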
  • In this way, partial synchronization of tensor data can be achieved when the space of the receiver of data synchronization is insufficient, and synchronization of the entire tensor data is achieved through multiple partial synchronizations, thereby avoiding the overall synchronization failure or synchronization delay of the tensor data caused by insufficient space and improving the efficiency of data synchronization; moreover, a descriptor indicating the shape of the tensor data is set, and the tensor data is determined according to the descriptor during the data synchronization process.
  • Although the steps in the flowchart are displayed in sequence according to the direction of the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution order of these steps is not strictly limited, and these steps can be executed in other orders. Moreover, at least some of the steps in the flowchart may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily executed at the same time, but can be executed at different times, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of other steps or of the sub-steps or stages of other steps.
  • Fig. 5 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • the data synchronization device is applied to the first processor.
  • the data synchronization device includes:
  • the query instruction generating module 51 is configured to generate a state query instruction according to the descriptor of the tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, the state query instruction is used to instruct
  • the second processor to determine the amount of synchronizable data for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor;
  • the query instruction sending module 52 is configured to send the state query instruction to the second processor.
  • the device further includes:
  • the sub-data determining module is configured to determine, when the synchronization state instruction from the second processor is received, the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of synchronizable data in the synchronization state instruction,
  • where the data amount of the first sub-data corresponds to the amount of synchronizable data;
  • the synchronization instruction generating and sending module is configured to generate a descriptor synchronization instruction according to the first sub-data and send the descriptor synchronization instruction to the second processor to instruct the second processor to obtain the first A sub-data.
  • the synchronization state instruction includes an identifier of a descriptor
  • the sub-data determining module includes:
  • the parsing sub-module is used to parse the synchronization state instruction to obtain the identifier of the descriptor and the amount of data that can be synchronized;
  • the descriptor determining submodule is used to determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the sub-data determining module includes:
  • the first determining submodule is configured to determine the tensor data and the second sub-data in the to-be-synchronized state in the tensor data according to the descriptor of the tensor data;
  • the second determining sub-module is configured to determine the first sub-data according to the second sub-data and the amount of synchronizable data in the synchronization state instruction.
  • the device further includes:
  • the state change module is used to change the state of the first sub-data of the tensor data from the pending state to the synchronized state.
  • Fig. 6 shows a block diagram of a data synchronization device according to an embodiment of the present disclosure.
  • the data synchronization device is applied to the second processor.
  • the data synchronization device includes:
  • the query instruction receiving module 61 is configured to determine the descriptor of the tensor data to be synchronized when receiving the state query instruction from the first processor, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • the data amount determining module 62 is configured to determine the synchronizable data amount for the tensor data according to the descriptor of the tensor data;
  • the state instruction generating module 63 is configured to generate a synchronization state instruction according to the descriptor of the tensor data and the amount of synchronizable data, and the synchronization state instruction is used to instruct the first processor to determine the first sub-data of the tensor data,
  • where the data amount of the first sub-data corresponds to the amount of synchronizable data;
  • the status command sending module 64 is configured to send the synchronization status command to the first processor.
  • the device further includes:
  • a synchronization instruction receiving module configured to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data when receiving the descriptor synchronization instruction from the first processor;
  • the data storage module is configured to store the first sub-data of the tensor data according to the descriptor of the tensor data.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of the units/modules in the foregoing embodiment is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated units/modules can be implemented in the form of hardware or software program modules.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as RRAM (Resistive Random Access Memory), DRAM (Dynamic Random Access Memory), SRAM (Static Random-Access Memory), EDRAM (Enhanced Dynamic Random Access Memory), HBM (High-Bandwidth Memory), HMC (Hybrid Memory Cube), and so on.
  • If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the essence of the technical solution of the present disclosure, the part that contributes to the prior art, or all or part of the technical solution can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions that cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned data synchronization device.
  • a board card is also disclosed, which includes a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip; the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 7 shows a structural block diagram of a board according to an embodiment of the present disclosure.
  • the board may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390, an interface device 391, and a control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393. Each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
  • the storage device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 chips.
  • the artificial intelligence chip may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC verification. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s (a short calculation illustrating this figure follows this list).
  • each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
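
As a quick plausibility check of the 25600 MB/s figure quoted above, assuming only the 64 data bits (not the 8 ECC bits) carry payload at the DDR4-3200 transfer rate:

$$3200\ \text{MT/s} \times \frac{64\ \text{bit}}{8\ \text{bit/byte}} = 3200 \times 8\ \text{MB/s} = 25600\ \text{MB/s}$$

per 72-bit controller, i.e. per group of storage units.
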
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • for example, the data to be processed is transferred from the server to the chip through the standard PCIE interface.
  • the interface device may also be another interface. The present disclosure does not limit the specific form of the other interfaces mentioned above, as long as the interface unit can implement the data transfer function.
  • the calculation result of the artificial intelligence chip is transmitted back to an external device (such as a server) by the interface device.
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as heavy load and light load.
  • the control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, servers, cloud servers, still cameras, video cameras, projectors, watches, headsets, mobile storage devices, wearable devices, means of transportation, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance instruments, B-ultrasound machines, and/or electrocardiographs.
  • a data synchronization method applied to a first processor, including:
  • generating a state query instruction according to the descriptor of the tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, and the state query instruction is used to instruct the second processor to determine the synchronizable data amount for the tensor data and to generate a synchronization state instruction, the state query instruction including the identifier of the descriptor and/or the content of the descriptor;
  • upon receiving the synchronization state instruction from the second processor, determining the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of data that can be synchronized in the synchronization state instruction, where the data amount of the first sub-data corresponds to the synchronizable data amount;
  • generating a descriptor synchronization instruction according to the first sub-data and sending the descriptor synchronization instruction to the second processor to instruct the second processor to obtain the first sub-data.
  • determining the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of data that can be synchronized in the synchronization state instruction includes:
  • determining the descriptor of the tensor data to be synchronized.
  • Clause A4. When a synchronization state instruction from the second processor is received, determining the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of synchronizable data in the synchronization state instruction includes:
  • Clause A5. The method according to any one of Clauses A2 to A4, further comprising:
  • changing the state of the first sub-data of the tensor data from the to-be-synchronized state to the synchronized state.
  • a data synchronization method applied to a second processor, including:
  • generating a synchronization state instruction according to the descriptor of the tensor data and the synchronizable data amount, where the synchronization state instruction is used to instruct the first processor to determine the first sub-data of the tensor data, and the data amount of the first sub-data corresponds to the synchronizable data amount;
  • storing the first sub-data of the tensor data. An illustrative sketch of the resulting round-by-round exchange is given after this list.
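
To make the round-by-round nature of the two methods concrete, the following sketch (hypothetical, not part of the disclosure) shows how repeated query/state/synchronization rounds cover a whole tensor when the second processor can only accept a bounded amount of data per round:

```python
def synchronize_tensor(total_size, per_round_capacity):
    """Return the (offset, length) of the first sub-data chosen in each round,
    assuming the second processor reports `per_round_capacity` as its
    synchronizable data amount until the whole tensor is synchronized."""
    synchronized, rounds = 0, []
    while synchronized < total_size:
        # second processor: synchronizable data amount for this round
        amount = min(per_round_capacity, total_size - synchronized)
        # first processor: first sub-data = next `amount` elements still pending
        rounds.append((synchronized, amount))
        # state change: those elements move from to-be-synchronized to synchronized
        synchronized += amount
    return rounds

# Example: a tensor of 1000 elements, 256 elements accepted per round.
print(synchronize_tensor(1000, 256))
# [(0, 256), (256, 256), (512, 256), (768, 232)]
```
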
  • a data synchronization device applied to a first processor, including:
  • the query instruction generation module is used to generate a state query instruction according to the descriptor of the tensor data to be synchronized, wherein the descriptor is used to indicate the shape of the tensor data to be synchronized, and the state query instruction is used to instruct the second processor to determine the amount of data that can be synchronized for the tensor data and to generate a synchronization state instruction, where the state query instruction includes the identifier of the descriptor and/or the content of the descriptor;
  • the query instruction sending module is configured to send the status query instruction to the second processor.
  • the sub-data determining module is used to determine, when the synchronization state instruction from the second processor is received, the first sub-data of the tensor data according to the descriptor of the tensor data and the amount of synchronizable data in the synchronization state instruction, where the data amount of the first sub-data corresponds to the synchronizable data amount;
  • the synchronization instruction generating and sending module is configured to generate a descriptor synchronization instruction according to the first sub-data and send the descriptor synchronization instruction to the second processor to instruct the second processor to obtain the first sub-data.
  • the parsing sub-module is used to parse the synchronization state instruction to obtain the identifier of the descriptor and the amount of data that can be synchronized;
  • the descriptor determining submodule is used to determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
  • the first determining submodule is configured to determine, according to the descriptor of the tensor data, the tensor data and the second sub-data of the tensor data that is in the to-be-synchronized state;
  • the second determining sub-module is configured to determine the first sub-data according to the second sub-data and the amount of synchronizable data in the synchronization state instruction.
  • the state change module is used to change the state of the first sub-data of the tensor data from the to-be-synchronized state to the synchronized state.
  • a data synchronization device applied to a second processor, including:
  • the query instruction receiving module is configured to determine the descriptor of the tensor data to be synchronized when receiving the state query instruction from the first processor, and the descriptor is used to indicate the shape of the tensor data to be synchronized;
  • a data amount determination module configured to determine the synchronizable data amount for the tensor data according to the descriptor of the tensor data
  • the state instruction generating module is configured to generate a synchronization state instruction according to the descriptor of the tensor data and the amount of synchronizable data, and the synchronization state instruction is used to instruct the first processor to determine the first sub-data of the tensor data, where the data amount of the first sub-data corresponds to the synchronizable data amount;
  • the state instruction sending module is configured to send the synchronization state instruction to the first processor.
  • a synchronization instruction receiving module configured to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data when receiving the descriptor synchronization instruction from the first processor;
  • the data storage module is configured to store the first sub-data of the tensor data according to the descriptor of the tensor data.
  • Clause A16. An electronic device comprising the artificial intelligence chip as described in Clause A15.
  • a board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in Clause A15; wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • the storage device includes multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
  • the chip includes a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is a standard PCIE interface.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data synchronization method and apparatus, and a related product are provided. The method includes: upon receiving a state query instruction from a first processor, determining a descriptor of tensor data to be synchronized (S31); determining the amount of data that can be synchronized for the tensor data according to the descriptor of the tensor data (S32); generating a synchronization state instruction according to the descriptor of the tensor data and the amount of data that can be synchronized (S33); and sending the synchronization state instruction to the first processor (S34). The method improves data synchronization efficiency.
PCT/CN2020/111270 2019-08-09 2020-08-26 Procédé et appareil de synchronisation de données, et produit associé WO2021027972A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910735425.X 2019-08-09
CN201910735425.XA CN112347186B (zh) 2019-08-09 2019-08-09 数据同步方法及装置以及相关产品

Publications (1)

Publication Number Publication Date
WO2021027972A1 true WO2021027972A1 (fr) 2021-02-18

Family

ID=74366935

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/111270 WO2021027972A1 (fr) 2019-08-09 2020-08-26 Procédé et appareil de synchronisation de données, et produit associé

Country Status (2)

Country Link
CN (1) CN112347186B (fr)
WO (1) WO2021027972A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799852A (zh) * 2021-04-12 2021-05-14 北京一流科技有限公司 逻辑节点的多维sbp分布式签名决策系统及其方法
CN114706813A (zh) * 2022-05-05 2022-07-05 上海壁仞智能科技有限公司 多核异构片上系统、非对称同步方法、计算设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7620666B1 (en) * 2002-07-29 2009-11-17 Symantec Operating Company Maintaining persistent data change maps for fast data synchronization and restoration
CN101950282A (zh) * 2010-08-30 2011-01-19 中国科学院计算技术研究所 一种多处理器系统及其同步引擎
CN104967658A (zh) * 2015-05-08 2015-10-07 成都品果科技有限公司 一种多终端设备上的数据同步方法
CN105159795A (zh) * 2015-08-21 2015-12-16 小米科技有限责任公司 数据同步方法、装置和系统
CN106453511A (zh) * 2016-09-14 2017-02-22 广东欧珀移动通信有限公司 一种数据备份方法及设备

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10678619B2 (en) * 2011-07-27 2020-06-09 Pure Storage, Inc. Unified logs and device statistics
US9785565B2 (en) * 2014-06-30 2017-10-10 Microunity Systems Engineering, Inc. System and methods for expandably wide processor instructions
US9977619B2 (en) * 2015-11-06 2018-05-22 Vivante Corporation Transfer descriptor for memory access commands
CN107103004B (zh) * 2016-02-23 2020-11-06 创新先进技术有限公司 网页中的数据处理方法、装置及系统
US9959498B1 (en) * 2016-10-27 2018-05-01 Google Llc Neural network instruction set architecture
CN109685201B (zh) * 2018-12-14 2020-10-30 安徽寒武纪信息科技有限公司 运算方法、装置及相关产品
CN109766296A (zh) * 2019-01-08 2019-05-17 郑州云海信息技术有限公司 一种数据处理方法、装置、系统和dma控制器

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112799852A (zh) * 2021-04-12 2021-05-14 北京一流科技有限公司 逻辑节点的多维sbp分布式签名决策系统及其方法
CN112799852B (zh) * 2021-04-12 2021-07-30 北京一流科技有限公司 逻辑节点的多维sbp分布式签名决策系统及其方法
CN114706813A (zh) * 2022-05-05 2022-07-05 上海壁仞智能科技有限公司 多核异构片上系统、非对称同步方法、计算设备和介质

Also Published As

Publication number Publication date
CN112347186B (zh) 2023-02-28
CN112347186A (zh) 2021-02-09

Similar Documents

Publication Publication Date Title
CN110096310B (zh) 运算方法、装置、计算机设备和存储介质
CN110119807B (zh) 运算方法、装置、计算机设备和存储介质
WO2021027972A1 (fr) Procédé et appareil de synchronisation de données, et produit associé
EP3825842B1 (fr) Procédé et appareil de traitement de données et produit associé
US11687339B2 (en) Data processing method and apparatus, and related product
US20240111536A1 (en) Data processing apparatus and related products
US20240004650A1 (en) Data processing method and apparatus, and related product
WO2021027973A1 (fr) Procédé et dispositif de synchronisation de données et produits apparentés
WO2021018313A1 (fr) Procédé et appareil de synchronisation de données, et produit associé
WO2021223642A1 (fr) Procédé et appareil de traitement de données, et produit associé
CN111047005A (zh) 运算方法、装置、计算机设备和存储介质
CN111813449A (zh) 运算方法、装置及相关产品
EP4141685A1 (fr) Procédé et dispositif pour construire une structure de topologie de communication sur la base de multiples noeuds de traitement
WO2021082723A1 (fr) Appareil d'execution
CN112347026B (zh) 数据同步方法及装置以及相关产品
WO2021223645A1 (fr) Procédé et appareil de traitement de données, et produit associé
WO2021223638A1 (fr) Procédé et dispositif de traitement de données, et produit associé
CN112395008A (zh) 运算方法、装置、计算机设备和存储介质
CN112347185A (zh) 数据同步方法及装置以及相关产品
WO2021223644A1 (fr) Procédé et dispositif de traitement de données, et produit associé
CN112395002B (zh) 运算方法、装置、计算机设备和存储介质
WO2021037083A1 (fr) Procédé et appareil de traitement de données, et produit associé
CN111831722A (zh) 数据同步方法及装置以及相关产品
CN111124497A (zh) 运算方法、装置、计算机设备和存储介质
CN111062483A (zh) 运算方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20852830

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20852830

Country of ref document: EP

Kind code of ref document: A1
