CN112347186A - Data synchronization method and device and related product - Google Patents

Data synchronization method and device and related product

Info

Publication number
CN112347186A
Authority
CN
China
Prior art keywords
data
descriptor
tensor
instruction
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910735425.XA
Other languages
Chinese (zh)
Other versions
CN112347186B
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Cambricon Information Technology Co Ltd
Original Assignee
Anhui Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to CN201910735425.XA, Critical, patent CN112347186B
Application filed by Anhui Cambricon Information Technology Co Ltd
Priority to PCT/CN2020/082803, patent WO2020200246A1
Priority to EP20785318.5A, patent EP3951666A4
Priority to KR1020207032017A, patent KR20200142536A
Priority to KR1020207036316A, patent KR102611169B1
Priority to JP2021510523A, patent JP7073581B2
Priority to KR1020207036312A, patent KR102611162B1
Priority to PCT/CN2020/111270, patent WO2021027972A1
Priority to JP2020198245A, patent JP7150803B2
Priority to JP2020198200A, patent JP7121103B2
Publication of CN112347186A
Priority to US17/489,671, patent US11385895B2
Priority to US17/849,182, patent US11886880B2
Application granted, Critical
Publication of CN112347186B
Priority to US18/531,734, patent US20240111536A1
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 — Information retrieval of structured data, e.g. relational data
    • G06F 16/27 — Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F 13/00 — Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/38 — Information transfer, e.g. on bus
    • G06F 13/42 — Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F 16/24 — Querying
    • G06F 16/245 — Query processing

Abstract

The present disclosure relates to a data synchronization method and apparatus and related products. The products include a control module, the control module including an instruction cache unit, an instruction processing unit, and a storage queue unit; the instruction cache unit is used for storing calculation instructions associated with artificial neural network operations; the instruction processing unit is used for parsing a calculation instruction to obtain a plurality of operation instructions; and the storage queue unit is configured to store an instruction queue, the instruction queue including a plurality of operation instructions or calculation instructions to be executed in the order of the queue. Through this method, the operation efficiency of the related products in neural network model operations can be improved.

Description

Data synchronization method and device and related product
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data synchronization method and apparatus, and a related product.
Background
With the continuous development of artificial intelligence technology, its application fields have become ever wider, and it is now well applied in fields such as image recognition, speech recognition, and natural language processing. However, as the complexity of artificial intelligence algorithms increases, the amount and dimensionality of the data to be processed keep growing, and multiple cores and/or multiple chips are often required for data processing. When performing data synchronization between cores or between chips, the synchronization methods of the related art incur high synchronization overhead and low processing efficiency.
Disclosure of Invention
In view of this, the present disclosure provides a data synchronization method.
According to an aspect of the present disclosure, there is provided a data synchronization method applied to a first processor, including: generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for instructing a second processor to determine an amount of synchronizable data for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor; and sending the state query instruction to the second processor.
According to another aspect of the present disclosure, there is provided a data synchronization method applied to a second processor, including: determining a descriptor of tensor data to be synchronized when a state query instruction from a first processor is received, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized; determining, from the descriptor of the tensor data, an amount of synchronizable data for the tensor data; generating a synchronization state instruction according to the descriptor of the tensor data and the synchronizable data volume, wherein the synchronization state instruction is used for instructing the first processor to determine first sub-data of the tensor data, and the data volume of the first sub-data corresponds to the synchronizable data volume; and sending the synchronization state instruction to the first processor.
According to another aspect of the present disclosure, there is provided a data synchronization apparatus applied to a first processor, including: a query instruction generation module for generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for instructing a second processor to determine an amount of synchronizable data for the tensor data and to generate a synchronization state instruction, and the state query instruction includes the identifier of the descriptor and/or the content of the descriptor; and a query instruction sending module for sending the state query instruction to the second processor.
According to another aspect of the present disclosure, there is provided a data synchronization apparatus applied to a second processor, including: a query instruction receiving module for determining a descriptor of tensor data to be synchronized when a state query instruction from the first processor is received, the descriptor being used for indicating the shape of the tensor data to be synchronized; a data volume determination module for determining an amount of synchronizable data for the tensor data according to the descriptor of the tensor data; a state instruction generation module for generating a synchronization state instruction according to the descriptor of the tensor data and the synchronizable data volume, wherein the synchronization state instruction is used for instructing the first processor to determine first sub-data of the tensor data, and the data volume of the first sub-data corresponds to the synchronizable data volume; and a state instruction sending module for sending the synchronization state instruction to the first processor.
According to another aspect of the present disclosure, there is provided an artificial intelligence chip comprising the data synchronization apparatus as described above.
According to another aspect of the present disclosure, there is provided an electronic device including the artificial intelligence chip as described above.
According to another aspect of the present disclosure, a board card is provided, which includes: a storage device, an interface device, a control device, and an artificial intelligence chip as described above, wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively; the storage device is used for storing data; the interface device is used for implementing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
According to the embodiment of the disclosure, by setting the descriptor indicating the shape of the tensor data, the sender of the data synchronization actively queries the state of the receiver according to the descriptor so as to realize partial data synchronization between the sender and the receiver, thereby reducing the synchronization overhead and improving the efficiency of the data synchronization.
The technical features recited in the claims achieve the beneficial effects corresponding to the technical problems in the background art. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure.
Fig. 2 shows a flow diagram of a data synchronization method according to an embodiment of the present disclosure.
Fig. 3 shows a flow diagram of a data synchronization method according to an embodiment of the present disclosure.
Fig. 4 illustrates a schematic diagram of a data storage space of a data synchronization method according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first," "second," and "third," etc. in the claims, description, and drawings of the present disclosure are used for distinguishing between different objects and not for describing a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon", "in response to a determination", or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [described condition or event]", or "in response to detecting [described condition or event]".
The data synchronization method according to the embodiments of the present disclosure may be applied to any processor of a processing system (e.g., an artificial intelligence chip) including a plurality of processors (multi-core). The processor may be a general-purpose processor, such as a Central Processing Unit (CPU), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-inspired operations, and the like. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of processor. Further, the types of the processors in the processing system may be the same or different, which is not limited by the present disclosure.
In one possible implementation, the processor referred to in this disclosure may include multiple processing units, each of which may independently run the various tasks assigned to it, such as a convolution operation task, a pooling task, or a fully connected task. The present disclosure does not limit the processing units or the tasks executed by the processing units.
Fig. 1 shows a schematic diagram of a processing system of a data synchronization method according to an embodiment of the present disclosure. As shown in Fig. 1, the processing system 100 includes a plurality of processors 101 for executing instruction sequences and a memory 102 for storing data, which may include random-access memory (RAM) and a register file. The processors 101 in the processing system 100 may share part of the storage space, for example part of the RAM space and the register file, and may at the same time have their own separate storage spaces.
Fig. 2 shows a flow diagram of a data synchronization method according to an embodiment of the present disclosure. As shown in fig. 2, the method is applied to a first processor (any one of processors in a processing system), and the method includes:
in step S11: generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for indicating a second processor to determine the synchronizable data amount aiming at the tensor data and generating a synchronization state instruction, and the state query instruction comprises the identifier of the descriptor and/or the content of the descriptor;
in step S12: and sending the state inquiry instruction to a second processor.
For example, the data to be synchronized may include N-dimensional tensor data (N is an integer greater than or equal to zero, e.g., N = 1, 2, or 3). A tensor can take multiple forms of data composition and can be of different dimensions: a scalar may be regarded as a 0-dimensional tensor, a vector as a 1-dimensional tensor, and a matrix as a tensor of 2 or more dimensions. The shape of a tensor includes information such as the number of dimensions of the tensor and the size of each dimension. For example, for the tensor:
[[a11, a12, a13, a14],
 [a21, a22, a23, a24]]
the shape of the tensor can be described by the descriptor as (2, 4); that is, the tensor is represented by two parameters as a two-dimensional tensor, with the size of the first dimension (column) of the tensor being 2 and the size of the second dimension (row) being 4. It should be noted that the present disclosure does not limit the way the descriptor indicates the tensor shape. When tensor data is stored in a memory, the shape of the tensor data cannot be determined from the data address (or storage area) of the tensor data, and related information such as the interrelation among multiple pieces of tensor data cannot be determined either, which makes the processor's access to the tensor data inefficient and data synchronization complex.
In this case, a descriptor (or referred to as a tensor descriptor) may be set to indicate the shape of the tensor data (i.e., the N-dimensional tensor data). The value of N may be determined according to the dimension (order) of the tensor data, or may be set according to the usage requirement of the tensor data. For example, when the value of N is 3, the tensor data is three-dimensional tensor data, and the descriptor may be used to indicate the shape (e.g., offset, size, etc.) of the three-dimensional tensor data in three dimensional directions. It should be understood that the value of N can be set by those skilled in the art according to actual needs, and the disclosure does not limit this.
In one possible implementation, the descriptor may include an identifier and content, etc., and the identifier of the descriptor may be used to distinguish the descriptor, such as a number; the content of the descriptor may include at least one shape parameter (e.g., a size in each dimension direction of the tensor, etc.) representing the shape of the tensor data, and may further include at least one address parameter (e.g., a reference address of the data reference point) representing the address of the tensor data. The present disclosure does not limit the specific parameters included in the content of the descriptor.
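A minimal sketch of such a descriptor, assuming a Python-style model (the class and field names below are illustrative assumptions; the patent does not prescribe a concrete layout):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TensorDescriptor:
    # Identifier used to distinguish this descriptor (e.g. a number).
    ident: int
    # Shape parameters: the size in each dimension direction of the tensor.
    shape: List[int]
    # Address parameter: reference address of the data reference point.
    base_address: int = 0

    def element_count(self) -> int:
        """Total number of elements implied by the shape parameters."""
        n = 1
        for dim in self.shape:
            n *= dim
        return n

# The two-dimensional tensor of shape (2, 4) from the example above.
desc = TensorDescriptor(ident=1, shape=[2, 4])
```

With such a model, the sender and receiver can exchange either the identifier alone (when both sides have registered the descriptor) or the full content.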
By using descriptors to indicate tensor data, the shape of the tensor data can be expressed, and related information such as the interrelation among multiple pieces of tensor data can also be determined, thereby improving the access efficiency of tensor data and reducing the complexity of data synchronization.
In one possible implementation, during data processing, data synchronization between multiple processors (e.g., multiple cores of an artificial intelligence chip) may be required, for example, to synchronize the operation result of processor a1 into processor a2 as input data of another operation. In this case, a descriptor-based data synchronization mechanism may be employed to achieve data synchronization.
In one possible implementation, the space that the non-shared storage of each processor can allocate to the tensor data to be synchronized may be limited, so that overall synchronization of the tensor data cannot be achieved at once. In this case, partial synchronization of the tensor data is performed, and the entire tensor data can be synchronized through multiple partial synchronizations.
In one possible implementation, a first processor of the plurality of processors may be set as a sender of data synchronization, and a second processor may be set as a receiver of data synchronization. The first processor and the second processor are any of a plurality of processors, the second processor may be of the same type as the first processor or of a different type, and the disclosure does not limit the types of the first processor and the second processor.
In one possible implementation, when the sender of the data synchronization obtains the tensor data to be synchronized, for example, when the first processor completes an operation and obtains an operation result (tensor data), it may query the state of the receiver to determine the amount of data that can be accommodated in the space that the receiver's non-shared storage can allocate to the tensor data, so as to perform partial synchronization of the tensor data.
In one possible implementation, the first processor may generate a state query instruction according to the descriptor of the tensor data to be synchronized in step S11. The state query instruction may include the identifier of the descriptor of the tensor data to be synchronized and/or the content of the descriptor, and is used for instructing the second processor to determine and reply with its state (i.e., the amount of synchronizable data for the tensor data).
In one possible implementation, the first processor may send a status query instruction to the second processor in step S12. The second processor, upon receiving the status query instruction, may parse the instruction to determine the identity of the descriptor and/or the content of the descriptor. According to the identifier of the descriptor and/or the content of the descriptor, the second processor can determine tensor data to be synchronized, further determine a space capable of being allocated to the tensor data, and determine the synchronizable data amount of the tensor data. According to the synchronizable data volume and the descriptor aiming at the tensor data, the second processor can generate and send a synchronization state instruction, so that the first processor can determine the descriptor of the tensor data to be synchronized and the synchronizable data volume of the current synchronization.
In this way, the sender of the data synchronization can actively inquire the state of the receiver so as to realize the partial data synchronization between the sender and the receiver, thereby improving the efficiency of the data synchronization.
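As a sketch of steps S11 and S12, the state query instruction can be modeled as a small message carrying the descriptor's identifier and/or content. The function and field names (`make_state_query`, `desc_id`, etc.) are illustrative assumptions, not the patent's actual instruction format:

```python
def make_state_query(descriptor, include_content=True):
    """Build a state query instruction (step S11): it carries the
    identifier of the descriptor and/or the content of the descriptor."""
    msg = {"type": "STATE_QUERY", "desc_id": descriptor["ident"]}
    if include_content:
        # Including the descriptor content lets the receiver identify
        # the tensor data even without a pre-registered identifier.
        msg["desc_content"] = {"shape": descriptor["shape"]}
    return msg

# Step S12 would send this message to the second processor over the
# inter-core or inter-chip channel (transport not modeled here).
query = make_state_query({"ident": 1, "shape": [2, 4]})
```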
In one possible implementation, the method further includes:
when a synchronization state instruction from the second processor is received, determining first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the synchronizable data volume, wherein the data volume of the first sub-data corresponds to the synchronizable data volume;
and generating a descriptor synchronization instruction according to the first sub-data and sending the descriptor synchronization instruction to the second processor, so as to instruct the second processor to acquire the first sub-data.
For example, upon receiving a synchronization state instruction from the second processor, the first processor may parse the instruction to obtain its contents (e.g., the identifier of the descriptor, the synchronizable data volume, etc.). The descriptor of the tensor data to be synchronized can be determined from the identifier of the descriptor, and thereby the tensor data to be synchronized can be determined; the part of the data that can be synchronized this time, namely the first sub-data, is then determined from the tensor data according to the synchronizable data volume. The data volume of the first sub-data corresponds to the synchronizable data volume; for example, the data volume of the first sub-data is less than or equal to the synchronizable data volume.
In one possible implementation, if none of the tensor data has been synchronized, data of the synchronizable data volume can be selected from the tensor data as the first sub-data. If part of the tensor data remains unsynchronized and the volume of the unsynchronized part is larger than the synchronizable data volume, data of the synchronizable data volume is selected from the unsynchronized part (i.e., the second sub-data of the tensor data) as the first sub-data. If the volume of the unsynchronized part is less than or equal to the synchronizable data volume, the unsynchronized part can be used directly as the first sub-data. It should be understood that a person skilled in the art can determine the first sub-data according to the actual situation, which is not limited by the present disclosure.
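The three cases above collapse into one rule: take the next unsynchronized range, capped at the synchronizable data volume. A hypothetical sketch, assuming the sub-data is tracked as a contiguous (offset, length) range (the patent does not fix this representation):

```python
def select_first_sub_data(total, synced, synchronizable):
    """Return (offset, length) of the first sub-data for this round.

    total          -- total data amount of the tensor data
    synced         -- amount already synchronized
    synchronizable -- amount the receiver can currently accommodate
    """
    remaining = total - synced  # the unsynchronized part (second sub-data)
    return synced, min(remaining, synchronizable)

# Nothing synchronized yet: take up to the synchronizable amount.
assert select_first_sub_data(8, 0, 3) == (0, 3)
# Remaining part smaller than the synchronizable amount: take it all.
assert select_first_sub_data(8, 6, 3) == (6, 2)
```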
In a possible implementation, the synchronization state instruction may also include the range of the partial data of the tensor data to be synchronized, for example, the descriptor content or the storage address range of the partial sub-data, so as to specify the partial data to be acquired for synchronization. The first processor may then directly determine the first sub-data to be synchronized according to that range.
In one possible implementation, the first processor may generate a descriptor synchronization instruction from the first sub-data and send it to the second processor. The instruction may include the identifier of the descriptor of the tensor data to be synchronized and the first sub-data. After receiving the descriptor synchronization instruction, the second processor may parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data, determine the tensor data to be synchronized according to the descriptor, and store the first sub-data in its own non-shared storage space.
In this way, the tensor data can be determined according to the descriptor in the synchronization state instruction, the sub-data of the current synchronization can be determined according to the receiver's synchronizable data volume, and the descriptor synchronization instruction can be generated and sent according to that sub-data, so that the receiver acquires the sub-data of the current synchronization, reducing the synchronization overhead and improving the efficiency of data synchronization.
In one possible implementation, the synchronization state instruction includes the identifier of the descriptor. When a synchronization state instruction from the second processor is received, the step of determining the first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the synchronizable data volume may include:
parsing the synchronization state instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
For example, the synchronization state instruction may include the identifier of the descriptor (e.g., TR1) and the synchronizable data volume. The first processor can parse the synchronization state instruction to obtain the identifier of the descriptor and the synchronizable data volume, and then determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
By the method, the data volume transmitted during synchronization can be reduced, and the processing efficiency is improved.
In a possible implementation manner, when receiving a synchronization state instruction from the second processor, the step of determining the first sub-data of the tensor data according to the descriptor of the tensor data in the synchronization state instruction and the synchronizable data amount may include:
determining, according to the descriptor of the tensor data, the tensor data and the second sub-data of the tensor data that is in the to-be-synchronized state;
and determining the first sub-data according to the second sub-data and the synchronizable data volume in the synchronization state instruction.
For example, states may be set for the data within the tensor data: synchronized parts are set to the synchronized state, and unsynchronized parts to the to-be-synchronized state. In this case, when the first processor receives a synchronization state instruction from the second processor, the tensor data to be synchronized may be determined from the descriptor; the second sub-data in the to-be-synchronized state can be determined according to the states of the data within the tensor data; and the first sub-data for this synchronization is determined according to the second sub-data and the synchronizable data volume indicated by the synchronization state instruction.
In a possible implementation, if the data volume of the second sub-data is greater than the synchronizable data volume, the first sub-data for this synchronization can be selected from the second sub-data; if the data volume of the second sub-data is less than or equal to the synchronizable data volume, the second sub-data can be used directly as the first sub-data.
By the method, partial data of the current synchronization can be determined, so that partial synchronization of tensor data is realized, and the efficiency of data synchronization is improved.
In one possible implementation, the method further includes: changing the state of the first sub-data of the tensor data from the to-be-synchronized state to the synchronized state.
For example, after the first processor generates and sends the descriptor synchronization instruction according to the first sub-data of the tensor data, so that the second processor achieves synchronization of the first sub-data, the first processor may change the state of the data in the tensor data, that is, change the state of the first sub-data from the to-be-synchronized state to the synchronized state. In this way, when the state of the second processor is queried next time and its synchronization state instruction is received, the data for the next synchronization can be determined from the partial data still in the to-be-synchronized state, avoiding repeated synchronization of data and improving the efficiency of data synchronization.
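With this to-be-synchronized/synchronized bookkeeping, the whole tensor is transferred through repeated partial synchronizations. A minimal sender-side loop, assuming a callable that returns the receiver's synchronizable amount each round (an illustrative interface, not the patent's):

```python
def synchronize_tensor(total, query_receiver_capacity):
    """Transfer `total` units through repeated partial synchronizations,
    returning the (offset, length) sub-data ranges sent, in order."""
    synced = 0      # boundary between synchronized and to-be-synchronized data
    rounds = []
    while synced < total:
        capacity = query_receiver_capacity()  # synchronizable amount this round
        length = min(total - synced, capacity)
        if length > 0:
            rounds.append((synced, length))   # descriptor sync instruction for this range
            synced += length                  # mark this first sub-data as synchronized
    return rounds

# 10 units with the receiver reporting room for 4 each round: 4 + 4 + 2.
parts = synchronize_tensor(10, lambda: 4)
```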
Fig. 3 shows a flow diagram of a data synchronization method according to an embodiment of the present disclosure. As shown in fig. 3, the method is applied to a second processor, and the method includes:
in step S31, upon receiving a state query instruction from the first processor, a descriptor of tensor data to be synchronized is determined, the descriptor indicating a shape of the tensor data to be synchronized;
in step S32, determining a synchronizable data amount for the tensor data based on the descriptor of the tensor data;
in step S33, a synchronization state instruction is generated according to the descriptor of the tensor data and the synchronizable data amount, where the synchronization state instruction is used to instruct the first processor to determine first sub-data of the tensor data, and the data amount of the first sub-data corresponds to the synchronizable data amount;
in step S34, the synchronization state instruction is sent to the first processor.
For example, when the sender of the data synchronization has tensor data to be synchronized, the sender can also inquire the state of the receiver. The first processor (sender) may generate and send a status query instruction, and the second processor, upon receiving the status query instruction in step S31, may parse the instruction to determine the descriptor of the tensor data to be synchronized.
In one possible implementation, in step S32, the second processor may determine the tensor data to be synchronized according to the descriptor, and determine the amount of data that can be accommodated by the space that its own non-shared storage space can allocate to the tensor data, that is, the synchronizable data amount, so as to perform partial synchronization of the tensor data.
In one possible implementation manner, in step S33, the second processor may generate a synchronization state instruction according to the determined synchronizable data amount and the descriptor of the tensor data, and in step S34 send the instruction to the first processor, so as to instruct the first processor to determine the descriptor of the tensor data to be synchronized and the synchronizable data amount of this synchronization. After determining the partial data that can be synchronized this time (i.e., the first sub-data), the first processor may generate a descriptor synchronization instruction and send it to the second processor. The instruction may include the identifier of the descriptor of the tensor data to be synchronized and the first sub-data.
In this way, the sender can query the state of the receiver, and the receiver, after receiving the state query instruction, determines and replies with its own state (namely, the synchronizable data amount); partial synchronization of tensor data is thus realized through this interaction, improving the efficiency of data synchronization.
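The query/reply interaction can be sketched as below. The encoding of the instructions as Python dictionaries, the field names, and the 4-byte element size are illustrative assumptions, not the instruction format of the disclosure.

```python
# Illustrative sketch of the state query / synchronization state exchange
# between the sender (first processor) and the receiver (second processor).

def make_state_query(descriptor_id):
    # First processor (sender): query the receiver's state for one tensor.
    return {"op": "StateQuery", "tr": descriptor_id}

def handle_state_query(query, free_space_bytes, element_size=4):
    # Second processor (receiver): reply with how many elements the space
    # its non-shared storage can allocate to the tensor will accommodate.
    amount = free_space_bytes // element_size
    return {"op": "SyncState", "tr": query["tr"], "amount": amount}

def handle_sync_state(reply, pending_elements):
    # First processor: the first sub-data's size corresponds to the
    # synchronizable data amount, capped by what is still unsynchronized.
    return min(pending_elements, reply["amount"])
```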
In one possible implementation, the method further includes:
when a descriptor synchronization instruction from the first processor is received, determining a descriptor of tensor data to be synchronized and first sub-data of the tensor data;
and storing the first sub-data of the tensor data according to the descriptor of the tensor data.
For example, when receiving a descriptor synchronization instruction, the second processor may parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of this synchronization; then the tensor data to be synchronized is determined according to the descriptor, and the first sub-data of the tensor data is stored in the second processor's own non-shared storage space.
In this way, the receiver can determine the descriptor according to the descriptor synchronization instruction and obtain the sub-data of this synchronization, thereby reducing the synchronization overhead and improving the efficiency of data synchronization.
In a possible implementation manner, the receiver of the data synchronization may initiate a partial synchronization request for tensor data, that is, the receiver sends a descriptor synchronization request instruction, where the instruction may indicate the descriptor of the tensor data to be synchronized and the synchronizable data amount for the tensor data, that is, the amount of data that can be accommodated by the space that the receiver's non-shared storage space can allocate to the tensor data.
In a possible implementation manner, there is also provided a data synchronization method applied to a first processor, including: upon receiving a descriptor synchronization request instruction from the second processor, determining a descriptor of tensor data to be synchronized and a synchronizable data amount for the tensor data, wherein the descriptor is used for indicating a shape of the tensor data to be synchronized;
determining first sub-data of the tensor data according to the descriptors of the tensor data and the synchronizable data volume, wherein the data volume of the first sub-data corresponds to the synchronizable data volume;
and generating a descriptor synchronization instruction according to the first sub-data and sending the descriptor synchronization instruction to the second processor, so as to instruct the second processor to acquire the first sub-data.
In one possible implementation, when receiving a descriptor synchronization request instruction from the second processor, the first processor may parse the instruction to obtain the content of the instruction (e.g., an identifier of a descriptor of tensor data to be synchronized, a data characteristic of the tensor data to be synchronized, an amount of synchronizable data, and the like), so as to determine the descriptor of the tensor data to be synchronized and the amount of synchronizable data.
In a possible implementation manner, the first processor may determine tensor data to be synchronized according to the descriptor, and determine partial data that can be synchronized this time, that is, the first sub-data, from the tensor data according to the synchronizable data amount. The data amount of the first sub data may correspond to the synchronizable data amount, for example, the data amount of the first sub data is less than or equal to the synchronizable data amount.
In one possible implementation manner, if none of the tensor data has been synchronized, data amounting to the synchronizable data amount may be selected from the tensor data as the first sub-data; if part of the tensor data has not been synchronized and the data amount of the unsynchronized partial data is greater than the synchronizable data amount, data amounting to the synchronizable data amount may be selected from the unsynchronized partial data (namely, the second sub-data of the tensor data) as the first sub-data; if the data amount of the unsynchronized partial data is less than or equal to the synchronizable data amount, the unsynchronized partial data may be used directly as the first sub-data. It should be understood that a person skilled in the art can determine the first sub-data according to practical situations, and the disclosure is not limited thereto.
In one possible implementation, the first processor may generate a descriptor synchronization instruction from the first sub-data and send the descriptor synchronization instruction to the second processor. The instruction may include an identification of a descriptor of tensor data to be synchronized and the first child data. After receiving the descriptor synchronization instruction, the second processor may parse the instruction to determine a descriptor of the tensor data to be synchronized and first sub-data of the tensor data, determine the tensor data to be synchronized according to the descriptor, and store the first sub-data of the tensor data in its own non-shared storage space.
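A minimal sketch of the descriptor synchronization instruction round-trip follows. The dictionary encoding, the explicit offset field, and the list used to simulate the receiver's non-shared storage space are all illustrative assumptions.

```python
# Sketch of generating and handling a descriptor synchronization instruction
# carrying the descriptor identifier plus this round's first sub-data.

def make_descriptor_sync(descriptor_id, offset, first_subdata):
    # First processor: instruction = descriptor identifier + first sub-data.
    return {"op": "DescSync", "tr": descriptor_id,
            "offset": offset, "data": list(first_subdata)}

def handle_descriptor_sync(instr, storage):
    # Second processor: parse the instruction and store the first sub-data
    # into its own (simulated) non-shared storage space at the offset.
    off = instr["offset"]
    storage[off:off + len(instr["data"])] = instr["data"]
    return storage
```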
According to the data synchronization method of the embodiment of the disclosure, by setting the descriptor indicating the shape of the tensor data, the tensor data can be determined according to the descriptor in the descriptor synchronization request instruction, the sub-data of the current synchronization is determined according to the synchronizable data amount of the receiving party, and the descriptor synchronization instruction is generated and sent according to the sub-data, so that the receiving party obtains the sub-data of the current synchronization, thereby reducing the synchronization overhead and improving the efficiency of the data synchronization.
In one possible implementation, the descriptor synchronization request instruction may include an identification of a descriptor, and determining, upon receiving the descriptor synchronization request instruction from the second processor, a descriptor of tensor data to be synchronized and a synchronizable data amount for the tensor data includes: analyzing the descriptor synchronization request instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
For example, if a descriptor indicating the tensor data to be synchronized has been registered in both the first processor and the second processor, the descriptor synchronization request instruction may include only the identifier of the descriptor (for example, when the identifier of the descriptor is TR1, the instruction may be represented as Send TR1) and the synchronizable data amount. The first processor may parse the descriptor synchronization request instruction to obtain the identifier of the descriptor and the synchronizable data amount, and then determine the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
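A hedged sketch of this identifier-only path, assuming a textual instruction form and a small descriptor registration table (both hypothetical):

```python
# Illustrative: both processors have already registered descriptor TR1, so
# the request need only carry the identifier and the synchronizable amount.

registered = {"TR1": {"shape": (20, 10), "addr": 0x1000}}  # hypothetical table

def parse_sync_request(text):
    # e.g. "SyncRequest TR1 64" -> (identifier, synchronizable data amount)
    _, tr, amount = text.split()
    return tr, int(amount)

def lookup_descriptor(tr):
    # Determine the descriptor of the tensor data from its identifier.
    return registered[tr]
```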
By the method, the data volume transmitted during synchronization can be reduced, and the processing efficiency is improved.
In one possible implementation, the descriptor synchronization request instruction includes data characteristics of tensor data to be synchronized, and determining a descriptor of the tensor data to be synchronized and a synchronizable data amount for the tensor data when the descriptor synchronization request instruction is received from the second processor includes:
analyzing the descriptor synchronization request instruction to obtain the data characteristics and the synchronizable data volume of tensor data to be synchronized;
and determining descriptors of the tensor data according to the data characteristics of the tensor data.
For example, if no descriptor indicating the tensor data to be synchronized is registered in the first processor, or the identifier of the descriptor does not correspond, the descriptor synchronization request instruction may include data characteristics of the tensor data to be synchronized. The data characteristics may include information such as the identification, shape, source, and address of the tensor data. For example, the data source of the tensor data is the kth sender (the kth processor), the tensor data is the operation result of the convolution operation numbered 200, the address of the tensor data is a specific address region (for example, addresses ADDR0-ADDR127), and the shape of the tensor data is a predetermined shape (for example, a 20 × 10 two-dimensional tensor). The data characteristics of the tensor data to be synchronized can be set by a person skilled in the art according to practical situations, and the present disclosure is not limited thereto.
In a possible implementation manner, according to the data feature, the first processor may find the tensor data to be synchronized and determine a descriptor of the tensor data to be synchronized, for example, directly obtain or newly register the corresponding descriptor. And determining the tensor data according to the descriptor of the tensor data to be synchronized, and further determining the sub-data synchronized this time according to the synchronizable data volume.
In this way, the descriptor of the tensor data to be synchronized can be determined according to the data characteristics in the descriptor synchronization request instruction, so that partial synchronization of the tensor data is realized and the tensor data itself does not need to be transmitted during synchronization, which reduces the amount of transmitted data and the synchronization overhead and improves the processing efficiency.
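The feature-based lookup, with new registration when no descriptor matches, could be sketched as below; the feature fields and the registry layout are assumptions for illustration only.

```python
# Illustrative: resolve a descriptor from data features (identification,
# source, shape) when no matching descriptor identifier is registered yet.

registry = {}     # data features (as a tuple) -> descriptor identifier
next_id = [0]     # counter for newly registered descriptors

def descriptor_for_features(features):
    key = (features["ident"], features["source"], features["shape"])
    if key not in registry:
        # Newly register a descriptor for this tensor data.
        registry[key] = "TR{}".format(next_id[0])
        next_id[0] += 1
    # Either the directly obtained or the newly registered descriptor.
    return registry[key]
```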
In one possible implementation manner, determining the first sub-data of the tensor data according to the descriptor of the tensor data and the synchronizable data amount includes:
according to the descriptor of the tensor data, determining the tensor data and the second sub-data of the tensor data that is in the to-be-synchronized state;
and determining the first sub-data according to the second sub-data and the synchronizable data amount.
For example, states may be set for the data in the tensor data: the synchronized partial data is set to a synchronized state, and the unsynchronized partial data is set to a to-be-synchronized state. In this case, when the first processor receives a descriptor synchronization request instruction from the second processor, the tensor data to be synchronized may be determined according to the descriptor; according to the states of the data in the tensor data, the second sub-data in the to-be-synchronized state may be determined; and the first sub-data of this synchronization may be determined according to the second sub-data and the synchronizable data amount indicated by the descriptor synchronization request instruction.
In a possible implementation manner, if the data amount of the second sub-data is greater than the synchronizable data amount, the first sub-data of this synchronization may be selected from the second sub-data; if the data amount of the second sub-data is less than or equal to the synchronizable data amount, the second sub-data may be used directly as the first sub-data.
In this way, the partial data of the current synchronization can be determined, so that partial synchronization of the tensor data is realized and the efficiency of data synchronization is improved.
In one possible implementation, the method further includes: changing the state of the first sub-data of the tensor data from the to-be-synchronized state to the synchronized state.
For example, after the first processor generates and sends the descriptor synchronization instruction according to the first sub-data of the tensor data, so that the second processor realizes synchronization of the first sub-data, the first processor may change the state of the data in the tensor data, that is, change the state of the first sub-data from the to-be-synchronized state to the synchronized state. In this way, when a synchronization request from the second processor is received next time, the data of the next synchronization can be determined from the partial data still in the to-be-synchronized state, avoiding repeated synchronization of data and improving the efficiency of data synchronization.
In a possible implementation manner, there is also provided a data synchronization method applied to a second processor, including: generating a descriptor synchronization request instruction according to a descriptor of tensor data to be synchronized and a synchronizable data volume of the tensor data, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, and the descriptor synchronization request instruction is used for indicating a first processor to determine the descriptor of the tensor data to be synchronized and first sub-data of the tensor data according to the descriptor synchronization request instruction, and the data volume of the first sub-data corresponds to the synchronizable data volume; sending the descriptor synchronization request instruction to the first processor.
For example, a second processor of the plurality of processors may be set as the receiver of the data synchronization, and the second processor initiates a partial synchronization request for the tensor data. When there is tensor data to be synchronized in the second processor, the descriptor of the tensor data and the amount of data that can be accommodated by the space that the second processor's own non-shared storage space can allocate to the tensor data, that is, the synchronizable data amount, may be determined. According to the descriptor of the tensor data and the synchronizable data amount, the second processor may generate a descriptor synchronization request instruction and send it to the first processor. The descriptor synchronization request instruction may include at least one of the identifier of the descriptor, the content of the descriptor, and the data characteristics of the tensor data, so as to instruct the first processor to determine the descriptor of the tensor data to be synchronized and the first sub-data of the tensor data according to the instruction.
In a possible implementation manner, when receiving a descriptor synchronization request instruction, a first processor may analyze the instruction to determine a descriptor of tensor data to be synchronized and a synchronizable data amount; tensor data to be synchronized are determined according to the descriptors, and partial data which can be synchronized at this time, namely the first subdata, are determined from the tensor data according to the data volume which can be synchronized. The data amount of the first sub data may correspond to the synchronizable data amount, for example, the data amount of the first sub data is less than or equal to the synchronizable data amount.
In one possible implementation manner, if none of the tensor data has been synchronized, data amounting to the synchronizable data amount may be selected from the tensor data as the first sub-data; if part of the tensor data has not been synchronized and the data amount of the unsynchronized partial data is greater than the synchronizable data amount, data amounting to the synchronizable data amount may be selected from the unsynchronized partial data (namely, the second sub-data of the tensor data) as the first sub-data; if the data amount of the unsynchronized partial data is less than or equal to the synchronizable data amount, the unsynchronized partial data may be used directly as the first sub-data. It should be understood that a person skilled in the art can determine the first sub-data according to practical situations, and the disclosure is not limited thereto.
In a possible implementation manner, the descriptor synchronization request instruction may also include a range of partial data of the tensor data to be synchronized, for example, a descriptor content or a storage address range of the partial sub-data, so as to specify to acquire the partial data to be synchronized.
In this way, the receiver can initiate a partial synchronization request for tensor data, so that the sender determines the sub-data of this synchronization, improving the efficiency of data synchronization.
In one possible implementation, the method further includes:
when a descriptor synchronization instruction from the first processor is received, determining a descriptor of tensor data to be synchronized and first sub-data of the tensor data;
and storing the first sub-data of the tensor data according to the descriptor of the tensor data.
For example, the first processor may generate and send a descriptor synchronization instruction according to the descriptor of the tensor data and the first sub-data. When receiving the descriptor synchronization instruction, the second processor may parse the instruction to determine the descriptor of the tensor data to be synchronized and the first sub-data of this synchronization; then the tensor data to be synchronized is determined according to the descriptor, and the first sub-data of the tensor data is stored in the second processor's own non-shared storage space.
In this way, the receiver can determine the descriptor according to the descriptor synchronization instruction and obtain the sub-data of this synchronization, thereby reducing the synchronization overhead and improving the efficiency of data synchronization.
In one possible implementation, the identity and content of the descriptor may be stored in a descriptor storage space, which may be a storage space in an internal memory of the processor (e.g., a register, an on-chip SRAM, or other media cache, etc.). The data storage space of the tensor data indicated by the descriptors may be a storage space in an internal memory of the processor (e.g., an on-chip cache) or an external memory connected to the processor (e.g., an off-chip memory). The data addresses in the data storage space may be actual physical addresses or virtual addresses. The present disclosure does not limit the location of the descriptor storage space and the data storage space and the type of data address.
In one possible implementation, the identifier and content of the descriptor and the tensor data indicated by the descriptor may all be located in the same block. For example, a contiguous block of on-chip cache with addresses ADDR0-ADDR1023 may be used to store the related content of the descriptor, where addresses ADDR0-ADDR31 may store the identifier of the descriptor, addresses ADDR32-ADDR63 may store the content of the descriptor, and addresses ADDR64-ADDR1023 may store the tensor data indicated by the descriptor. Here, each address ADDR denotes one address unit, which is not necessarily one bit or one byte. The storage area and its addresses can be determined by a person skilled in the art according to practical situations, and the present disclosure is not limited thereto.
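The contiguous layout described above can be sketched as a small address-to-region map; the Python rendering and the `region_of` helper are illustrative only, with address units as in the text.

```python
# Illustrative map of the example on-chip layout: descriptor identifier,
# descriptor content, and tensor data share one contiguous ADDR0-ADDR1023 block.

LAYOUT = {
    "identifier": range(0, 32),      # ADDR0-ADDR31
    "content":    range(32, 64),     # ADDR32-ADDR63
    "data":       range(64, 1024),   # ADDR64-ADDR1023
}

def region_of(addr):
    # Map one address unit to the region that stores it.
    for name, r in LAYOUT.items():
        if addr in r:
            return name
    raise ValueError("address outside the descriptor block")
```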
In one possible implementation, the identifier and content of the descriptor and the tensor data indicated by the descriptor may be separately stored in different areas of the internal memory, for example, a register may be used as a descriptor storage space, the identifier and content of the descriptor may be stored in the register, an on-chip cache may be used as a data storage space, and the tensor data indicated by the descriptor may be stored.
In a possible implementation, a Special Register (SR) dedicated to the descriptor may be provided, and the data in the descriptor may be an immediate number or may be obtained from the special register. When the register is used to store the identifier and the content of the descriptor, the identifier of the descriptor may be represented by using the number of the register, for example, when the number of the register is 0, the identifier of the descriptor stored therein is 0. When the descriptor in the register is valid, an area may be allocated in the buffer space according to the size of the tensor data indicated by the descriptor (for example, a tensor buffer unit is created in the buffer for each tensor data) for storing the tensor data. It should be understood that the tensor data may also be stored in a preset buffer space, which is not limited by the present disclosure.
In one possible implementation, the identity and content of the descriptors may be stored in an internal memory and the tensor data indicated by the descriptors may be stored in an external memory. For example, the identification and content of the descriptors may be stored on-chip, and the tensor data indicated by the descriptors may be stored under-chip.
In one possible implementation, the data address of the data storage space corresponding to the descriptor may be a fixed address. For example, separate data storage spaces may be divided for tensor data, each of which has a one-to-one correspondence with the descriptor at the start address of the data storage space. In this case, the processor can determine the data address of the tensor data according to the content of the descriptor.
In one possible implementation, when the data address of the data storage space corresponding to the descriptor is a variable address, the descriptor may be further used to indicate an address of tensor data of the N dimension, where the content of the descriptor may further include at least one address parameter indicating the address of the tensor data. For example, the tensor data is 3-dimensional data, when the descriptor points to an address of the tensor data, the content of the descriptor may include one address parameter indicating the address of the tensor data, such as a start address of the tensor data, or may include a plurality of address parameters of the address of the tensor data, such as a start address of the tensor data + an address offset, or the address parameters of the tensor data based on each dimension. The address parameters can be set by those skilled in the art according to actual needs, and the disclosure does not limit this.
In one possible implementation, the address parameter of the tensor data includes a reference address of a data reference point of the descriptor in a data storage space of the tensor data. Wherein the reference address may be different according to a variation of the data reference point. The present disclosure does not limit the selection of data reference points.
In one possible implementation, the base address may include a start address of the data storage space. When the data reference point of the descriptor is the first data block of the data storage space, the reference address of the descriptor is the start address of the data storage space. When the data reference point of the descriptor is data other than the first data block in the data storage space, the reference address of the descriptor is the physical address of the data block in the data storage space.
In one possible implementation, the shape parameters of the tensor data include at least one of: a size of a data storage space of the tensor data in at least one of N dimensional directions, a size of the storage region in at least one of N dimensional directions, an offset of the storage region in at least one of N dimensional directions, positions of at least two vertices at diagonal positions of the N dimensional directions with respect to the data reference point, and a mapping relationship between a data description position of tensor data indicated by the descriptor and a data address. Where the data description position is a mapping position of a point or a region in the tensor data indicated by the descriptor, for example, when the tensor data is 3-dimensional data, the descriptor may represent a shape of the tensor data using three-dimensional space coordinates (x, y, z), and the data description position of the tensor data may be a position of a point or a region in the three-dimensional space to which the tensor data is mapped, which is represented using three-dimensional space coordinates (x, y, z).
It should be understood that the shape parameters representing tensor data can be selected by one skilled in the art based on practical circumstances, and the present disclosure is not limited thereto.
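One way to render a two-dimensional descriptor's shape parameters in code, using the parameter names from the surrounding text (ori_x/ori_y, offset_x/offset_y, size_x/size_y, PA_start); the class itself is an illustrative assumption, not the storage format of the disclosure.

```python
# Minimal Python rendering of a 2-D descriptor's shape parameters.
from dataclasses import dataclass

@dataclass
class Descriptor2D:
    ori_x: int      # size of the data storage space in the X direction (per row)
    ori_y: int      # size of the data storage space in the Y direction (rows)
    offset_x: int   # offset of the storage region in the X direction
    offset_y: int   # offset of the storage region in the Y direction
    size_x: int     # size of the storage region in the X direction
    size_y: int     # size of the storage region in the Y direction
    pa_start: int   # reference (start) address of the data storage space

    def num_elements(self):
        # Number of data points in the region the descriptor indicates.
        return self.size_x * self.size_y
```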
Fig. 4 illustrates a schematic diagram of a data storage space of a data synchronization method according to an embodiment of the present disclosure. As shown in fig. 4, the data storage space 21 stores two-dimensional data in a row-major manner and can be addressed by (X, Y) (where the X axis extends horizontally to the right and the Y axis extends vertically downward). The size in the X-axis direction (the size of each row) is ori_x (not shown in the figure), the size in the Y-axis direction (the total number of rows) is ori_y (not shown in the figure), and the start address PA_start (the base address) of the data storage space 21 is the physical address of the first data block 22. The data block 23 is partial data in the data storage space 21; its offset 25 in the X-axis direction is denoted offset_x, its offset 24 in the Y-axis direction is denoted offset_y, its size in the X-axis direction is denoted size_x, and its size in the Y-axis direction is denoted size_y.
In one possible implementation, when the data block 23 is defined by using a descriptor, a data reference point of the descriptor may use a first data block of the data storage space 21, and the reference address of the descriptor is a starting address PA _ start of the data storage space 21, and then the content of the descriptor of the data block 23 may be determined by combining a size ori _ X of the data storage space 21 in the X axis direction, a size ori _ Y of the data storage space 21 in the Y axis direction, an offset _ Y of the data block 23 in the Y axis direction, an offset _ X in the X axis direction, a size _ X in the X axis direction, and a size _ Y in the Y axis direction.
In one possible implementation, the content of the descriptor can be represented using the following formula (1):

    D: { ori_x, ori_y, offset_x, offset_y, size_x, size_y, PA_start }    (1)
it should be understood that, although the descriptor describes a two-dimensional space in the above example, the dimension of the content representation of the descriptor can be set by those skilled in the art according to the actual situation, and the disclosure does not limit this.
In one possible implementation, the content of the descriptor of the tensor data may be determined according to a reference address of a data reference point of the descriptor in the data storage space, and positions of at least two vertices located at diagonal positions in N dimensional directions relative to the data reference point.
For example, the content of the descriptor of the data block 23 in fig. 4 may be determined using the reference address PA_base of the data reference point of the descriptor in the data storage space and the positions, relative to the data reference point, of two vertices at diagonal positions. First, the data reference point of the descriptor and its reference address PA_base in the data storage space are determined; for example, one datum (e.g., the datum at position (2, 2)) may be selected in the data storage space 21 as the data reference point, and the physical address of this datum in the data storage space is taken as the reference address PA_base. Then, the positions of at least two vertices at diagonal positions of the data block 23 relative to the data reference point are determined, for example, the positions of the diagonal vertices in the top-left-to-bottom-right direction relative to the data reference point, where the relative position of the top-left vertex is (x_min, y_min) and the relative position of the bottom-right vertex is (x_max, y_max). The content of the descriptor of the data block 23 can then be determined according to the reference address PA_base, the relative position (x_min, y_min) of the top-left vertex, and the relative position (x_max, y_max) of the bottom-right vertex.
In one possible implementation, the content of the descriptor can be represented using the following formula (2):

    D: { PA_base, (x_min, y_min), (x_max, y_max) }    (2)
it should be understood that although the above examples use two vertices of the upper left corner and the lower right corner to determine the content of the descriptor, those skilled in the art can set the specific vertex of the at least two vertices according to actual needs, and the disclosure is not limited thereto.
In one possible implementation manner, the content of the descriptor of the tensor data can be determined according to a reference address of the data reference point of the descriptor in the data storage space and a mapping relation between the data description position and the data address of the tensor data indicated by the descriptor. The mapping relationship between the data description position and the data address may be set according to actual needs, for example, when tensor data indicated by the descriptor is three-dimensional space data, the mapping relationship between the data description position and the data address may be defined by using a function f (x, y, z).
In one possible implementation, the content of the descriptor can be represented using the following equation (3):
{ PA_base, f(x, y, z) }    (3)
it should be understood that, a person skilled in the art may set the mapping relationship between the data description location and the data address according to practical situations, and the disclosure does not limit this.
In the case where the content of the descriptor is expressed by equation (1), for any data point in the tensor data whose data description position is (x_q, y_q), the data address PA2_(x,y) of the data point in the data storage space may be determined using the following equation (4):
PA2_(x,y) = PA_start + (offset_y + y_q - 1) * ori_x + (offset_x + x_q)    (4)
in this way, the processor can calculate the data address of the tensor data indicated by the descriptor in the data storage space according to the content of the descriptor, and further execute corresponding processing (such as data operation, data synchronization and the like) according to the address, so that the complexity of data access can be reduced, and the processing efficiency of the processor can be improved.
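The address calculation of equation (4) can be sketched as a short function; the parameter values in the example are illustrative, not taken from the patent:

```python
def data_address(pa_start, ori_x, offset_x, offset_y, xq, yq):
    """Equation (4): address of the data point at description position
    (xq, yq), given the start address PA_start, the row width ori_x of the
    overall data storage space, and the block offsets (offset_x, offset_y).
    yq is 1-based, matching the '- 1' term in the equation."""
    return pa_start + (offset_y + yq - 1) * ori_x + (offset_x + xq)

# Illustrative values: a 64-element-wide storage space, a block whose
# offset from the storage-space origin is (2, 3), querying position (1, 1).
print(data_address(pa_start=0x1000, ori_x=64, offset_x=2, offset_y=3,
                   xq=1, yq=1))  # 4291
```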
According to the data synchronization method disclosed by the embodiment of the disclosure, partial synchronization of tensor data can be realized when the space of a receiving party of data synchronization is insufficient, and the whole tensor data is synchronized through multiple times of partial synchronization, so that the problems of failure or synchronization delay and the like of integral synchronization of the tensor data under the condition of insufficient space are avoided, and the efficiency of data synchronization is improved; and a descriptor indicating the shape of the tensor data is set, and the tensor data is determined according to the descriptor in the data synchronization process, so that the synchronization overhead is reduced, and the complexity of data access is reduced.
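The query/state/synchronization handshake summarized above can be sketched as a small simulation. All class, method, and field names, and the buffer model, are illustrative assumptions; the actual state query, synchronization state, and descriptor synchronization instructions are hardware-level instructions, not Python calls:

```python
class SecondProcessor:
    """Receiver side: reports a synchronizable data amount and stores sub-data."""
    def __init__(self, buffer_space):
        self.buffer_space = buffer_space  # receiver space free in each round
        self.stored = []

    def on_state_query(self, descriptor_id, pending_amount):
        # Determine the synchronizable data amount for the tensor data and
        # return it in a synchronization state "instruction".
        return {"descriptor_id": descriptor_id,
                "amount": min(self.buffer_space, pending_amount)}

    def on_descriptor_sync(self, first_sub_data):
        # Store the first sub-data; assume it is consumed before the next
        # round, so the same buffer space becomes free again.
        self.stored.extend(first_sub_data)

class FirstProcessor:
    """Sender side: repeats partial synchronization until the tensor is sent."""
    def __init__(self, descriptor_id, tensor_data):
        self.descriptor_id = descriptor_id
        self.pending = list(tensor_data)  # data in the to-be-synchronized state

    def sync_all(self, peer):
        while self.pending:
            state = peer.on_state_query(self.descriptor_id, len(self.pending))
            amount = state["amount"]
            if amount == 0:
                break  # receiver currently has no space at all
            # First sub-data: the portion matching the synchronizable amount.
            first_sub, self.pending = self.pending[:amount], self.pending[amount:]
            peer.on_descriptor_sync(first_sub)  # descriptor synchronization

second = SecondProcessor(buffer_space=4)
first = FirstProcessor("T1", list(range(10)))
first.sync_all(second)
print(second.stored)  # whole tensor, transferred in chunks of at most 4
```

The point of the sketch is the loop: the whole tensor is synchronized through several partial transfers, each sized to whatever the receiver reports as synchronizable.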
It is noted that while for simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that acts and modules referred to are not necessarily required by the disclosure.
It should be further noted that, although the steps in the flowchart are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowchart may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
Fig. 5 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure. The data synchronization apparatus is applied to a first processor, and as shown in fig. 5, the data synchronization apparatus includes:
a query instruction generating module 51, configured to generate a state query instruction according to a descriptor of tensor data to be synchronized, where the descriptor is used to indicate a shape of the tensor data to be synchronized, and the state query instruction is used to instruct a second processor to determine a synchronizable data amount for the tensor data and generate a synchronization state instruction, where the state query instruction includes an identifier of the descriptor and/or a content of the descriptor;
and the query instruction sending module 52 is configured to send the status query instruction to the second processor.
In one possible implementation, the apparatus further includes:
a sub-data determining module, configured to determine, when a synchronization state instruction from the second processor is received, first sub-data of tensor data according to a descriptor of the tensor data in the synchronization state instruction and a synchronizable data volume, where a data volume of the first sub-data corresponds to the synchronizable data volume;
and the synchronous instruction generating and sending module is used for generating a descriptor synchronous instruction according to the first subdata and sending the descriptor synchronous instruction to the second processor so as to instruct the second processor to acquire the first subdata.
In a possible implementation manner, the synchronization status instruction includes an identifier of a descriptor, where the sub-data determining module includes:
the analysis submodule is used for analyzing the synchronous state instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and the descriptor determining submodule is used for determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
In a possible implementation manner, the sub-data determining module includes:
the first determining submodule is used for determining the tensor data and second subdata in a state to be synchronized in the tensor data according to the descriptors of the tensor data;
and the second determining submodule is used for determining the first subdata according to the second subdata and the synchronizable data volume in the synchronous state instruction.
In one possible implementation, the apparatus further includes:
and the state changing module is used for changing the state of the first subdata of the tensor data from a state to be synchronized to a synchronized state.
Fig. 6 shows a block diagram of a data synchronization apparatus according to an embodiment of the present disclosure. The data synchronization apparatus is applied to the second processor, and as shown in fig. 6, the data synchronization apparatus includes:
a query instruction receiving module 61, configured to determine a descriptor of tensor data to be synchronized when a state query instruction from the first processor is received, where the descriptor is used to indicate a shape of the tensor data to be synchronized;
a data amount determination module 62, configured to determine a synchronizable data amount for the tensor data according to the descriptor of the tensor data;
a state instruction generating module 63, configured to generate a synchronous state instruction according to the descriptor of the tensor data and the synchronizable data volume, where the synchronous state instruction is used to instruct the first processor to determine first sub-data of the tensor data, and a data volume of the first sub-data corresponds to the synchronizable data volume;
a status instruction sending module 64, configured to send the synchronization status instruction to the first processor.
In one possible implementation, the apparatus further includes:
a synchronous instruction receiving module, configured to determine descriptors of tensor data to be synchronized and first sub-data of the tensor data when receiving a descriptor synchronous instruction from the first processor;
and the data storage module is used for storing first subdata of the tensor data according to the descriptors of the tensor data.
It should be understood that the above-described apparatus embodiments are merely illustrative and that the apparatus of the present disclosure may be implemented in other ways. For example, the division of the units/modules in the above embodiments is only one logical function division, and there may be another division manner in actual implementation. For example, multiple units, modules, or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented.
In addition, unless otherwise specified, each functional unit/module in each embodiment of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together. The integrated units/modules may be implemented in the form of hardware or software program modules.
If the integrated unit/module is implemented in hardware, the hardware may be a digital circuit, an analog circuit, or the like. Physical implementations of the hardware structure include, but are not limited to, transistors, memristors, and the like. Unless otherwise specified, the artificial intelligence processor may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as RRAM (Resistive Random Access Memory), DRAM (Dynamic Random Access Memory), SRAM (Static Random Access Memory), EDRAM (Enhanced Dynamic Random Access Memory), HBM (High-Bandwidth Memory), or HMC (Hybrid Memory Cube).
The integrated units/modules, if implemented in the form of software program modules and sold or used as a stand-alone product, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned memory comprises: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
In a possible implementation manner, an artificial intelligence chip is also disclosed, which comprises the data synchronization device.
In a possible implementation manner, a board card is further disclosed, which comprises a storage device, an interface device, a control device and the artificial intelligence chip; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
Fig. 7 shows a block diagram of a board according to an embodiment of the present disclosure, and referring to fig. 7, the board may include other kit components besides the chip 389, where the kit components include, but are not limited to: memory device 390, interface device 391 and control device 392;
the storage device 390 is connected to the artificial intelligence chip through a bus for storing data. The memory device may include a plurality of groups of memory cells 393. Each group of the storage units is connected with the artificial intelligence chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the speed of SDRAM without increasing the clock frequency: it allows data to be read on both the rising and falling edges of the clock pulse, so DDR is twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips. In one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers, where 64 bits of each 72-bit controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are adopted in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
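The 25600 MB/s figure can be checked from the DDR4-3200 transfer rate and the 64-bit data width:

```python
# Theoretical bandwidth of one group of DDR4-3200 storage units on a
# 64-bit data bus (the 8 ECC bits of the 72-bit controller carry no data).
transfers_per_second = 3200 * 10**6  # DDR4-3200: 3200 mega-transfers/s
bus_width_bytes = 64 // 8            # 64 data bits = 8 bytes per transfer
bandwidth_mb_per_s = transfers_per_second * bus_width_bytes // 10**6
print(bandwidth_mb_per_s)  # 25600
```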
In one embodiment, each group of the storage units includes a plurality of double data rate (DDR) synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is arranged in the chip and is used for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected with the artificial intelligence chip and is used for realizing data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface, and the data to be processed is transmitted from the server to the chip through the standard PCIE interface to implement data transfer. Preferably, when a PCIE 3.0 x16 interface is adopted for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can implement the transfer function. In addition, the calculation result of the artificial intelligence chip is transmitted back to the external device (e.g., the server) by the interface device.
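The ~16000 MB/s figure for PCIe 3.0 x16 can likewise be checked from the per-lane rate (8 GT/s with 128b/130b line encoding):

```python
# Approximate theoretical bandwidth of a PCIe 3.0 x16 link. The commonly
# quoted ~16000 MB/s figure ignores the 128b/130b encoding overhead.
lane_raw_bits_per_s = 8 * 10**9                       # 8 GT/s, 1 bit/transfer
payload_bits_per_s = lane_raw_bits_per_s * 128 / 130  # 128b/130b encoding
per_lane_mb_per_s = payload_bits_per_s / 8 / 10**6    # ≈ 984.6 MB/s per lane
total_mb_per_s = per_lane_mb_per_s * 16
print(round(total_mb_per_s))  # 15754, i.e. close to the quoted 16000 MB/s
```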
The control device is electrically connected with the artificial intelligence chip and is used for monitoring the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include a plurality of processing chips, processing cores, or processing circuits, it can drive a plurality of loads and may therefore be in different working states such as multi-load and light-load. The control device can regulate the working states of the plurality of processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In one possible implementation, an electronic device is disclosed that includes the artificial intelligence chip described above. The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
In the foregoing embodiments, the descriptions of the respective embodiments have their respective emphases; for parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments may be combined arbitrarily; for the sake of brevity, not all possible combinations of these technical features are described, but as long as a combination of technical features contains no contradiction, it should be considered to be within the scope described in this specification.
The foregoing may be better understood in light of the following clauses:
a1, a data synchronization method, applied to a first processor, comprising:
generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for indicating a second processor to determine the synchronizable data amount aiming at the tensor data and generating a synchronization state instruction, and the state query instruction comprises the identifier of the descriptor and/or the content of the descriptor;
and sending the state inquiry instruction to a second processor.
A2, the method of claim A1, the method further comprising:
when a synchronization state instruction from the second processor is received, determining first sub-data of tensor data according to descriptors of the tensor data in the synchronization state instruction and a synchronizable data volume, wherein the data volume of the first sub-data corresponds to the synchronizable data volume;
and generating a descriptor synchronization instruction according to the first subdata and sending the descriptor synchronization instruction to the second processor so as to instruct the second processor to acquire the first subdata.
A3, the method of claim A2, the synchronization state instruction including an identification of a descriptor,
when a synchronization state instruction from the second processor is received, determining first sub-data of tensor data according to descriptors of the tensor data in the synchronization state instruction and a synchronizable data volume, including:
analyzing the synchronous state instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
A4, the method according to claim a2 or A3, wherein the determining the first sub-data of the tensor data according to the descriptor of the tensor data and the synchronizable data volume in the synchronization state instruction when the synchronization state instruction is received from the second processor comprises:
according to the descriptor of the tensor data, determining the tensor data and second subdata in a state to be synchronized in the tensor data;
and determining the first subdata according to the second subdata and the synchronizable data volume in the synchronous state instruction.
A5, the method of any one of claims a2-a4, the method further comprising:
and changing the state of the first sub data of the tensor data from a state to be synchronized to a synchronized state.
A6, a data synchronization method, applied to a second processor, comprising:
determining a descriptor of tensor data to be synchronized when a state query instruction from a first processor is received, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized;
determining, from descriptors of the tensor data, an amount of synchronizable data for the tensor data;
generating a synchronization state instruction according to the descriptor of the tensor data and the synchronizable data volume, wherein the synchronization state instruction is used for instructing the first processor to determine first sub-data of the tensor data, and the data volume of the first sub-data corresponds to the synchronizable data volume;
sending the synchronization state instruction to the first processor.
A7, the method of claim a6, the method further comprising:
when a descriptor synchronization instruction from the first processor is received, determining a descriptor of tensor data to be synchronized and first sub-data of the tensor data;
and storing first subdata of the tensor data according to the descriptor of the tensor data.
A8, a data synchronization device, applied to a first processor, comprising:
the query instruction generation module is used for generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for indicating a second processor to determine the synchronizable data amount aiming at the tensor data and generating a synchronization state instruction, and the state query instruction comprises the identifier of the descriptor and/or the content of the descriptor;
and the query instruction sending module is used for sending the state query instruction to the second processor.
A9, the apparatus of claim A8, the apparatus further comprising:
a sub-data determining module, configured to determine, when a synchronization state instruction from the second processor is received, first sub-data of tensor data according to a descriptor of the tensor data in the synchronization state instruction and a synchronizable data volume, where a data volume of the first sub-data corresponds to the synchronizable data volume;
and the synchronous instruction generating and sending module is used for generating a descriptor synchronous instruction according to the first subdata and sending the descriptor synchronous instruction to the second processor so as to instruct the second processor to acquire the first subdata.
A10, the apparatus of claim a9, the synchronization status instruction including an identifier of a descriptor, wherein the child data determining module includes:
the analysis submodule is used for analyzing the synchronous state instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and the descriptor determining submodule is used for determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
A11, the apparatus of claim A9 or A10, the sub data determination module comprising:
the first determining submodule is used for determining the tensor data and second subdata in a state to be synchronized in the tensor data according to the descriptors of the tensor data;
and the second determining submodule is used for determining the first subdata according to the second subdata and the synchronizable data volume in the synchronous state instruction.
A12, the device of any one of claims a9-a11, the device further comprising:
and the state changing module is used for changing the state of the first subdata of the tensor data from a state to be synchronized to a synchronized state.
A13, a data synchronization device, applied to a second processor, comprising:
the query instruction receiving module is used for determining a descriptor of tensor data to be synchronized when a state query instruction from the first processor is received, and the descriptor is used for indicating the shape of the tensor data to be synchronized;
a data volume determination module for determining a synchronizable data volume for the tensor data according to the descriptor of the tensor data;
a state instruction generating module, configured to generate a synchronous state instruction according to the descriptor of the tensor data and the synchronizable data volume, where the synchronous state instruction is used to instruct the first processor to determine first sub-data of the tensor data, and a data volume of the first sub-data corresponds to the synchronizable data volume;
and the state instruction sending module is used for sending the synchronous state instruction to the first processor.
A14, the apparatus of claim a13, the apparatus further comprising:
a synchronous instruction receiving module, configured to determine descriptors of tensor data to be synchronized and first sub-data of the tensor data when receiving a descriptor synchronous instruction from the first processor;
and the data storage module is used for storing first subdata of the tensor data according to the descriptors of the tensor data.
A15, an artificial intelligence chip, the chip comprising the data synchronization device of any one of claims A8-A14.
A16, an electronic device comprising the artificial intelligence chip of claim A15.
A17, a board card, comprising: a memory device, an interface device and a control device and an artificial intelligence chip according to claim a 15; wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device; the storage device is used for storing data; the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment; and the control device is used for monitoring the state of the artificial intelligence chip.
A18, the card of claim a17, the memory device comprising: the artificial intelligence chip comprises a plurality of groups of storage units, wherein each group of storage unit is connected with the artificial intelligence chip through a bus, and the storage units are as follows: DDR SDRAM; the chip includes: the DDR controller is used for controlling data transmission and data storage of each memory unit; the interface device is as follows: a standard PCIE interface.
The embodiments of the present disclosure have been described in detail above, and specific examples are used herein to explain the principles and implementations of the present disclosure; the description of the above embodiments is only intended to help understand the method and core idea of the present disclosure. Meanwhile, for a person skilled in the art, based on the idea of the present disclosure, there may be changes in the specific embodiments and the application scope. In summary, the content of this specification should not be construed as a limitation on the present disclosure.

Claims (10)

1. A data synchronization method applied to a first processor comprises the following steps:
generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for indicating a second processor to determine the synchronizable data amount aiming at the tensor data and generating a synchronization state instruction, and the state query instruction comprises the identifier of the descriptor and/or the content of the descriptor;
and sending the state inquiry instruction to a second processor.
2. The method of claim 1, further comprising:
when a synchronization state instruction from the second processor is received, determining first sub-data of tensor data according to descriptors of the tensor data in the synchronization state instruction and a synchronizable data volume, wherein the data volume of the first sub-data corresponds to the synchronizable data volume;
and generating a descriptor synchronization instruction according to the first subdata and sending the descriptor synchronization instruction to the second processor so as to instruct the second processor to acquire the first subdata.
3. The method of claim 2, wherein the synchronization state instruction includes an identification of a descriptor,
when a synchronization state instruction from the second processor is received, determining first sub-data of tensor data according to descriptors of the tensor data in the synchronization state instruction and a synchronizable data volume, including:
analyzing the synchronous state instruction to obtain the identifier of the descriptor and the synchronizable data volume;
and determining the descriptor of the tensor data to be synchronized according to the identifier of the descriptor.
4. The method according to claim 2 or 3, wherein determining the first sub-data of the tensor data according to the descriptor of the tensor data and the synchronizable data amount in the synchronization state instruction when receiving the synchronization state instruction from the second processor comprises:
according to the descriptor of the tensor data, determining the tensor data and second subdata in a state to be synchronized in the tensor data;
and determining the first subdata according to the second subdata and the synchronizable data volume in the synchronous state instruction.
5. A method for synchronizing data, the method applied to a second processor, comprising:
determining a descriptor of tensor data to be synchronized when a state query instruction from a first processor is received, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized;
determining, from descriptors of the tensor data, an amount of synchronizable data for the tensor data;
generating a synchronization state instruction according to the descriptor of the tensor data and the synchronizable data volume, wherein the synchronization state instruction is used for instructing the first processor to determine first sub-data of the tensor data, and the data volume of the first sub-data corresponds to the synchronizable data volume;
sending the synchronization state instruction to the first processor.
6. A data synchronization apparatus applied to a first processor, comprising:
the query instruction generation module is used for generating a state query instruction according to a descriptor of tensor data to be synchronized, wherein the descriptor is used for indicating the shape of the tensor data to be synchronized, the state query instruction is used for indicating a second processor to determine the synchronizable data amount aiming at the tensor data and generating a synchronization state instruction, and the state query instruction comprises the identifier of the descriptor and/or the content of the descriptor;
and the query instruction sending module is used for sending the state query instruction to the second processor.
7. A data synchronization apparatus applied to a second processor, comprising:
the query instruction receiving module is used for determining a descriptor of tensor data to be synchronized when a state query instruction from the first processor is received, and the descriptor is used for indicating the shape of the tensor data to be synchronized;
a data volume determination module for determining a synchronizable data volume for the tensor data according to the descriptor of the tensor data;
a state instruction generating module, configured to generate a synchronous state instruction according to the descriptor of the tensor data and the synchronizable data volume, where the synchronous state instruction is used to instruct the first processor to determine first sub-data of the tensor data, and a data volume of the first sub-data corresponds to the synchronizable data volume;
and the state instruction sending module is used for sending the synchronous state instruction to the first processor.
8. An artificial intelligence chip, characterized in that the chip comprises a data synchronization device according to claim 6 or 7.
9. An electronic device, characterized in that the electronic device comprises an artificial intelligence chip according to claim 8.
10. The utility model provides a board card, its characterized in that, the board card includes: a memory device, an interface device and a control device and an artificial intelligence chip according to claim 8;
wherein, the artificial intelligence chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the artificial intelligence chip and external equipment;
and the control device is used for monitoring the state of the artificial intelligence chip.
CN201910735425.XA 2019-04-04 2019-08-09 Data synchronization method and device and related product Active CN112347186B (en)

Priority Applications (13)

Application Number Priority Date Filing Date Title
CN201910735425.XA CN112347186B (en) 2019-08-09 2019-08-09 Data synchronization method and device and related product
KR1020207036312A KR102611162B1 (en) 2019-04-04 2020-04-01 Data processing apparatus and related product
EP20785318.5A EP3951666A4 (en) 2019-04-04 2020-04-01 Data processing apparatus and related product
KR1020207032017A KR20200142536A (en) 2019-04-04 2020-04-01 Data processing devices and related products
KR1020207036316A KR102611169B1 (en) 2019-04-04 2020-04-01 Data processing apparatus and related product
JP2021510523A JP7073581B2 (en) 2019-04-04 2020-04-01 Data processing equipment and related products
PCT/CN2020/082803 WO2020200246A1 (en) 2019-04-04 2020-04-01 Data processing apparatus and related product
PCT/CN2020/111270 WO2021027972A1 (en) 2019-08-09 2020-08-26 Data synchronization method and apparatus and related product
JP2020198200A JP7121103B2 (en) 2019-04-04 2020-11-30 Data processing equipment and related products
JP2020198245A JP7150803B2 (en) 2019-04-04 2020-11-30 Data processing equipment and related products
US17/489,671 US11385895B2 (en) 2019-04-04 2021-09-29 Data processing apparatus and related products
US17/849,182 US11886880B2 (en) 2019-04-04 2022-06-24 Data processing apparatus and related products with descriptor management
US18/531,734 US20240111536A1 (en) 2019-04-04 2023-12-07 Data processing apparatus and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910735425.XA CN112347186B (en) 2019-08-09 2019-08-09 Data synchronization method and device and related product

Publications (2)

Publication Number Publication Date
CN112347186A true CN112347186A (en) 2021-02-09
CN112347186B CN112347186B (en) 2023-02-28

Family

ID=74366935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910735425.XA Active CN112347186B (en) 2019-04-04 2019-08-09 Data synchronization method and device and related product

Country Status (2)

Country Link
CN (1) CN112347186B (en)
WO (1) WO2021027972A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114706813B (en) * 2022-05-05 2023-07-14 上海壁仞智能科技有限公司 Multi-core heterogeneous system-on-chip, asymmetric synchronization method, computing device and medium

Citations (7)

Publication number Priority date Publication date Assignee Title
WO2017079769A1 (en) * 2015-11-06 2017-05-11 Vivante Corporation Transfer descriptor for memory access commands
CN107077327A (en) * 2014-06-30 2017-08-18 微体系统工程有限公司 System and method for expansible wide operand instruction
GB201715031D0 (en) * 2016-10-27 2017-11-01 Google Inc Neural network instruction set architecture
US20180018210A1 (en) * 2011-07-27 2018-01-18 International Business Machines Corporation Unified logs and device statistics
US20180365307A1 (en) * 2016-02-23 2018-12-20 Alibaba Group Holding Limited Webpage data synchronization
CN109685201A (en) * 2018-12-14 2019-04-26 北京中科寒武纪科技有限公司 Operation method, device and Related product
CN109766296A (en) * 2019-01-08 2019-05-17 郑州云海信息技术有限公司 A kind of data processing method, device, system and dma controller

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US7620666B1 (en) * 2002-07-29 2009-11-17 Symantec Operating Company Maintaining persistent data change maps for fast data synchronization and restoration
CN101950282B (en) * 2010-08-30 2012-05-23 中国科学院计算技术研究所 Multiprocessor system and synchronous engine thereof
CN104967658B (en) * 2015-05-08 2018-11-30 成都品果科技有限公司 A kind of method of data synchronization on multi-terminal equipment
CN105159795A (en) * 2015-08-21 2015-12-16 小米科技有限责任公司 Data synchronization method, apparatus and system
CN106453511B (en) * 2016-09-14 2019-04-16 Oppo广东移动通信有限公司 A kind of data back up method and equipment


Cited By (2)

Publication number Priority date Publication date Assignee Title
CN112799852A (en) * 2021-04-12 2021-05-14 北京一流科技有限公司 Multi-dimensional SBP distributed signature decision system and method for logic node
CN112799852B (en) * 2021-04-12 2021-07-30 北京一流科技有限公司 Multi-dimensional SBP distributed signature decision system and method for logic node

Also Published As

Publication number Publication date
CN112347186B (en) 2023-02-28
WO2021027972A1 (en) 2021-02-18

Similar Documents

Publication Publication Date Title
CN112347186B (en) Data synchronization method and device and related product
CN111831337B (en) Data synchronization method and device and related product
US20240111536A1 (en) Data processing apparatus and related products
CN112306945B (en) Data synchronization method and device and related products
CN112347027A (en) Data synchronization method and device and related product
CN112347026B (en) Data synchronization method and device and related product
WO2021223642A1 (en) Data processing method and apparatus, and related product
CN112347185A (en) Data synchronization method and device and related product
CN111831722A (en) Data synchronization method and device and related product
US20240126553A1 (en) Data processing method and apparatus, and related product
CN113626083B (en) Data processing device and related product
CN112306949B (en) Data processing method and device and related product
CN111782577B (en) Data processing device and method and related product
CN111813449A (en) Operation method, device and related product
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111831329B (en) Data processing method and device and related product
CN113807507A (en) Data processing method and device and related product
CN113626080B (en) Data processing device and related product
CN111782267B (en) Data processing method and device and related product
US20230068827A1 (en) Data processing method and device, and related product
US20240053988A1 (en) Data processing method and device, and related product
CN113806246A (en) Data processing device and method and related product
CN114282159A (en) Data processing device, integrated circuit chip, equipment and method for realizing the same
CN113296736A (en) Data processing method based on random number, random number generation method and device
CN111275197A (en) Operation method, operation device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant