CN111382858A

CN111382858A - Data sending method and device and related products

Info

Publication number: CN111382858A
Application number: CN201811646708.9A
Authority: CN
Inventors: Not disclosed
Original assignee: Shanghai Cambricon Information Technology Co Ltd
Current assignee: Shanghai Cambricon Information Technology Co Ltd
Priority date: 2018-12-29
Filing date: 2018-12-29
Publication date: 2020-07-07

Abstract

The application relates to a data sending method, a data sending device, and related products. In the method, when a transmission task is started, each chip is assigned its own identification information, so that the target sending chip can be accurately located according to its identification information.

Description

Data sending method and device and related products

Technical Field

The present application relates to the field of artificial intelligence technology, and in particular, to a data sending method, a data sending apparatus, and related products.

Background

With the development of artificial intelligence technology, the main operation end can no longer meet the computational requirements of current algorithms, and dedicated neural network chips have emerged to perform these operations. Practice has shown that, compared with general processing tasks or image processing tasks, artificial intelligence computing tasks have distinctive data structures, storage modes, and computing modes. An application-specific integrated circuit can therefore be designed to reallocate chip computing resources to artificial intelligence computing tasks, achieving computation with low power consumption, low latency, and high throughput. The NPU (Neural network Processing Unit) is such an application-specific integrated circuit: it can perform artificial intelligence computing tasks such as neural network computation, and is characterized by low power consumption, high efficiency, and small area.

As Moore's law and Dennard scaling run up against physical limits, the computational power of a single-core high-performance processor becomes a bottleneck. To improve computational parallelism, chip design in the industry is gradually shifting toward multi-core, high-efficiency processors. Moreover, with the development of high-performance computers and data centers, computing resources are increasingly centralized, and multi-chip cooperative processing has become the norm. To build an NPU-based AI processing system with high processing performance and high scalability, efficient data transfer between NPU chips must be supported.

However, no device or method is currently available to support data transmission between NPU chips.

Disclosure of Invention

In view of the above, it is necessary to provide a communication system, a neural network processing chip, a combination device, and an electronic apparatus.

A method of data transmission, the method comprising:

acquiring a communication configuration information queue and data to be sent, wherein the communication configuration information queue is an information queue for configuring transmission between chips;

analyzing at least one piece of communication configuration information in the communication configuration information queue to obtain a corresponding communication descriptor, wherein the communication descriptor is information describing a process of a sending method;

and sending the data to be sent according to the communication descriptor.

In one embodiment, obtaining the communication configuration information queue and the data to be sent includes:

detecting whether the data to be sent in the storage space is complete or not;

and if the data to be sent in the storage space is complete, acquiring the communication configuration information queue and the data to be sent.

In one embodiment, detecting whether the communication configuration information and the data to be sent in the storage space are complete includes:

acquiring an address selection signal;

determining whether the address selection signal is valid;

and if the address selection signal is valid, determining that the communication configuration information and the data to be sent in the storage space are complete.

In one embodiment, parsing at least one piece of communication configuration information in the communication configuration information queue to obtain the corresponding communication descriptors includes:

acquiring a sending control instruction;

reading, according to the sending control instruction, at least one piece of communication configuration information from the configuration information queue according to a preset rule;

and analyzing at least one piece of communication configuration information to respectively obtain the corresponding communication descriptors.

In one embodiment, the method further comprises:

assigning a communication descriptor identification to each communication descriptor;

and reading the corresponding communication descriptor according to the communication descriptor identification.

In one embodiment, the communication descriptor includes: one or more of a source address of the data to be sent, a destination address of the data to be sent, an offset of the data to be sent, and a data block size of the data to be sent.
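The descriptor fields listed above can be pictured as a simple record. The following is a minimal Python sketch under stated assumptions: the class name `CommunicationDescriptor`, the integer field types, the byte unit for the block size, and the idea that the offset is relative to the source address are all illustrative choices, not a format defined by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommunicationDescriptor:
    """Hypothetical record holding the descriptor fields named in the text.

    Every field is optional because the descriptor includes "one or more" of them.
    """
    source_address: Optional[int] = None       # where the data to be sent is read from
    destination_address: Optional[int] = None  # where the data should land on the target chip
    offset: Optional[int] = None               # offset of the data to be sent
    block_size: Optional[int] = None           # data block size (bytes, assumed unit)

    def source_range(self) -> Optional[range]:
        """Byte range read on the source side, if enough fields are present.

        Assumption for illustration: the offset is relative to the source address.
        """
        if None in (self.source_address, self.offset, self.block_size):
            return None
        start = self.source_address + self.offset
        return range(start, start + self.block_size)


d = CommunicationDescriptor(source_address=0x1000, destination_address=0x8000,
                            offset=0x40, block_size=256)
print(hex(d.source_range().start), len(d.source_range()))  # 0x1040 256
```

Keeping every field optional mirrors the "one or more of" wording; a concrete hardware descriptor would more likely use fixed-width fields.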

In one embodiment, the method further comprises:

obtaining a sending mode character according to the communication descriptor;

and determining, according to the sending mode character, whether the sending mode is a normal sending mode or a hardware-accelerated sending mode.

In one embodiment, the normal sending mode includes obtaining the communication configuration information and the control instruction from a main operation terminal, where the main operation terminal is a control device outside the chip.

In one embodiment, the hardware-accelerated sending mode includes obtaining the communication configuration information and the control instruction from a computing device, where the computing device is a device inside the chip that performs computation.

In one embodiment, the method further comprises:

packaging each communication descriptor and the data to be sent corresponding to each communication descriptor to obtain a transmission data packet;

acquiring identification information of a target sending chip of data to be sent;

and sending the transmission data packet according to the identification information.

A data sending apparatus, the apparatus comprising:

the acquisition module is used for acquiring a communication configuration information queue and data to be sent;

the analysis module is used for analyzing at least one piece of communication configuration information in the communication configuration information queue to respectively obtain corresponding communication descriptors;

and the data sending module is used for sending the data to be sent according to the communication descriptor.

A board card, applied to a data sending method, comprises a plurality of artificial intelligence processors, where the memories corresponding to the artificial intelligence processors are multi-channel memories. After receiving an artificial intelligence processor computing instruction sent by a general-purpose processor (CPU) through a target parallel thread, the target artificial intelligence processor accesses, according to that instruction, the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread. The target artificial intelligence processor is any one of the plurality of artificial intelligence processors, the target parallel thread is any one of the parallel threads started by the CPU, and at least two of the plurality of parallel threads correspond to different memory channels.

A motherboard for use in neural network data processing, the motherboard comprising: a general purpose processor CPU and the board card.

An electronic device, applied to neural network data processing, comprising the motherboard.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.

According to the data sending method, the data sending device, and the related products described above, the receiving device and the sending device cooperate to complete data transmission between the current chip and the next chip, thereby achieving data transmission between chips.

Drawings

FIG. 1 is a schematic diagram of a communication system provided in one embodiment;

FIG. 2 is an internal structural view of a transmitting apparatus provided in one embodiment;

FIG. 3 is a schematic diagram of a combination device provided in one embodiment;

FIG. 4 is a flowchart illustrating a data transmission method according to an embodiment;

FIG. 5 is a schematic diagram of a data transmission apparatus according to an embodiment;

FIG. 6 is a schematic diagram of a board card provided in an embodiment;

FIG. 7 is a schematic diagram of a motherboard provided in one embodiment;

FIG. 8 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

In one embodiment, referring to FIG. 1, a communication system is provided. The communication system shown in FIG. 1 comprises a receiving device 110, a sending device 120, a computing device 130, and a memory 140. One end of the computing device 130 is connected to the receiving device 110, and the other end is connected to the sending device 120. The receiving device 110 and the sending device 120 are each connected to the memory 140.

In one embodiment, referring to FIG. 2, an internal structure diagram of the sending device 120 is provided. The sending device 120 includes a transmission configuration circuit 121, a transmission control circuit 123, and a transmission port circuit 122; the transmission control circuit 123 is connected to both the transmission configuration circuit 121 and the transmission port circuit 122.

In one embodiment, the transmission configuration circuit 121 includes a configuration information acquisition circuit 1211 and a configuration information parsing circuit 1212; the configuration information acquisition circuit 1211 is connected to both the configuration information parsing circuit 1212 and the transmission control circuit 123. The sending device 120 is connected to a memory 140, and the memory 140 is connected to both the transmission port circuit 122 and the transmission configuration circuit 121; the memory 140 is configured to store the data to be sent and the configuration information. Optionally, the communication descriptors generated by the configuration information parsing circuit 1212 are stored in a descriptor cache located inside the sending device. In one embodiment, the data to be sent is stored in correspondence with its communication descriptor.

In one embodiment, the transmission configuration circuit 121 and the transmission control circuit 123 are respectively connected to the main operation terminal 150. Specifically, the transmission configuration circuit acquires transmission configuration information from the main operation terminal 150, and the transmission control circuit 123 acquires a control instruction from the main operation terminal.

In one embodiment, referring to FIG. 3, a combination device is provided. The combination device comprises a plurality of neural network processing chips 200 connected in sequence. Any two of the neural network processing chips may be connected to each other; in particular, any two adjacent neural network processing chips may be connected to each other.

In one embodiment, each of the neural network processing chips is connected to the main operation terminal 150. In one embodiment, each neural network processing chip includes a communication system 100 as shown in fig. 1, and the communication system 100 includes a receiving device 110, a transmitting device 120, a computing device 130, and a memory 140.

In one embodiment, an electronic device including the neural network processing chip 200 is provided. The electronic device may be a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.

The connection relationship between the elements in any of the above embodiments may be an electrical connection or a wireless connection.

In one embodiment, referring to FIG. 4, a data sending method is provided. The method may be applied to the apparatuses shown in FIG. 1 to FIG. 3 and includes the following steps:

Step S710, obtain the communication configuration information queue and the data to be sent. The communication configuration information queue is an information queue for configuring transmission between chips. The data to be sent may be the communication data or calculation results of the above embodiments, or any other data that needs to be transmitted.

Step S720, parse at least one piece of communication configuration information in the communication configuration information queue to obtain the corresponding communication descriptor. The communication descriptor is information describing the process through which the sending method passes. The communication configuration information queue contains at least one piece of communication configuration information; when it contains several pieces, each piece is parsed separately to obtain its own communication descriptor.

Step S730, send the data to be sent according to the communication descriptor.
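Steps S710 to S730 can be read as a three-stage pipeline. The Python sketch below is illustrative only: the configuration-entry format (a dict with `src`, `dst`, and `size` keys), the flat byte-array model of the storage space, and the `send` callback standing in for the sending port are all assumptions, not structures defined by the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Descriptor:
    src: int   # source address in local memory (assumed field)
    dst: int   # destination address on the target chip (assumed field)
    size: int  # data block size in bytes (assumed field)


def parse_config(entry: Dict[str, int]) -> Descriptor:
    """S720 (sketch): turn one configuration entry into a descriptor."""
    return Descriptor(src=entry["src"], dst=entry["dst"], size=entry["size"])


def send_all(config_queue: List[Dict[str, int]], memory: bytes,
             send: Callable[[int, bytes], None]) -> int:
    """Run S710-S730 over one queue; returns the number of blocks sent."""
    # S710: the queue and the memory holding the data to be sent are the inputs.
    descriptors = [parse_config(e) for e in config_queue]   # S720
    for d in descriptors:                                    # S730
        send(d.dst, memory[d.src:d.src + d.size])
    return len(descriptors)


sent = []
mem = bytes(range(16))
n = send_all([{"src": 0, "dst": 100, "size": 4}, {"src": 8, "dst": 200, "size": 2}],
             mem, lambda dst, payload: sent.append((dst, payload)))
print(n, [(dst, list(p)) for dst, p in sent])  # 2 [(100, [0, 1, 2, 3]), (200, [8, 9])]
```

Parsing every entry before sending anything matches the step order in FIG. 4; as the text notes, real hardware need not follow that order strictly.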

In one embodiment, step S710 of acquiring the communication configuration information queue and the data to be sent includes:

Step S711, detect whether the data to be sent in the storage space is complete. Specifically, whether the data to be sent is complete refers to whether its data size conforms to the preset description.

In one embodiment, step S711 of detecting whether the communication configuration information and the data to be sent in the storage space are complete includes:

Step S7111, acquire an address selection signal. The address selection signal is a signal that reflects whether the data to be sent is complete and accurate.

Step S7112, determine whether the address selection signal is valid.

Step S7113, if the address selection signal is valid, determine that the communication configuration information and the data to be sent in the storage space are complete. In one embodiment, if the address selection signal is invalid, the data to be sent is determined to be incomplete.

Step S712, if the data to be sent in the storage space is complete, obtain the communication configuration information queue and the data to be sent. This check helps ensure the accuracy of the data to be sent.
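The gating in steps S711 and S712 can be sketched as follows. The application does not specify how the address selection signal is produced; here it is modeled, as an assumption, by a hypothetical `StorageSpace` whose signal is valid only once the stored payload reaches a size announced in advance (one possible reading of "conforms to the preset description").

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StorageSpace:
    """Hypothetical storage space holding configuration entries and payload."""
    expected_size: int  # size announced in advance (assumed "preset description")
    config_queue: List[dict] = field(default_factory=list)
    data: bytearray = field(default_factory=bytearray)

    def address_selection_signal(self) -> bool:
        # S7111 (sketch): the signal is modeled as valid only when the
        # stored payload has exactly the expected size.
        return len(self.data) == self.expected_size


def acquire_if_complete(space: StorageSpace) -> Optional[Tuple[List[dict], bytes]]:
    """S711/S712: return (config_queue, data) only if the data is complete."""
    if not space.address_selection_signal():  # S7112: invalid signal -> incomplete
        return None
    # S7113 + S712: valid signal, so the contents are treated as complete.
    return list(space.config_queue), bytes(space.data)


space = StorageSpace(expected_size=4, config_queue=[{"id": 0}])
space.data += b"\x01\x02"
print(acquire_if_complete(space))  # None: only 2 of 4 bytes written
space.data += b"\x03\x04"
print(acquire_if_complete(space))  # ([{'id': 0}], b'\x01\x02\x03\x04')
```

Returning copies rather than the live buffers keeps later writes to the storage space from altering data already accepted for sending.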

In one embodiment, step S720 of parsing at least one piece of communication configuration information in the communication configuration information queue to obtain the corresponding communication descriptors includes:

Step S721, acquire a sending control instruction. Specifically, when the sending mode is the normal sending mode, the sending control instruction is obtained from the main operation terminal; when the sending mode is the hardware-accelerated mode, the sending control instruction is obtained from the computing device. It should be understood that the main operation terminal is outside the chip, whereas the computing device is inside the chip. When the sending control instruction is obtained from the computing device, it is transferred within the chip, which increases the transmission speed.

Step S722, read, according to the sending control instruction, at least one piece of communication configuration information from the configuration information queue according to a preset rule. The preset rule is a predefined reading rule; for example, the configuration information may be read in the order in which it is stored in the configuration information queue, or according to another preset rule.

Step S723, parse the at least one piece of communication configuration information to obtain the corresponding communication descriptors. In one embodiment, each communication descriptor is assigned a communication descriptor identification, and the corresponding communication descriptor is read according to its communication descriptor identification. Different communication descriptors are distinguished by their respective communication descriptor identifications. In one embodiment, the last of the plurality of communication descriptors to be read has the communication descriptor identification E; reading the descriptor with identification E indicates that all of the plurality of communication descriptors have been read.
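The identification scheme in steps S722 and S723 can be illustrated with a short sketch. It assumes the preset rule is the storage (FIFO) order and uses consecutive integers as identifications for every descriptor except the last, which carries the identification `"E"` as described above; the helper names and the shared-buffer layout are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, Tuple, Union

DescriptorId = Union[int, str]
END_ID = "E"  # identification carried by the last descriptor, per the text


def parse_queue(queue: Deque[dict]) -> List[Tuple[DescriptorId, dict]]:
    """S722/S723 (sketch): read entries in storage order and tag each descriptor."""
    tagged: List[Tuple[DescriptorId, dict]] = []
    total = len(queue)
    for index in range(total):
        entry = queue.popleft()   # preset rule assumed to be storage (FIFO) order
        descriptor = dict(entry)  # stand-in for real field-by-field parsing
        desc_id: DescriptorId = END_ID if index == total - 1 else index
        tagged.append((desc_id, descriptor))
    return tagged


def read_until_end(buffer: Iterable[Tuple[DescriptorId, dict]]) -> Iterator[dict]:
    """Yield descriptors in order; reading stops once the one tagged E is read."""
    for desc_id, descriptor in buffer:
        yield descriptor
        if desc_id == END_ID:
            break


# Two tasks' descriptors sit back to back in one buffer (hypothetical layout).
buffer = parse_queue(deque([{"size": 64}, {"size": 32}])) + parse_queue(deque([{"size": 8}]))
print([i for i, _ in buffer])                       # [0, 'E', 'E']
print([d["size"] for d in read_until_end(buffer)])  # [64, 32]
```

The reader needs no separate count: the E tag alone tells it where the current batch ends, which is the role the text gives to identification E.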

In one embodiment, the method further includes: obtaining a sending mode character according to the communication descriptor, and determining, according to the sending mode character, whether the sending mode is a normal sending mode or a hardware-accelerated sending mode. For example, a sending mode character Type1 is generated according to the communication descriptor: Type1 = 0 indicates normal data transmission, and Type1 = 2 indicates hardware-accelerated transmission. In another embodiment, Type1 = 1 indicates communication between the chip and the main operation terminal. Normal data transmission refers to data transmission between chips in which the control instruction is sent from the main operation terminal; hardware-accelerated transmission refers to data transmission between chips in which the control instruction is sent from the computing device inside the chip.
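The mapping from the sending mode character Type1 to a mode can be written as a small lookup. The sketch below uses the values given above (0, 1, 2); the enum names, and raising an error for any other value or for asking where a chip-to-host transfer's instruction comes from, are assumptions for illustration.

```python
from enum import Enum


class SendMode(Enum):
    NORMAL = 0                # chip to chip; control instruction from the main operation terminal
    CHIP_TO_HOST = 1          # communication between the chip and the main operation terminal
    HARDWARE_ACCELERATED = 2  # chip to chip; control instruction from the on-chip computing device


def decode_type1(type1: int) -> SendMode:
    """Map the sending mode character Type1 to a mode (ValueError if unknown)."""
    return SendMode(type1)


def instruction_source(mode: SendMode) -> str:
    """Where the sending control instruction comes from in each chip-to-chip mode."""
    if mode is SendMode.NORMAL:
        return "main operation terminal (off-chip)"
    if mode is SendMode.HARDWARE_ACCELERATED:
        return "computing device (on-chip)"
    # Assumption: the text only describes instruction sources for chip-to-chip modes.
    raise ValueError(f"{mode.name} is not a chip-to-chip sending mode")


print(decode_type1(2).name, "->", instruction_source(decode_type1(2)))
# HARDWARE_ACCELERATED -> computing device (on-chip)
```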

In one embodiment, the method further comprises:

Step S740, pack each communication descriptor together with its corresponding data to be sent to obtain a transmission data packet. Specifically, obtaining the transmission data packet may include compressing each communication descriptor and its corresponding data to be sent to obtain a compressed packet.

Step S750, acquire the identification information of the target sending chip of the data to be sent. Specifically, when a transmission task is started, each chip is assigned its own identification information, so the target sending chip can be accurately located according to its identification information.

Step S760, send the transmission data packet according to the identification information.
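Steps S740 to S760 can be sketched as follows. The application does not define a wire format, so everything here is an assumption: the packet layout (a 4-byte big-endian length prefix, a JSON-encoded descriptor, then the payload), the use of Python's `zlib` for the optional compression, and the `links` table that maps hypothetical chip identifications to send functions.

```python
import json
import struct
import zlib
from typing import Callable, Dict, List, Tuple


def pack(descriptor: dict, payload: bytes, compress: bool = True) -> bytes:
    """S740 (sketch): bundle one descriptor with its data into a transmission packet."""
    header = json.dumps(descriptor, sort_keys=True).encode()
    packet = struct.pack(">I", len(header)) + header + payload  # 4-byte length prefix
    return zlib.compress(packet) if compress else packet         # optional compression


def unpack(packet: bytes, compressed: bool = True) -> Tuple[dict, bytes]:
    """Inverse of pack(), as a receiving chip might apply it."""
    raw = zlib.decompress(packet) if compressed else packet
    (header_len,) = struct.unpack(">I", raw[:4])
    return json.loads(raw[4:4 + header_len]), raw[4 + header_len:]


def send_packet(packet: bytes, target_chip_id: int,
                links: Dict[int, Callable[[bytes], None]]) -> None:
    """S750/S760 (sketch): choose the outgoing link by the target chip's identification."""
    if target_chip_id not in links:
        raise KeyError(f"no chip with identification {target_chip_id}")
    links[target_chip_id](packet)


# Hypothetical identifications 0 and 1, assigned when the transmission task starts.
chip0: List[bytes] = []
chip1: List[bytes] = []
links = {0: chip0.append, 1: chip1.append}
send_packet(pack({"dst": 4096, "size": 3}, b"abc"), target_chip_id=1, links=links)
print(len(chip0), unpack(chip1[0]))  # 0 ({'dst': 4096, 'size': 3}, b'abc')
```

The length prefix lets the receiver split descriptor from payload without scanning; a hardware implementation would more likely use a fixed-size binary descriptor instead of JSON.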

It should be understood that, although the steps in the flowchart of FIG. 4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in FIG. 4 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, referring to fig. 5, there is provided a data transmission apparatus, including:

an obtaining module 701, configured to obtain a communication configuration information queue and data to be sent;

an analyzing module 702, configured to analyze at least one piece of communication configuration information in the communication configuration information queue to obtain corresponding communication descriptors respectively;

a data sending module 703, configured to send the data to be sent according to the communication descriptor.
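Because the modules may be implemented in software, the three-module structure can be mirrored directly in code. In this hedged Python sketch each module is a pluggable callable, so a hardware-backed or purely software implementation could be substituted; the class name, the parameter names, and the `(offset, size)` entry format used in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class DataSendingApparatus:
    """Hypothetical software composition of modules 701-703."""
    obtain: Callable[[], Tuple[List[Any], bytes]]  # obtaining module 701
    analyze: Callable[[Any], Any]                  # analyzing module 702: one entry -> one descriptor
    send: Callable[[Any, bytes], None]             # data sending module 703

    def run(self) -> int:
        """Obtain, analyze, then send; returns the number of descriptors handled."""
        config_queue, data = self.obtain()
        descriptors = [self.analyze(entry) for entry in config_queue]
        for descriptor in descriptors:
            self.send(descriptor, data)  # the sending module selects the bytes it needs
        return len(descriptors)


log: List[bytes] = []
apparatus = DataSendingApparatus(
    obtain=lambda: ([(0, 2), (2, 2)], b"wxyz"),  # entries: (offset, size), assumed format
    analyze=lambda entry: {"offset": entry[0], "size": entry[1]},
    send=lambda d, data: log.append(data[d["offset"]:d["offset"] + d["size"]]),
)
print(apparatus.run(), log)  # 2 [b'wx', b'yz']
```

Injecting the modules keeps the control flow of the method fixed while letting each module's implementation vary, which matches the software/hardware flexibility the text allows.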

For specific limitations of the data transmission apparatus, refer to the limitations of the data transmission method above; they are not repeated here. The modules in the data transmission apparatus may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in software form in a memory of the computer device, so that the processor can invoke them to perform the operations corresponding to each module.

In one embodiment, the present application further provides a board card applied in a data communication method. The board card may include a plurality of artificial intelligence processors, where the memories corresponding to the artificial intelligence processors are multi-channel memories. After receiving an artificial intelligence processor computing instruction sent by the CPU through a target parallel thread, the target artificial intelligence processor accesses, according to that instruction, the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread. The target artificial intelligence processor is any one of the plurality of artificial intelligence processors, the target parallel thread is any one of the parallel threads started by the CPU, and at least two of the plurality of parallel threads correspond to different memory channels.
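The constraint that at least two of the CPU's parallel threads map to different memory channels can be illustrated with a simple assignment. The sketch below assumes a round-robin mapping from thread index to channel, which is one way (not necessarily the application's) to satisfy the constraint whenever there are at least two threads and two channels; all function names are hypothetical.

```python
from typing import Dict


def assign_channels(num_threads: int, num_channels: int) -> Dict[int, int]:
    """Map each parallel thread index to a memory channel, round-robin (assumed policy)."""
    if num_threads < 1 or num_channels < 1:
        raise ValueError("need at least one thread and one channel")
    return {t: t % num_channels for t in range(num_threads)}


def satisfies_constraint(mapping: Dict[int, int]) -> bool:
    """True if at least two threads use different memory channels."""
    return len(set(mapping.values())) >= 2


def channel_for_instruction(mapping: Dict[int, int], thread_id: int) -> int:
    """Channel the target AI processor uses for an instruction arriving on thread_id."""
    return mapping[thread_id]


m = assign_channels(num_threads=4, num_channels=2)
print(m, satisfies_constraint(m))  # {0: 0, 1: 1, 2: 0, 3: 1} True
```

With a single channel every thread collapses onto it, so the constraint cannot hold; this is why the board card uses multi-channel memories.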

Referring to FIG. 6, in addition to the artificial intelligence processors 411 (the dedicated processor 41 may include the artificial intelligence processors 411) and the multi-channel memory 42, the board card may contain other components, including but not limited to a memory controller 43, a bus, and an interface 44. The dedicated processor 41 exchanges instructions and data with an external device through the interface 44. Optionally, the external device may be the main operation terminal (CPU).

The board card provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.

In an embodiment, the present application further provides a motherboard applied in the neural network data processing method, as shown in fig. 7, the motherboard includes: the main operation end and the board card provided by the embodiment.

The main board provided in this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.

In one embodiment, an electronic device is provided, the electronic device is applied to a data communication method, and the electronic device comprises a main board as shown in fig. 7. The main board comprises a CPU and a board card, wherein the board card comprises a plurality of artificial intelligence processors, and memories corresponding to the artificial intelligence processors are multi-channel memories; the target artificial intelligence processor is used for accessing a physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread according to an artificial intelligence processor calculation instruction after receiving the artificial intelligence processor calculation instruction sent by a main operation end CPU through the target parallel thread; the target artificial intelligence processor is any artificial intelligence processor in the plurality of artificial intelligence processors, and the target parallel thread is any parallel thread started by the CPU; at least two threads in the plurality of parallel threads correspond to different memory channels.

Optionally, the electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle includes an airplane, a ship, and/or an automobile; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and/or a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner, and/or an electrocardiograph.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used to store communication configuration information or communication descriptors. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data communication method.

Those skilled in the art will appreciate that the architecture shown in FIG. 8 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method of any of the above embodiments when executing the computer program.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application and are described in relatively specific detail, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A method for transmitting data, the method comprising:

acquiring a communication configuration information queue and data to be sent; the communication configuration information queue is an information queue for configuring transmission between chips;

analyzing at least one piece of communication configuration information in the communication configuration information queue to obtain a corresponding communication descriptor; wherein the communication descriptor is information describing a process through which the transmission method passes;

and sending the data to be sent according to the communication descriptor.

2. The method of claim 1, wherein the obtaining the communication configuration information queue and the data to be sent comprises:

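detecting whether the data to be sent in the storage space is complete or not;

and if the data to be sent in the storage space is complete, acquiring the communication configuration information queue and the data to be sent.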

3. The method of claim 2, wherein the detecting whether the communication configuration information and the data to be transmitted in the storage space are complete comprises:

acquiring an address selection signal;

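determining whether the address selection signal is valid;

and if the address selection signal is valid, determining that the communication configuration information and the data to be sent in the storage space are complete.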

4. The method of claim 1, wherein the parsing at least one piece of communication configuration information in the communication configuration information queue to obtain corresponding communication descriptors respectively comprises:

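acquiring a sending control instruction;

reading, according to the sending control instruction, at least one piece of communication configuration information from the configuration information queue according to a preset rule;

and parsing the at least one piece of communication configuration information to obtain the corresponding communication descriptors respectively.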

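5. The method of claim 1, wherein the parsing at least one piece of communication configuration information in the communication configuration information queue to obtain corresponding communication descriptors respectively comprises:

assigning a communication descriptor identification to each communication descriptor;

and reading the corresponding communication descriptor according to the communication descriptor identification.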

6. The method of any of claims 1 to 5, wherein the communication descriptor comprises: one or more of a source address of the data to be sent, a destination address of the data to be sent, an offset of the data to be sent, and a data block size of the data to be sent.

7. The method according to any one of claims 1 to 5, further comprising:

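obtaining a sending mode character according to the communication descriptor;

and determining, according to the sending mode character, whether the sending mode is a normal sending mode or a hardware-accelerated sending mode.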

8. The method of claim 7, wherein the normal sending mode comprises obtaining communication configuration information and the control command from a main operation terminal, and the main operation terminal is a control device external to the chip.

9. The method of claim 7, wherein the hardware accelerated transmission mode comprises obtaining communication configuration information and the control instruction from a computing device, wherein the computing device is a device inside a chip for performing computation.

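10. The method of claim 1, further comprising:

packaging each communication descriptor and the data to be sent corresponding to each communication descriptor to obtain a transmission data packet;

acquiring identification information of a target sending chip of the data to be sent;

and sending the transmission data packet according to the identification information.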

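11. A data transmission apparatus, characterized in that the apparatus comprises:

an acquisition module, configured to acquire a communication configuration information queue and data to be sent;

an analysis module, configured to analyze at least one piece of communication configuration information in the communication configuration information queue to obtain corresponding communication descriptors respectively;

and a data sending module, configured to send the data to be sent according to the communication descriptor.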

12. A board card, characterized in that the board card is applied to a data sending method and comprises a plurality of artificial intelligence processors, wherein the memories corresponding to the artificial intelligence processors are multi-channel memories; the target artificial intelligence processor is configured to, after receiving an artificial intelligence processor computing instruction sent by a general-purpose processor CPU through a target parallel thread, access the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread according to the artificial intelligence processor computing instruction; the target artificial intelligence processor is any one of the plurality of artificial intelligence processors, and the target parallel thread is any parallel thread started by the CPU; at least two threads in the plurality of parallel threads correspond to different memory channels.

13. A motherboard for use in neural network data processing, the motherboard comprising: a general purpose processor CPU and a board as claimed in claim 12.

14. An electronic device, for use in neural network data processing, comprising a motherboard as claimed in claim 13.

15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 10.