CN111382116B - Data receiving method and device and related product - Google Patents

Data receiving method and device and related product

Info

Publication number
CN111382116B
CN111382116B (application CN201811646353.3A)
Authority
CN
China
Prior art keywords
communication
data
receiving
descriptor
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811646353.3A
Other languages
Chinese (zh)
Other versions
CN111382116A (en)
Inventor
Inventor not disclosed (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201811646353.3A
Priority to PCT/CN2019/127752 (WO2020135385A1)
Publication of CN111382116A
Application granted
Publication of CN111382116B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/76: Architectures of general purpose stored program computers
    • G06F 15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F 15/7839: Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F 15/7864: Architectures of general purpose stored program computers comprising a single central processing unit with memory on more than one IC chip
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to a data receiving method, a data receiving device, and a related product. The method includes the following step: when the number of state descriptors corresponding to a receiving task is not enough to reach the queue depth of the state descriptor queue, the state descriptor queue is generated only after all receiving processes corresponding to the communication descriptors of the receiving task have finished.

Description

Data receiving method and device and related product
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a data receiving method, an apparatus, and a related product.
Background
With the development of artificial intelligence technology, the main operation end (a general-purpose processor) can no longer meet the computational requirements of existing algorithms, so dedicated neural network chips are used for the computation. Practice has shown that, compared with general processing tasks or image processing tasks, artificial intelligence computing tasks have unique data structures, storage modes, and computing modes, so an application-specific integrated circuit can be designed that redistributes on-chip computing resources for artificial intelligence computing tasks, achieving computation with low power consumption, low latency, and high throughput. An NPU (Neural-network Processing Unit) is such an application-specific integrated circuit; it can execute artificial intelligence computing tasks, such as neural network computation, and is characterized by low power consumption, high efficiency, and small area.
According to Moore's law and Dennard scaling, the computational power of a single-core high-performance processor eventually becomes a bottleneck due to physical limitations. To improve computational parallelism, chip design in the industry is gradually shifting toward multi-core high-efficiency processors. Moreover, with the development of high-performance computers and data centers, more and more computing resources are being centralized, and multi-chip cooperative processing has become the norm. To build an NPU-based AI processing system with high processing performance and high scalability, efficient data transfer must be supported between NPU chips.
However, there is currently no device or method that supports data transmission between NPU chips.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data receiving method, an apparatus and a related product.
A method of data reception, the method comprising:
acquiring communication data and communication configuration information;
parsing the communication configuration information to obtain a communication descriptor, wherein the communication descriptor is information describing the process through which the data transmission passes;
receiving the communication data according to the communication descriptor.
In one embodiment, the obtaining communication data and communication configuration information includes:
acquiring a transmission data packet;
and obtaining communication data and the communication configuration information according to the transmission data packet.
In one embodiment, the parsing the communication configuration information to obtain the communication descriptor includes:
acquiring an analysis control instruction;
and parsing the communication configuration information according to the parsing control instruction to obtain the communication descriptor.
In one embodiment, the communication descriptor includes: one or more of a source address of the data to be sent, a destination address of the data to be sent, an offset of the data to be sent in the source address, an offset of the data to be sent in the destination address, and a data block size of the data to be sent.
In one embodiment, the method further comprises:
obtaining a receiving mode identifier according to the communication descriptor;
and determining, according to the receiving mode identifier, whether the transmission uses the normal sending mode or the hardware-accelerated sending mode.
In one embodiment, the normal sending mode includes obtaining a sending control instruction from a main operation terminal, where the main operation terminal is a control device outside a chip.
In one embodiment, the hardware accelerated transfer mode includes obtaining a send control instruction from a computing device, the computing device being a device inside a chip that performs a computation.
In one embodiment, the method further comprises:
and storing the communication data to a target address according to the communication descriptor.
In one embodiment, the method further comprises:
according to the communication descriptor, when a receiving process is completed, generating a corresponding state descriptor;
storing the state descriptors to a state descriptor queue;
and judging the execution state of the receiving process according to the state descriptor queue.
In one embodiment, the determining, according to the state descriptor queue, the execution state of the receiving task includes:
selecting the state descriptor in the state descriptor queue according to a preset rule;
determining the number of the executed receiving processes according to the number of the state descriptors in the state descriptor queue;
and when the number of executed receiving processes reaches a threshold, determining that the receiving task is finished.
A data receiving apparatus, the apparatus comprising:
the configuration information acquisition module is used for acquiring communication data and communication configuration information;
the descriptor analyzing module is used for analyzing the communication configuration information to obtain a communication descriptor;
and the data receiving module is used for receiving the communication data according to the communication descriptor.
A board card applied to a data receiving method, the board card comprising: a plurality of artificial intelligence processors, wherein the memories corresponding to the artificial intelligence processors are multi-channel memories; a target artificial intelligence processor is configured to, after receiving an artificial intelligence processor computing instruction sent by a general-purpose processor (CPU) through a target parallel thread, access the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread according to the computing instruction; the target artificial intelligence processor is any one of the plurality of artificial intelligence processors, the target parallel thread is any parallel thread started by the CPU, and at least two of the parallel threads correspond to different memory channels.
A motherboard for use in neural network data processing, the motherboard comprising: a general purpose processor CPU and the board card.
An electronic device is applied to data processing of a neural network, and comprises the mainboard.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.
According to the data receiving method, the data receiving device, and the related product, data transmission between the current chip and the next chip is completed through cooperation between the receiving device and the sending device, thereby achieving data transmission between chips.
Drawings
FIG. 1 is a schematic diagram of a communication system provided in one embodiment;
FIG. 2 is an internal structural view of a receiving apparatus provided in one embodiment;
FIG. 3 is a schematic diagram of a combination device provided in one embodiment;
FIG. 4 is a schematic flow chart of a data receiving method provided in one embodiment;
FIG. 5 is a schematic flow chart of a data receiving method provided in another embodiment;
FIG. 6 is a schematic diagram of a data receiving device in accordance with one embodiment;
FIG. 7 is a schematic diagram of a board card provided in one embodiment;
FIG. 8 is a schematic diagram of a motherboard provided in one embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to one embodiment.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to illustrate the application and are not intended to limit it.
In one embodiment, referring to fig. 1, a communication system is provided. The communication system as described in fig. 1 comprises: the device comprises a receiving device 110, a sending device 120, a calculating device 130 and a memory 140, wherein one end of the calculating device 130 is connected with the receiving device 110, and the other end of the calculating device 130 is connected with the sending device 120. Specifically, the receiving device 110 and the sending device 120 are respectively connected to the memory 140.
In one embodiment, referring also to fig. 2, an internal structure diagram of the receiving device 110 is provided. The device is located on a chip, and the receiving device 110 includes: a receiving port circuit 111, a receiving control circuit 112, and a configuration information parsing circuit 113. The receiving port circuit 111 is connected to the configuration information parsing circuit 113, and the configuration information parsing circuit 113 is further connected to the receiving control circuit 112. In one embodiment, the configuration information parsing circuit 113 is configured to parse at least one piece of communication configuration information in the received communication configuration information queue to obtain the corresponding communication descriptors.
In one embodiment, the receiving apparatus 110 further comprises a state descriptor cache circuit 114, wherein the state descriptor cache circuit 114 is connected to the receiving control circuit 112; the state descriptor cache circuit 114 is used to store state descriptors, which mark the completion state of a receiving process.
In one embodiment, the receiving device 110 is coupled to a memory 140. Specifically, the memory 140 is connected to the receiving port circuit 111, and the memory 140 is used for storing data received by the receiving port circuit 111.
In one embodiment, referring to fig. 3, a combination apparatus is provided. The combined device comprises a plurality of neural network processing chips 200, and the neural network processing chips 200 are connected in sequence. Any two of the neural network processing chips can be connected with each other, and adjacent two chips can also be connected with each other.
In one embodiment, each of the neural network processing chips is connected to the main operation terminal 150. In one embodiment, each neural network processing chip includes a communication system 100 as shown in fig. 1, and the communication system 100 includes a receiving device 110, a transmitting device 120, a computing device 130, and a memory 140.
In one embodiment, an electronic device is provided that includes a neural network processing chip 200. The electronic equipment includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or medical equipment.
The connection relationship between the elements in any of the above embodiments may be an electrical connection or a wireless connection.
The receiving device, the transmitting device, the computing device and the memory together form a communication system, by means of which data transmission between the NPU chips can be supported.
In an embodiment, referring to fig. 4 together, a data receiving method is provided, where the data receiving method provided in the present application may be applied to the apparatuses shown in fig. 1 to fig. 3, and the data receiving method includes:
step S601, communication data and communication configuration information are acquired. The communication data refers to data received or sent in a communication process, and the data can be input data or output data of a calculation process; but also raw data that is transmitted into the chip from outside. The communication configuration information refers to process description information of a receiving process or receiving description information of the receiving process in the data communication process. In one embodiment, the communication configuration information includes, but is not limited to: a source address of the communication data, an offset of the communication data in the source address, a destination address of the communication data, an offset of the communication data in the destination address, and a data block size of the communication data.
Step S602, the communication configuration information is parsed to obtain a communication descriptor, where the communication descriptor is information describing the process through which the data transmission passes. Specifically, the communication configuration information is transmitted to the NPU chip by the main operation end; it cannot be directly recognized by the NPU chip and must undergo a parsing operation to generate the communication descriptor. In one embodiment, there is at least one piece of communication configuration information, and each piece is parsed to obtain a corresponding communication descriptor. It is understood that the communication descriptors in the receiving process include, but are not limited to: the source address of the communication data, the offset of the communication data in the source address, the destination address of the communication data, the offset of the communication data in the destination address, and the data block size of the communication data.
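The parsing step can be pictured as mapping each piece of configuration information onto a fixed-field descriptor. Below is a minimal Python sketch; the field names and the dictionary layout are illustrative assumptions, since the patent does not specify a concrete encoding:

```python
from dataclasses import dataclass

@dataclass
class CommDescriptor:
    src_addr: int      # source address of the communication data
    src_offset: int    # offset of the data within the source address
    dst_addr: int      # destination address of the communication data
    dst_offset: int    # offset of the data within the destination address
    block_size: int    # data block size of the communication data

def parse_config(config: dict) -> CommDescriptor:
    # One communication descriptor is produced per piece of
    # communication configuration information.
    fields = ("src_addr", "src_offset", "dst_addr", "dst_offset", "block_size")
    return CommDescriptor(**{name: config[name] for name in fields})

desc = parse_config({"src_addr": 0x1000, "src_offset": 0,
                     "dst_addr": 0x2000, "dst_offset": 64,
                     "block_size": 256})
```

With this sketch, a queue of configuration entries would simply be parsed entry by entry, yielding one descriptor each, as the configuration information parsing circuit 113 does in hardware.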
Step S603, receiving the communication data according to the communication descriptor. In one embodiment, the communication data is stored to a destination address based on a communication descriptor.
In one embodiment, step S601 further includes:
step S6011, a transmission data packet is acquired. In particular, the transmission data packet may be a data compression packet. Alternatively, the transport packets may come from the transmitting device of other NPU chips.
Step S6012, according to the transmission data packet, obtaining communication data and the communication configuration information. In one embodiment, after a transmission data packet is obtained, the transmission data packet is decompressed to obtain transmission data.
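The two sub-steps above can be sketched as: decompress the packet, then split off the configuration information from the payload. This is a toy model under stated assumptions; the zlib compression and the fixed 16-byte header are illustrative choices, not part of the patent:

```python
import zlib

HEADER_LEN = 16  # assumed fixed size of the configuration-information header

def unpack(packet: bytes) -> tuple[bytes, bytes]:
    # Step S6012: decompress the transmission data packet, then separate
    # the communication configuration information from the communication data.
    raw = zlib.decompress(packet)
    return raw[:HEADER_LEN], raw[HEADER_LEN:]

# A sender-side counterpart for the example: header followed by payload.
packet = zlib.compress(b"\x00" * HEADER_LEN + b"payload-bytes")
config, data = unpack(packet)
```

In a real system the header format would be defined by the sending device; the point of the sketch is only the order of operations: acquire, decompress, split.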
In one embodiment, the receiving method further comprises:
step S604, obtaining a receiving mode character according to the communication descriptor. It is to be understood that the communication descriptor further includes a reception mode symbol. Furthermore, the communication descriptor obtained by analyzing the communication configuration information includes a receiving mode character.
Step S605, obtaining whether the sending method is the normal sending mode or the hardware acceleration sending mode according to the receiving mode symbol. The reception mode identifier is an identifier that enables the reception method to select a predetermined reception mode. For example, the receiving Type is Type, when Type =0 indicates normal data transmission, and when Type =1 indicates hardware acceleration descriptor.
In one embodiment, the normal sending mode includes obtaining a control instruction from a main operation terminal, where the main operation terminal is a control device outside a chip. In another embodiment, the hardware accelerated transport mode includes obtaining control instructions from a computing device, the computing device being an internal device of the chip that performs the calculations. The transmission control command refers to a control command for hardware generated to implement the transmission method on hardware. As an alternative implementation, the sending control instruction needs to be parsed to obtain a binary instruction corresponding to the sending control instruction.
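The mode selection described above amounts to a dispatch on the Type field. A hedged sketch (the constant values follow the Type = 0 / Type = 1 example in the text; the return strings are illustrative labels, not names from the patent):

```python
NORMAL_SEND = 0     # Type = 0: normal data transmission
HW_ACCEL_SEND = 1   # Type = 1: hardware-accelerated descriptor

def control_instruction_source(type_field: int) -> str:
    # Normal mode: the send control instruction comes from the main
    # operation end, a control device outside the chip.
    if type_field == NORMAL_SEND:
        return "main_operation_end"
    # Hardware-accelerated mode: the instruction comes from the
    # computing device inside the chip.
    if type_field == HW_ACCEL_SEND:
        return "computing_device"
    raise ValueError(f"unknown receiving mode identifier: {type_field}")
```

For example, `control_instruction_source(0)` selects the off-chip main operation end, while `control_instruction_source(1)` selects the on-chip computing device.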
In an embodiment, referring to fig. 5, the receiving method further includes:
step S606, according to the communication descriptor, when the execution of the receiving task is completed, a corresponding state descriptor is generated. In one embodiment, each communication descriptor corresponds to a receiving task, and when the corresponding receiving task is executed, a state descriptor corresponding to the communication descriptor is generated.
Step S607, storing the state descriptor in a state descriptor queue. Specifically, the state descriptor queue includes a plurality of state descriptors. In one embodiment, the plurality of state descriptors are stored sequentially in the order of generation.
Step S608, the execution state of the receiving task is determined according to the state descriptor queue. Specifically, one receiving task may correspond to a plurality of communication descriptors. It is to be understood that one communication descriptor corresponds to one receiving process; when a receiving process finishes, a state descriptor is generated correspondingly. When the receiving processes corresponding to the communication descriptors of a receiving task have completely or partially finished, the state descriptor queue is generated. Specifically, once the queue depth of the state descriptor queue reaches its upper limit, the state descriptor queue can be generated without waiting for all receiving processes corresponding to the communication descriptors to finish. For example, if receiving task A corresponds to 20 state descriptors and the queue depth of the state descriptor queue is 16, then once the queue reaches 16 entries, the state descriptor queue may be generated even though not all descriptors of receiving task A have been executed. In particular, the depth of the state descriptor queue refers to the number of state descriptors that the queue can hold.
In another embodiment, when the number of state descriptors corresponding to a receiving task is not enough to reach the queue depth of the state descriptor queue, the state descriptor queue is generated only after all the receiving processes corresponding to the communication descriptors of the receiving task have finished.
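The two conditions for generating the state descriptor queue can be sketched as a single predicate. This is a behavioral model only, with assumed parameter names, not the hardware logic itself:

```python
def can_generate_queue(finished: int, total_descriptors: int, depth: int) -> bool:
    """Decide whether the state descriptor queue can be generated.

    finished          -- receiving processes of the task finished so far
    total_descriptors -- communication descriptors of the receiving task
    depth             -- number of state descriptors the queue can hold
    """
    if total_descriptors >= depth:
        # Enough descriptors to fill the queue: generate once the queue is
        # full, without waiting for the whole task to finish.
        return finished >= depth
    # Fewer descriptors than the queue depth: wait until every receiving
    # process of the task has finished.
    return finished == total_descriptors
```

With the example from the text (task A with 20 descriptors, queue depth 16), `can_generate_queue(16, 20, 16)` holds even though four receiving processes are still pending, while a task with only 12 descriptors must finish all 12.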
In one embodiment, the step S608 of determining, according to the state descriptor queue, an execution state of the receiving method includes:
step S6081, selecting the state descriptor from the state descriptor queue according to a preset rule. Specifically, the preset rule refers to a rule for reading a state descriptor queue input before executing a sending task.
Step S6082, determining the number of executed reception processes according to the number of status descriptors in the status descriptor queue. Specifically, a status descriptor represents that a receiving process is done. In one embodiment, the number of performed receive processes is equal to the number of state descriptors.
Step S6083, when the number of executed reception processes reaches a threshold, determines that the reception task is ended. Wherein, the threshold refers to a preset number of receiving processes. In one embodiment, the threshold is the number of state descriptors in the state descriptor queue. In one embodiment, the state descriptor queue is read in the order in which the state descriptors are generated. In another embodiment, the state descriptors are read after the completion of the state descriptor queue in a predetermined order.
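Steps S6081 to S6083 can be modeled with a small in-order queue and a threshold check. A toy sketch under stated assumptions; the class name, the dictionary-valued descriptors, and the threshold value are all illustrative:

```python
from collections import deque

class StateDescriptorQueue:
    def __init__(self, threshold: int):
        self._queue = deque()          # descriptors in order of generation
        self.threshold = threshold     # preset number of receiving processes

    def on_process_finished(self, state_descriptor):
        # One state descriptor is generated per finished receiving process
        # and stored in the order of generation (step S607).
        self._queue.append(state_descriptor)

    def executed_processes(self) -> int:
        # Step S6082: the number of executed receiving processes equals
        # the number of state descriptors in the queue.
        return len(self._queue)

    def task_finished(self) -> bool:
        # Step S6083: the receiving task ends once the count reaches
        # the threshold.
        return self.executed_processes() >= self.threshold

q = StateDescriptorQueue(threshold=3)
for i in range(3):
    q.on_process_finished({"process": i})
```

After the three pushes above, `q.task_finished()` reports that the receiving task has ended; with only two pushes it would not.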
It should be understood that although the steps in the flowcharts of fig. 4 and fig. 5 are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 4 and fig. 5 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and the execution order of these sub-steps or stages is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
In one embodiment, referring to fig. 6, a data receiving device is provided, the device comprising:
a configuration information obtaining module 801, configured to obtain communication data and communication configuration information;
a descriptor parsing module 802, configured to parse the communication configuration information to obtain a communication descriptor;
a data receiving module 803, configured to receive the communication data according to the communication descriptor.
For specific limitations of the data receiving apparatus, reference may be made to the above limitations of the data receiving method, which are not described herein again. The respective modules in the data receiving apparatus may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In an embodiment, the present application further provides a board card. The board card is applied in a data communication method and may include: a plurality of artificial intelligence processors, wherein the memories corresponding to the artificial intelligence processors are multi-channel memories; a target artificial intelligence processor is configured to, after receiving an artificial intelligence processor computing instruction sent by a CPU through a target parallel thread, access the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread according to the computing instruction; the target artificial intelligence processor is any one of the plurality of artificial intelligence processors, the target parallel thread is any parallel thread started by the CPU, and at least two of the parallel threads correspond to different memory channels.
Referring to fig. 7, the board card may contain other components in addition to the artificial intelligence processors 411 (the dedicated processor 41 may include the artificial intelligence processors 411) and the multi-channel memory 42. These components include, but are not limited to: a memory controller 43, a bus, and an interface 44. The dedicated processor 41 performs instruction transmission and data transmission with an external device through the interface 44. Alternatively, the external device may be the main operation end (CPU).
The board card provided by this embodiment may implement the above method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
In an embodiment, the present application further provides a motherboard applied in the neural network data processing method, as shown in fig. 8, the motherboard includes: the main operation end and the board card provided by the embodiment.
The main board provided in this embodiment may implement the method embodiments, and the implementation principle and technical effect are similar, which are not described herein again.
In one embodiment, an electronic device is provided, and the electronic device is applied to a data communication method, and the electronic device comprises a main board as shown in fig. 8. The main board comprises a CPU and a board card, wherein the board card comprises a plurality of artificial intelligence processors, and memories corresponding to the artificial intelligence processors are multi-channel memories; the target artificial intelligence processor is used for accessing a physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread according to an artificial intelligence processor calculation instruction after receiving the artificial intelligence processor calculation instruction sent by a main operation end CPU through the target parallel thread; the target artificial intelligence processor is any artificial intelligence processor in the plurality of artificial intelligence processors, and the target parallel thread is any parallel thread started by the CPU; at least two threads in the plurality of parallel threads correspond to different memory channels.
Optionally, the electronic device may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a mobile phone, a tachograph, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicle includes an airplane, a ship, and/or a car; the household appliances include a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph.
In one embodiment, a computer device is provided, which may be a server, and the internal structure thereof may be as shown in fig. 9. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The database of the computer device is used to store communication configuration information or communication descriptors. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data communication method.
Those skilled in the art will appreciate that the architecture shown in fig. 9 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method of any of the above embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.
Those skilled in the art will appreciate that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; nevertheless, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and while their description is specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art may make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (14)

1. A method for receiving data, the method comprising:
acquiring communication data and communication configuration information;
analyzing the communication configuration information to obtain a communication descriptor; wherein the communication descriptor is information describing a process through which the transmission method passes;
receiving the communication data according to the communication descriptor;
generating, according to the communication descriptors, corresponding state descriptors when the receiving process is completed, wherein one receiving task corresponds to a plurality of communication descriptors, each communication descriptor corresponds to one receiving process in the receiving task, and a state descriptor indicates that its receiving process is completed;
storing the state descriptors to a state descriptor queue;
and judging the execution state of the received task according to the number of the state descriptors in the state descriptor queue.
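The flow of claim 1 — parse configuration into descriptors, perform one receiving process per descriptor, push a state descriptor into a queue on each completion, then judge the task by the queue's count — can be sketched as follows. This is a minimal illustration, not the patented implementation; the descriptor fields (`offset`, `size`) and the queue encoding are assumptions for demonstration only.

```python
from collections import deque

def parse_descriptors(config):
    # Parse communication configuration information into one communication
    # descriptor per receiving process (hypothetical field names).
    return [{"offset": c["offset"], "size": c["size"]} for c in config]

def receive_task(data, config, threshold):
    # One receiving task: each communication descriptor drives one receiving
    # process; a state descriptor is queued when that process completes.
    state_queue = deque()
    received = []
    for desc in parse_descriptors(config):
        chunk = data[desc["offset"]:desc["offset"] + desc["size"]]  # one receiving process
        received.append(chunk)
        state_queue.append({"desc": desc, "done": True})  # corresponding state descriptor
    # Judge the execution state of the receiving task from the queue length.
    task_done = len(state_queue) >= threshold
    return b"".join(received), task_done
```

Using the count of state descriptors rather than inspecting each process lets the completion check stay decoupled from the individual transfers.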
2. The method of claim 1, wherein the obtaining communication data and communication configuration information comprises:
acquiring a transmission data packet;
and obtaining communication data and the communication configuration information according to the transmission data packet.
3. The method of claim 1, wherein parsing the communication configuration information to obtain a communication descriptor comprises:
acquiring an analysis control instruction;
and analyzing the communication configuration information according to the analysis control instruction to obtain the communication descriptor.
4. The method of any of claims 1 to 3, wherein the communication descriptor comprises: one or more of a source address of the data to be sent, a destination address of the data to be sent, an offset of the data to be sent in the source address, an offset of the data to be sent in the destination address, and a data block size of the data to be sent.
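The descriptor fields enumerated in claim 4 map naturally onto a plain record type. The sketch below is illustrative only; the field names are not taken from the patent, and any subset of the fields may be present in practice.

```python
from dataclasses import dataclass

@dataclass
class CommDescriptor:
    # Fields from claim 4; names are hypothetical.
    src_addr: int    # source address of the data to be sent
    dst_addr: int    # destination address of the data to be sent
    src_offset: int  # offset of the data within the source address
    dst_offset: int  # offset of the data within the destination address
    block_size: int  # data block size of the data to be sent
```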
5. A method according to any of claims 1 to 3, characterized in that the method further comprises:
obtaining a receiving mode symbol according to the communication descriptor;
and determining, according to the receiving mode symbol, whether the sending method uses a normal sending mode or a hardware-accelerated sending mode.
6. The method of claim 5, wherein the normal sending mode comprises obtaining a sending control instruction from a main operation terminal, wherein the main operation terminal is a control device outside a chip.
7. The method of claim 5, wherein the hardware-accelerated sending mode comprises obtaining the sending control instruction from a computing device, wherein the computing device is an on-chip device that performs computation.
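The distinction drawn in claims 5-7 — the mode symbol selecting whether the control instruction originates off-chip (main operation terminal) or on-chip (computing device) — amounts to a small dispatch. The 0/1 encoding and names below are assumptions for illustration; the patent does not specify the symbol's encoding.

```python
from enum import Enum

class SendMode(Enum):
    NORMAL = 0    # control instruction from the off-chip main operation terminal (claim 6)
    HW_ACCEL = 1  # control instruction from the on-chip computing device (claim 7)

def control_instruction_source(mode_symbol):
    # Map a receiving mode symbol to the source of the sending control
    # instruction; raises ValueError on an unknown symbol.
    mode = SendMode(mode_symbol)
    return "main_operation_terminal" if mode is SendMode.NORMAL else "computing_device"
```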
8. The method of claim 1, further comprising:
and storing the communication data to a target address according to the communication descriptor.
9. The method of claim 1, wherein determining the execution state of the receiving task according to the state descriptor queue comprises:
selecting the state descriptors in the state descriptor queue according to a preset rule;
determining the number of completed receiving processes according to the number of the state descriptors in the state descriptor queue;
and when the number of the completed receiving processes reaches a threshold value, determining that the receiving task is completed.
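The completion test of claim 9 reduces to counting state descriptors against a threshold. A minimal sketch, assuming each state descriptor is a dict with a hypothetical `done` flag:

```python
def task_completed(state_queue, threshold):
    # Count completed receiving processes via the state descriptors in the
    # queue; the receiving task is judged finished once the count reaches
    # the threshold (claim 9).
    completed = sum(1 for s in state_queue if s.get("done"))
    return completed >= threshold
```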
10. A data receiving apparatus, the apparatus comprising:
the configuration information acquisition module is used for acquiring communication data and communication configuration information;
the descriptor analyzing module is used for analyzing the communication configuration information to obtain a communication descriptor; wherein the communication descriptor is information describing a process through which the transmission method passes;
a data receiving module, configured to receive the communication data according to the communication descriptor;
the generating module is used for generating corresponding state descriptors according to the communication descriptors when the receiving process is finished, wherein one receiving task corresponds to a plurality of communication descriptors, each communication descriptor corresponds to one receiving process in the receiving task, and the state descriptors represent that the receiving process is finished;
the storage module is used for storing the state descriptors to a state descriptor queue;
and the judging module is used for judging the execution state of the received task according to the number of the state descriptors in the state descriptor queue.
11. A board for use in the data receiving method according to any one of claims 1 to 9, the board comprising: a plurality of artificial intelligence processors and memories corresponding to the artificial intelligence processors, the memories being multi-channel memories; a target artificial intelligence processor is configured, after receiving an artificial intelligence processor computing instruction sent by a general-purpose processor (CPU) through a target parallel thread, to access the physical memory corresponding to a memory channel through the memory channel corresponding to the target parallel thread, according to the computing instruction; the target artificial intelligence processor is any one of the plurality of artificial intelligence processors, the target parallel thread is any parallel thread started by the CPU, and at least two of the parallel threads correspond to different memory channels.
12. A motherboard for use in neural network data processing, the motherboard comprising: a general purpose processor CPU and a board as claimed in claim 11.
13. An electronic device, for use in neural network data processing, comprising a motherboard as claimed in claim 12.
14. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method of any one of claims 1 to 9.
CN201811646353.3A 2018-12-29 2018-12-29 Data receiving method and device and related product Active CN111382116B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811646353.3A CN111382116B (en) 2018-12-29 2018-12-29 Data receiving method and device and related product
PCT/CN2019/127752 WO2020135385A1 (en) 2018-12-29 2019-12-24 General machine learning model, and model file generation and parsing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811646353.3A CN111382116B (en) 2018-12-29 2018-12-29 Data receiving method and device and related product

Publications (2)

Publication Number Publication Date
CN111382116A CN111382116A (en) 2020-07-07
CN111382116B true CN111382116B (en) 2022-10-04

Family

ID=71216613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811646353.3A Active CN111382116B (en) 2018-12-29 2018-12-29 Data receiving method and device and related product

Country Status (1)

Country Link
CN (1) CN111382116B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5870627A (en) * 1995-12-20 1999-02-09 Cirrus Logic, Inc. System for managing direct memory access transfer in a multi-channel system using circular descriptor queue, descriptor FIFO, and receive status queue
CN101620551A (en) * 2009-05-07 2010-01-06 曙光信息产业(北京)有限公司 Network card interrupt control method for a plurality of virtual machines
CN103843351A (en) * 2011-09-29 2014-06-04 三星电子株式会社 Method and apparatus for transmitting and receiving content
CN105718996A (en) * 2015-07-29 2016-06-29 Shanghai Ciyu Information Technology Co., Ltd. Cell array computing system and communication method therein
US9959227B1 (en) * 2015-12-16 2018-05-01 Amazon Technologies, Inc. Reducing input/output latency using a direct memory access (DMA) engine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918373B2 (en) * 2007-11-07 2014-12-23 Honeywell International Inc. Message controller with dynamically loadable message descriptor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant