CN109726800B - Operation method, device and related product

Operation method, device and related product

Info

Publication number
CN109726800B
Authority
CN
China
Prior art keywords
caffe
model
output
result
machine learning
Prior art date
Legal status
Active
Application number
CN201811639690.XA
Other languages
Chinese (zh)
Other versions
CN109726800A (en)
Inventor
Inventor not disclosed
Current Assignee
Cambricon Technologies Corp Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201811639690.XA
Publication of CN109726800A
Application granted
Publication of CN109726800B


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Programmable Controllers (AREA)

Abstract

The present disclosure relates to an operation method, an operation device, and a related product. The product comprises a control module, and the control module comprises an instruction cache unit, an instruction processing unit, and a storage queue unit. The instruction cache unit is used to store calculation instructions associated with an artificial neural network operation; the instruction processing unit is used to parse a calculation instruction to obtain a plurality of operation instructions; the storage queue unit is used to store an instruction queue, the instruction queue comprising a plurality of operation instructions or calculation instructions to be executed in the order of the queue. Through this method, the operation efficiency of the related product when running a neural network model can be improved.

Description

Operation method, device and related product
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an operation method, an operation device, and a related product.
Background
The fusion mode and the offline mode are two specific operation modes of a neural network, distinct from the common layer-by-layer operation mode. In these modes, the data copying process of some fused network layers in the neural network is no longer handled by the CPU; instead, tasks such as data copying and data operations are completed directly on the MLU board. The mode in which all the operation processes of a plurality of network layers are combined and the calculation is completed directly on the MLU, without passing through the CPU, is the fusion mode. On the basis of the fusion mode, the model can be separated from the network framework and turned into a network model independent of that framework, i.e., an offline model; the mode of running the offline model is the offline mode.
Because the data of the fused network layers does not flow through the CPU in the fusion mode and the offline mode, the user cannot obtain the operation results of some layers. When a user has a debugging requirement, or needs to directly obtain the result data of a certain layer in the network for some special reason, a rather complicated processing flow is required. The processing methods usually adopted are: splitting the network at the position where output is needed, so as to separate the network layer whose result data is needed from the fusion mode; or copying the network layer whose result data is needed and using the copy as an additional output.
In both processing methods, because the network layer separated from the fusion mode has to run on the CPU, the operation rate drops significantly, causing serious performance loss; and the process of copying a network layer is complex, which also affects the processing speed and results in performance loss.
Disclosure of Invention
In view of this, the present disclosure provides an operation method in which an additional output parameter is defined in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, and the additional output parameter indicates that an intermediate result of the Caffe model is added to the output result of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, an output result of the Caffe model that includes the intermediate result can be obtained. Compared with the traditional way of obtaining intermediate results, this method effectively improves the operation rate and avoids loss of model performance.
According to an aspect of the present disclosure, there is provided an arithmetic method including:
defining additional output parameters in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, wherein the additional output parameters are used for indicating that an intermediate result of the Caffe model is added to an output result of the Caffe model, and the intermediate result comprises an operation result of at least one non-output layer in the Caffe model;
executing the Caffe model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model comprising the intermediate result;
wherein, the Caffe model is a fusion model.
In one possible implementation, the method further includes:
obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when executed, the Caffe offline model outputs an output result of the Caffe model including the intermediate result.
In one possible implementation, the operation method is suitable for a heterogeneous computing architecture, and the heterogeneous computing architecture comprises a general-purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a converged sub-network operating on the artificial intelligence processor; the converged subnetwork is: and (3) carrying out operator fusion on all or network layers in the Caffe model to obtain the network.
In one possible implementation, the value of the additional output parameter comprises an output identifier or a non-output identifier,
and the additional output parameter indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model.
In a possible implementation manner, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the order of the names of the network layers.
According to another aspect of the present disclosure, there is also provided an arithmetic device including:
the parameter definition module is used for defining additional output parameters in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, wherein the additional output parameters are used for indicating that an intermediate result of the Caffe model is added to an output result of the Caffe model, and the intermediate result comprises an operation result of at least one non-output layer in the Caffe model;
and the model execution module is used for executing a Caffe fusion model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model comprising the intermediate result.
In one possible implementation, the apparatus further includes:
an offline model obtaining module, used to obtain a Caffe offline model according to the adjusted Caffe configuration file, so that when executed, the Caffe offline model outputs an output result of the Caffe model including the intermediate result.
In one possible implementation, the computing device is adapted for a heterogeneous computing architecture that includes a general purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a fused sub-network running on the artificial intelligence processor; the fused sub-network is: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
In one possible implementation, the value of the additional output parameter includes an output identifier or a non-output identifier;
the additional output parameter indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model.
In one possible implementation, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the order of the network layer names.
According to an aspect provided by the present disclosure, there is provided a computer device comprising a memory, a processor, a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the above methods when executing the computer program.
According to an aspect of the present disclosure, there is also provided a readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of any of the methods described above.
According to another aspect of the present disclosure, there is also provided a machine learning arithmetic device, including one or more arithmetic devices as described above, configured to acquire input data and control information to be operated from other processing devices, execute a specified machine learning operation, and transmit an execution result to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of arithmetic devices, the arithmetic devices can be connected through a specific structure and transmit data;
the plurality of operation devices are interconnected through a PCIE bus and transmit data so as to support operation of larger-scale machine learning;
a plurality of the arithmetic devices share the same control system or have respective control systems;
the plurality of arithmetic devices share a memory or have their own memories;
the plurality of arithmetic devices are connected in an arbitrary connection topology.
According to an aspect of the present disclosure, there is provided a combined processing apparatus including the machine learning arithmetic apparatus as described above, a universal interconnection interface, and other processing apparatuses;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
In one possible implementation manner, the combined processing apparatus further includes: a storage device;
the storage device is connected to the machine learning arithmetic device and the other processing devices, respectively, and is configured to store data of the machine learning arithmetic device or of the combined processing device.
According to an aspect of the present disclosure, there is provided a neural network chip, the chip including a machine learning arithmetic device as described above, or a combined processing device as described above.
According to another aspect of the present disclosure, there is also provided an electronic device including the neural network chip as described above.
According to an aspect of the present disclosure, a board card is provided, the board card including: a storage device, an interface device, a control device, and the neural network chip as described above;
wherein, the neural network chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the neural network chip and external equipment;
and the control device is used for monitoring the state of the neural network chip.
In one possible implementation, the storage device includes: a plurality of groups of storage units, where each group of storage units is connected to the neural network chip through a bus, and the storage units are: DDR SDRAM;
the chip includes: a DDR controller, used to control the data transmission and data storage of each storage unit;
the interface device is: a standard PCIE interface.
According to the above operation method, the adjusted Caffe configuration file is obtained by defining an additional output parameter in the configuration file of the Caffe model, and the additional output parameter indicates that an intermediate result of the Caffe model is added to the output result of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, an output result of the Caffe model including the intermediate result can be obtained. Compared with the traditional way of obtaining intermediate results, this method effectively improves the operation rate and avoids loss of model performance.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments, features, and aspects of the disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a flow diagram of a method of operation according to an embodiment of the present disclosure;
fig. 2 shows a schematic structural diagram of a neural network of the related art;
FIG. 3 shows a schematic diagram of the logic of operation of a neural network in a layer-by-layer mode;
FIG. 4 shows a schematic diagram of the logic for operation of a neural network in a converged mode;
FIG. 5 shows a schematic diagram of the logic of operation of a neural network under an embodiment of the present disclosure;
FIG. 6 shows a block diagram of a computing device according to an embodiment of the present disclosure;
FIG. 7 shows a block diagram of a combined processing device according to an embodiment of the present disclosure;
FIG. 8 shows a block diagram of another combined processing device according to an embodiment of the present disclosure;
fig. 9 shows a block diagram of a board card according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
First, it should be noted that generating a Caffe model generally involves two files: one is a structure file (pt), also referred to as the configuration file; the other is a weight file (caffemodel). The object adjusted by the above operation method may be the configuration file (pt) stored on disk, or the adjustment may be performed on the in-memory configuration file after the configuration file and the weight file have been loaded into memory.
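As an illustration of how these two files relate, a Caffe model can be loaded with the standard pycaffe API as follows; the file names are placeholders:

```python
import caffe

# Placeholder file names: the structure (pt) file describes the network
# topology, the caffemodel file holds the trained weights.
prototxt_path = "deploy.prototxt"
weights_path = "weights.caffemodel"

# Load both files into memory. The adjustment described above can be made
# either to the prototxt file on disk before this call, or, in principle,
# to the in-memory network definition afterwards.
net = caffe.Net(prototxt_path, weights_path, caffe.TEST)
```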
Fig. 1 shows a flow diagram of a method of operation according to an embodiment of the present disclosure. Referring to fig. 1, the operation method of the embodiment of the present disclosure includes:
and S100, defining additional output parameters in a configuration file of the Caffe model to obtain an adjusted Caffe configuration file. It should be noted that the additional output parameter is used to indicate that the intermediate result of the Caffe model is added to the output result of the Caffe model. The intermediate result refers to an operation result including at least one non-output layer in the Caffe model.
Step S200: executing the Caffe model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model including the intermediate result. Here, the Caffe model may be a fusion model.
It should be noted that the operation method of the present disclosure may be performed based on the fusion mode of the Convolutional Architecture for Fast Feature Embedding ("Caffe") framework, or may be performed based on other neural network structures; the embodiments disclosed below are all described taking a Caffe model as an example. Those skilled in the art can understand that the operation method of the present disclosure can also be applied to other neural networks, whose principles are the same or similar and are not described in detail here.
In addition, in order to more clearly explain the operation method of the present disclosure, the following explains the neural network and the operation mode of the neural network in more detail, so as to more clearly understand the technical solution of the operation method of the present disclosure.
Referring to fig. 2, fig. 2 illustrates the structure of a neural network of the related art. The network includes one input and two outputs (output1 and output2). Each layer located between the input and the outputs is a network layer of the network. Currently, the operation modes of a neural network generally include the layer-by-layer mode, the fusion mode, and the offline mode.
Referring to fig. 3, fig. 3 illustrates the operation logic of a neural network in the layer-by-layer mode. In the layer-by-layer mode, the operation of each network layer (layer) in the neural network is completed by the MLU (an artificial intelligence chip), data transmission between network layers is completed by the CPU, and the interaction between layers is realized on the CPU; in this mode the network output of every layer can therefore be obtained directly.
To accelerate the network operation rate and improve network performance, two further operation modes are provided: the fusion mode and the offline mode. Referring to fig. 4, fig. 4 illustrates the operation logic of a neural network in the fusion mode. In the fusion mode, the operation of each network layer and the interaction between layers are completed on the MLU, and the CPU only participates in the input and output processes. That is, in the fusion model the data copying process of the fused network layers is no longer handled by the CPU; tasks such as data copying and data calculation are completed directly on the MLU board. The operation mode of the fusion model is the fusion mode. In the fusion mode, the operation of the whole network is transparent to the CPU, so the user cannot directly obtain the operation result of an intermediate network layer.
In the offline mode, on the basis of the fusion mode, the model is separated from the framework (i.e., the Caffe model is separated from the Caffe framework) and becomes a network model (offline model) that runs independently of the framework. Similarly, in the offline mode the running process of the network is transparent to the CPU, and the user cannot directly obtain the operation result of an intermediate network layer.
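The contrast between the two execution styles can be pictured with the following conceptual sketch; copy_to_mlu, copy_to_cpu, and run_on_mlu are hypothetical stand-ins, not the actual MLU runtime API:

```python
# Stand-ins for illustration only; a real MLU runtime API would differ.
def copy_to_mlu(x): return x                # host-to-device copy (placeholder)
def copy_to_cpu(x): return x                # device-to-host copy (placeholder)
def run_on_mlu(layer, x): return layer(x)   # on-device compute (placeholder)

def run_layer_by_layer(layers, x):
    """Layer-by-layer mode: the CPU mediates between every layer,
    so each layer's output is directly visible to the user."""
    outputs = []
    for layer in layers:
        x = copy_to_cpu(run_on_mlu(layer, copy_to_mlu(x)))
        outputs.append(x)                   # every intermediate reaches the CPU
    return outputs

def run_fused(layers, x):
    """Fusion mode: all layers execute on the MLU and only the final
    result returns to the CPU; intermediate results stay on the device."""
    x = copy_to_mlu(x)
    for layer in layers:
        x = run_on_mlu(layer, x)            # data never leaves the MLU
    return copy_to_cpu(x)                   # only the network-end result
```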
With the foregoing embodiment of the operation method of the present disclosure, an additional output parameter is defined in the configuration file of the network model (Caffe model), and the defined additional output parameter indicates that an intermediate result of the network model is added to the output result of the network model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, the intermediate result can be obtained directly from the output result of the Caffe model.
The operation method disclosed above thus achieves the purpose of directly obtaining the operation result (intermediate result) of an intermediate network layer on the MLU, simply by adjusting the configuration file of the Caffe model and defining an additional output parameter in it; the operation is simple and easy to implement. Compared with the traditional approach of splitting the network at the position where output is needed, the processing flow of the disclosed operation method is simple and does not affect the performance of the network.
As one possible implementation, fig. 5 shows a schematic diagram of an operation method according to an embodiment of the present disclosure. Referring to fig. 5, the additional output parameter defined in the configuration file of the Caffe model may be: external_output. That is, an external_output parameter is added to at least one non-output layer (a network layer other than the output layers) in the Caffe model, and during the operation of the Caffe model the operation result of the corresponding network layer can be output according to the external_output parameter. The operation is simple and easy to implement.
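A minimal sketch of what such an adjusted configuration file might look like is given below. The layer name, shapes, and the exact placement of the field are invented for illustration; external_output is the Cambricon-specific parameter described in this disclosure, not part of the standard BVLC Caffe prototxt schema:

```protobuf
# Adjusted prototxt fragment (illustrative). Marking conv1 with
# external_output: true asks the runtime to add this non-output
# layer's operation result to the model's output results.
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  external_output: true   # intermediate result becomes an additional output
  convolution_param {
    num_output: 64
    kernel_size: 3
  }
}
```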
As a possible implementation manner, the method further includes: obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that when executed, the Caffe offline model outputs an output result of the Caffe model including an intermediate result. That is, after the Caffe offline model is obtained according to the adjusted Caffe configuration file and the Caffe fusion model is thereby converted into an offline model, any of the above operation methods can be adopted to obtain an intermediate result in the offline mode.
It should be noted that the operation method disclosed here is applicable to heterogeneous computing architectures. A heterogeneous computing architecture includes a general-purpose processor (CPU) and an artificial intelligence processor. The artificial intelligence processor may be an intelligence processing unit (IPU) for performing artificial intelligence operations, which may include machine learning operations, brain-like operations, and the like. Machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may include, for example, one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-network Processing Unit), a DSP (Digital Signal Processor), and a Field-Programmable Gate Array (FPGA) chip. The non-output layer includes: a non-output layer in a fused sub-network running on the artificial intelligence processor. Meanwhile, in line with the fusion model described above, those skilled in the art can understand that the fused sub-network is a network obtained by operator fusion of all or part of the network layers in the Caffe model.
In addition, as a possible implementation manner of the disclosed operation method, the value of the additional output parameter may include an output identifier (e.g., true) or a non-output identifier (e.g., false). Accordingly, when the additional output parameter is defined, adding the intermediate result of the Caffe model to the output result of the Caffe model can be expressed by defining the value of the additional output parameter.
Defining the value of the additional output parameter to indicate that the intermediate result is added to the output result of the Caffe model may proceed as follows:
setting the value of the additional output parameter of the layer to be output (i.e., an intermediate network layer whose operation result is to be output) to true (i.e., external_output: true); when the additional output parameter of a network layer is true, the output of that network layer is output as an output result of the fused Caffe model.
That is to say, the operation result of a network layer with external_output set to true can be regarded as a normal output result of the fused network, and its actual processing is equivalent to a network-end result of the Caffe model. Meanwhile, the operation result still participates in the subsequent operations within the fused network, so the network operation is not affected and there is no obvious performance degradation.
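A usage sketch under the same assumptions (standard pycaffe calls stand in for Cambricon's Caffe fork; the blob name conv1 comes from the hypothetical prototxt above, and the file names are placeholders):

```python
import numpy as np
import caffe

# Load the adjusted configuration file together with the weights.
net = caffe.Net("adjusted_deploy.prototxt", "weights.caffemodel", caffe.TEST)

# Run the (fused) network once on dummy input.
net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)
results = net.forward()  # dict mapping output blob names to arrays

# With external_output: true on conv1, its intermediate result appears
# alongside the ordinary network outputs (output1 and output2 in fig. 2).
for name in sorted(results):  # intermediate results ordered by layer name
    print(name, results[name].shape)
```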
As a possible implementation manner, in the output result of the Caffe model, the output results of the non-output layers in the intermediate result are arranged in the order of the network layer names. That is, when the intermediate result is added to the output result of the Caffe model so that the output result includes it, the intermediate result may comprise the operation results of a plurality of network layers. In this case, the operation results of the plurality of network layers (i.e., the plurality of intermediate results) may be arranged in sequence in the output result of the Caffe model according to the names of the network layers, and output.
Arranging the intermediate results in the output result list of the Caffe model in the order of the network layer names makes it clear which network layer each output result corresponds to.
According to the above operation method, the adjusted Caffe configuration file is obtained by defining an additional output parameter in the configuration file of the Caffe model, and the additional output parameter indicates that an intermediate result of the Caffe model is added to the output result of the Caffe model, so that when the Caffe model is executed according to the adjusted Caffe configuration file, an output result of the Caffe model including the intermediate result can be obtained. Compared with the traditional way of obtaining intermediate results, this method effectively improves the operation rate and avoids loss of model performance.
According to an aspect of the present disclosure, an arithmetic device 100 is also provided. FIG. 6 shows a block diagram of an embodiment of the computing device 100 of the present disclosure. Referring to fig. 6, the computing device 100 includes:
a parameter definition module 110, configured to define an additional output parameter in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, where the additional output parameter is used to indicate that an intermediate result of the Caffe model is added to an output result of the Caffe model, and the intermediate result includes an operation result of at least one non-output layer in the Caffe model;
a model execution module 120, configured to execute the Caffe fusion model according to the adjusted Caffe configuration file, so as to obtain an output result of the Caffe model including the intermediate result.
As a possible implementation manner, the arithmetic device 100 further includes:
and the offline model obtaining module is used for obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that the Caffe offline model outputs an output result of the Caffe model including the intermediate result when being executed.
As a possible implementation, the computing device 100 is suitable for heterogeneous computing architectures, which include general-purpose processors and artificial intelligence processors;
the non-output layer includes: a non-output layer in a fused sub-network running on the artificial intelligence processor; the fused sub-network is: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
As a possible implementation, the value of the additional output parameter includes an output identifier or a non-output identifier;
the additional output parameter indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model.
As a possible implementation manner, in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the order of the network layer names.
According to another aspect of the present disclosure, there is provided a computer device, including a memory and a processor, where the memory stores thereon a computer program operable on the processor, and the processor implements the steps of any one of the operation methods when executing the computer program.
According to another aspect of the present disclosure, there is also provided a readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of any one of the above operational methods.
According to an aspect of the present disclosure, there is provided a machine learning arithmetic device, including one or more arithmetic devices as described in any of the above, used to acquire input data and control information to be operated on from other processing devices, execute a specified machine learning operation, and transmit the execution result to the other processing devices through an I/O interface. The other processing devices are, for example: a camera, a display, a mouse, a keyboard, a network card, a wifi interface, or a server. When more than one arithmetic device is included, the arithmetic devices can be linked and transmit data through a specific structure, for example interconnected via a PCIE bus, to support larger-scale machine learning operations. In that case, the devices may share the same control system or have separate control systems; they may share memory, or each accelerator may have its own memory. In addition, the interconnection mode can be any interconnection topology.
The machine learning arithmetic device has high compatibility and can be connected with various types of servers through PCIE interfaces.
FIG. 7 shows a block diagram of a combined processing device 200a according to an embodiment of the present disclosure. Referring to fig. 7, the present disclosure also provides a combined processing device 200a, which includes the above machine learning arithmetic device (neural network arithmetic device 210), a universal interconnection interface 220, and another processing device 230. The machine learning arithmetic device 210 interacts with the other processing device 230 to jointly complete the operation designated by the user.
The other processing device 230 includes one or more types of general-purpose/special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), neural network processors, and the like. The number of processors included in the other processing device 230 is not limited. The other processing device 230 serves as the interface between the machine learning arithmetic device and external data and control, performs data transfer, and completes basic control of the machine learning arithmetic device such as starting and stopping; the other processing device can also cooperate with the machine learning arithmetic device to complete computing tasks.
The universal interconnection interface 220 is used to transmit data and control commands between the machine learning arithmetic device 210 and the other processing device 230. The machine learning arithmetic device 210 acquires the required input data from the other processing device 230 and writes it into the storage device on the machine learning arithmetic device; it can obtain control instructions from the other processing device 230 and write them into a control cache on the machine learning arithmetic device chip; it can also read the data in the storage module of the machine learning arithmetic device and transmit it to the other processing device.
Fig. 8 shows a block diagram of a combined processing device 200b according to another embodiment of the present disclosure. Referring to fig. 8, the combined processing device 200b of the present disclosure may further include a storage device 240, which is connected to the machine learning arithmetic device 210 and the other processing device 230, respectively. The storage device 240 is used to store data of the machine learning arithmetic device 210 and the other processing device 230, and is particularly suitable for data that is needed for calculation but cannot be entirely held in the internal storage of the machine learning arithmetic device or the other processing device.
The combined processing device 200b can serve as the SoC (system on chip) of equipment such as mobile phones, robots, drones, and video surveillance devices, effectively reducing the core area of the control portion, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, a display, a mouse, a keyboard, a network card, or a wifi interface.
In some embodiments, a chip is also disclosed, which includes the above machine learning arithmetic device or combined processing device.
In some embodiments, a chip packaging structure is disclosed, which includes the above chip.
In some embodiments, a board card is disclosed, which includes the above chip package structure. Referring to fig. 9, fig. 9 provides a board card that may include other components in addition to the chip 389, including but not limited to: a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the chip in the chip package structure through a bus and is used for storing data. The storage device may include a plurality of groups of storage units 393. Each group of storage units is connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR can double the speed of SDRAM without increasing the clock frequency, because it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, where 64 bits of each 72-bit DDR4 controller are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
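The 25600 MB/s figure follows directly from the interface parameters; a quick check, assuming one DDR4-3200 channel with the 64 data bits mentioned above:

```python
# Theoretical bandwidth of one DDR4-3200 channel (64 data bits, ECC excluded).
transfers_per_second = 3200 * 10**6   # DDR4-3200: 3200 MT/s
data_bits = 64                        # 72-bit controller minus 8 ECC bits
bytes_per_transfer = data_bits // 8   # 8 bytes per transfer

bandwidth_mb_s = transfers_per_second * bytes_per_transfer // 10**6
print(bandwidth_mb_s)  # 25600 MB/s, matching the figure in the text
```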
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip and is used for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip package structure. The interface device is used to realize data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIE interface; the data to be processed is then transmitted to the chip by the server through the standard PCIE interface, realizing the data transfer. Preferably, when a PCIE 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present application does not limit the concrete form of such other interfaces, as long as the interface unit can realize the switching function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., a server) by the interface device.
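The 16000 MB/s figure can likewise be sanity-checked from the PCIe 3.0 parameters, assuming 8 GT/s per lane with 128b/130b line encoding:

```python
# Theoretical one-direction bandwidth of a PCIe 3.0 x16 link.
transfers_per_lane = 8 * 10**9   # PCIe 3.0: 8 GT/s per lane
encoding = 128 / 130             # 128b/130b line-encoding efficiency
lanes = 16

bandwidth_mb_s = transfers_per_lane * encoding * lanes / 8 / 10**6
print(round(bandwidth_mb_s))     # ~15754 MB/s, commonly rounded to 16000 MB/s
```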
The control device is electrically connected to the chip and is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single-chip microcomputer (MCU). The chip may include a plurality of processing chips, processing cores, or processing circuits, and may drive a plurality of loads; the chip can therefore be in different working states such as heavy load and light load. Through the control device, the working states of the plurality of processing chips, processing cores, and/or processing circuits in the chip can be regulated.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device comprises a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
Having described embodiments of the present disclosure, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

1. An arithmetic method, comprising:
defining additional output parameters in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, wherein the additional output parameters are used for indicating that an intermediate result of the Caffe model is added to an output result of the Caffe model, and the intermediate result comprises an operation result of at least one non-output layer in the Caffe model;
executing the Caffe model according to the adjusted Caffe configuration file to obtain an output result of the Caffe model comprising the intermediate result;
wherein, the Caffe model is a fusion model,
the operation method is suitable for a heterogeneous computing architecture, and the heterogeneous computing architecture comprises a general-purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a fused sub-network running on the artificial intelligence processor; the fused sub-network is: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
2. The method of claim 1, further comprising:
and obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that the Caffe offline model outputs an output result of the Caffe model including the intermediate result when being executed.
3. The method of claim 1, wherein the value of the additional output parameter comprises an output identifier or a non-output identifier,
the additional output parameter indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model.
4. The method according to any one of claims 1 to 3, wherein in the output results of the Caffe model, the output results of the non-output layers in the intermediate results are arranged in the order of network layer names.
5. An arithmetic device, comprising:
the parameter definition module is used for defining additional output parameters in a configuration file of a Caffe model to obtain an adjusted Caffe configuration file, wherein the additional output parameters are used for indicating that an intermediate result of the Caffe model is added to an output result of the Caffe model, and the intermediate result comprises an operation result of at least one non-output layer in the Caffe model;
a model execution module for executing Caffe fusion model according to the adjusted Caffe configuration file to obtain the output result of the Caffe model including the intermediate result,
wherein the computing device is adapted for a heterogeneous computing architecture comprising a general purpose processor and an artificial intelligence processor;
the non-output layer includes: a non-output layer in a fused sub-network running on the artificial intelligence processor; the fused sub-network is: a network obtained by performing operator fusion on all or part of the network layers in the Caffe model.
6. The apparatus of claim 5, further comprising:
and the offline model obtaining module is used for obtaining a Caffe offline model according to the adjusted Caffe configuration file, so that the Caffe offline model outputs an output result of the Caffe model including the intermediate result when being executed.
7. The apparatus of claim 5, wherein the value of the additional output parameter comprises an output identifier or a non-output identifier;
the additional output parameter indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model comprises:
when the value of the additional output parameter is the output identifier, indicating that the intermediate result of the Caffe model is added to the output result of the Caffe model.
8. The apparatus according to any one of claims 5 to 7, wherein in the output results of Caffe model, the output results of each non-output layer in the intermediate results are arranged in the order of network layer names.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 4 when executing the computer program.
10. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
11. A machine learning arithmetic device, characterized in that the machine learning arithmetic device comprises one or more arithmetic devices according to any one of claims 5 to 8, and is used for acquiring input data and control information to be operated from other processing devices, executing specified machine learning operation, and transmitting the execution result to other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of arithmetic devices, the arithmetic devices can be connected through a specific structure and transmit data;
the plurality of operation devices are interconnected through a PCIE bus and transmit data so as to support operation of larger-scale machine learning;
a plurality of the arithmetic devices share the same control system or have respective control systems;
the plurality of arithmetic devices share a memory or have their own memories;
the plurality of arithmetic devices are connected in an arbitrary connection topology.
12. A combined processing apparatus, characterized in that the combined processing apparatus comprises the machine learning arithmetic apparatus of claim 11, a universal interconnect interface and other processing apparatuses;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
13. The combined processing device according to claim 12, further comprising: a storage device;
the storage device is connected to the machine learning arithmetic device and the other processing device, respectively, and is configured to store data of the machine learning arithmetic device or the combined processing device according to claim 12.
14. A neural network chip, characterized in that the chip comprises the machine learning arithmetic device of claim 11, or the combined processing device of claim 12 or 13.
15. An electronic device, characterized in that the electronic device comprises the neural network chip of claim 14.
16. A board card, characterized in that the board card includes: a storage device, an interface device, a control device, and the neural network chip of claim 14;
wherein, the neural network chip is respectively connected with the storage device, the control device and the interface device;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the neural network chip and external equipment;
and the control device is used for monitoring the state of the neural network chip.
17. The board of claim 16,
the storage device includes: a plurality of groups of storage units, where each group of storage units is connected to the neural network chip through a bus, and the storage units are: DDR SDRAM;
the chip includes: a DDR controller, used to control the data transmission and data storage of each storage unit;
the interface device is: a standard PCIE interface.
CN201811639690.XA 2018-12-29 2018-12-29 Operation method, device and related product Active CN109726800B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811639690.XA CN109726800B (en) 2018-12-29 2018-12-29 Operation method, device and related product

Publications (2)

Publication Number Publication Date
CN109726800A CN109726800A (en) 2019-05-07
CN109726800B true CN109726800B (en) 2019-12-24

Family

ID=66297971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811639690.XA Active CN109726800B (en) 2018-12-29 2018-12-29 Operation method, device and related product

Country Status (1)

Country Link
CN (1) CN109726800B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112052040A (en) * 2019-06-06 2020-12-08 中科寒武纪科技股份有限公司 Processing method, processing device, computer equipment and storage medium
CN110490309B (en) * 2019-08-14 2022-06-07 中科寒武纪科技股份有限公司 Operator fusion method for neural network and related product thereof
CN110990060B (en) * 2019-12-06 2022-03-22 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method of storage and computation integrated chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760932A (en) * 2016-02-17 2016-07-13 北京物思创想科技有限公司 Data exchange method, data exchange device and calculating device
CN106845631A (en) * 2016-12-26 2017-06-13 上海寒武纪信息科技有限公司 One kind stream performs method and device
CN107808098A (en) * 2017-09-07 2018-03-16 阿里巴巴集团控股有限公司 A kind of model safety detection method, device and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050125792A1 (en) * 2003-12-08 2005-06-09 Che-An Chang Software materialization platform and an artificial neuron computer system
NZ554258A (en) * 2007-03-29 2009-01-31 Khipu Systems Ltd Predictive model implementation system and methodology
CN109086877B (en) * 2016-04-29 2020-05-08 中科寒武纪科技股份有限公司 Apparatus and method for performing convolutional neural network forward operation
CN109102074B (en) * 2017-06-21 2021-06-01 上海寒武纪信息科技有限公司 Training device
CN109086244A (en) * 2018-07-11 2018-12-25 中国人民解放军国防科技大学 Matrix convolution vectorization implementation method based on vector processor


Also Published As

Publication number Publication date
CN109726800A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN110096310B (en) Operation method, operation device, computer equipment and storage medium
CN109726800B (en) Operation method, device and related product
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN111767995B (en) Operation method, device and related product
CN109740746B (en) Operation method, device and related product
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
CN111340202B (en) Operation method, device and related product
CN111209230B (en) Data processing device, method and related product
CN111353595A (en) Operation method, device and related product
CN111047030A (en) Operation method, operation device, computer equipment and storage medium
CN111767078A (en) Data operation method and device and related product
CN111338694B (en) Operation method, device, computer equipment and storage medium
CN111258732A (en) Data processing method, data processing device and electronic equipment
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
CN111209245B (en) Data processing device, method and related product
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN111290789B (en) Operation method, operation device, computer equipment and storage medium
CN111210011B (en) Data processing device and related product
CN111275197B (en) Operation method, device, computer equipment and storage medium
CN111026440B (en) Operation method, operation device, computer equipment and storage medium
CN111079914B (en) Operation method, system and related product
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111078125B (en) Operation method, device and related product
CN112396186B (en) Execution method, execution device and related product
CN111078285B (en) Operation method, system and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Patentee after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Patentee before: Beijing Zhongke Cambrian Technology Co., Ltd.
