CN114186678A - Hardware adaptation device and method based on deep learning - Google Patents

Hardware adaptation device and method based on deep learning

Info

Publication number
CN114186678A
Authority
CN
China
Prior art keywords
hardware
interface
operator
model
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111504826.8A
Other languages
Chinese (zh)
Other versions
CN114186678B (en)
Inventor
洪明
朱鹏阳
严春伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111504826.8A
Publication of CN114186678A
Application granted
Publication of CN114186678B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/30 Creation or generation of source code
    • G06F 8/31 Programming languages or programming paradigms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Neurology (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides a hardware adaptation device and method based on deep learning, and relates to the technical fields of deep learning and graph optimization. One embodiment of the apparatus comprises: a deep learning inference framework module, configured to obtain an intermediate representation with a graph structure from an input target model file; a subgraph engine module, configured to fuse a configuration file with the intermediate representation to obtain a subgraph operator, where the configuration file is used to define the data attributes and hardware type of the subgraph operator; and a hardware adaptation module, configured to convert the subgraph operator into instruction code executed on target hardware corresponding to the hardware type.

Description

Hardware adaptation device and method based on deep learning
Technical Field
The present disclosure relates to the field of computers, and in particular, to deep learning and graph optimization, and more particularly, to a hardware adaptation apparatus and method based on deep learning.
Background
With the wide application of deep learning technology in various fields, a large number of Artificial Intelligence (AI) chips that are more efficient than traditional architectures such as the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU) have been developed. A good software ecosystem is key to the success of AI hardware. It depends on the maturity of the hardware vendor's software stack and on whether broad support from deep learning inference frameworks can be obtained; the latter helps users simplify the service deployment process and reduce the migration cost caused by hardware differences, so that higher performance and energy efficiency benefits can be obtained quickly.
At present, the major deep learning inference frameworks generally adopt the following schemes. (1) Delegation (Delegate) approach: TensorFlow Lite and MindSpore Lite commonly use a Delegate mechanism to run some of the operators in a model on accelerators such as a GPU, a Digital Signal Processor (DSP) or a neural network processing unit (NPU). During adaptation, a vendor only needs to implement a Delegate subclass and its interfaces for each piece of hardware; during model optimization, the operators supported by the hardware are fused into a subgraph operator; and during model execution, the subgraph is converted into a hardware model when the subgraph operator runs, with the result returned to the framework after execution. (2) Android NNAPI (Android Neural Networks API) and Android NN Runtime: to decouple the inference framework from hardware adaptation, Google standardized the framework-layer interfaces for device management, model networking, execution and so on to establish the Android NNAPI interface system, and then adapts different AI hardware through the Android NN Runtime. In theory, after adapting the Android NN Runtime, a hardware vendor can be accessed by different deep learning inference frameworks such as TensorFlow Lite and PyTorch Mobile through the unified Android NNAPI.
Disclosure of Invention
The embodiment of the disclosure provides a hardware adaptation device and method based on deep learning.
In a first aspect, an embodiment of the present disclosure provides a hardware adaptation device based on deep learning, including: a deep learning inference framework module, configured to obtain an intermediate representation with a graph structure based on an input target model file; a subgraph engine module, configured to fuse a configuration file with the intermediate representation to obtain a subgraph operator, where the configuration file is used to define the data attributes and hardware type of the subgraph operator; and a hardware adaptation module, configured to convert the subgraph operator into instruction code executed on target hardware corresponding to the hardware type.
In a second aspect, an embodiment of the present disclosure provides a hardware adaptation method based on deep learning, including: acquiring a configuration file and a target model file; obtaining an intermediate representation of the graph structure by using a deep learning inference framework; fusing the intermediate representation by using the configuration file and a subgraph engine to obtain a subgraph operator, where the configuration file is used to define the data attributes and hardware type of the subgraph operator; and converting the subgraph operator into instruction code executed on target hardware corresponding to the hardware type according to a hardware adaptation framework.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the second aspect.
In a fourth aspect, the disclosed embodiments propose a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the second aspect.
In a fifth aspect, the disclosed embodiments propose a computer program product comprising a computer program that, when executed by a processor, implements the method as described in the second aspect.
According to the hardware adaptation device and method based on deep learning provided by the embodiments of the present disclosure, an intermediate representation of the graph structure corresponding to an input target model file is obtained through a deep learning inference framework module; the configuration file and the intermediate representation are fused through a subgraph engine module to obtain a subgraph operator, where the configuration file is used to define the data attributes and hardware type of the subgraph operator; and the hardware adaptation module converts the subgraph operator into instruction code executed on target hardware corresponding to the hardware type. A hardware adaptation module can thus be established between the deep learning inference framework module and the target hardware, which decouples hardware adaptation from the deep learning inference framework and lowers the learning threshold of the framework. Because the data attributes and hardware type can be defined through the configuration file, the usage scenarios of the data attributes can be customized and the precision of operator execution can be improved. Meanwhile, any change of the deep learning inference framework is absorbed by the hardware adaptation layer, so the instruction code has stronger robustness and maintainability.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of one embodiment of a deep learning based hardware adaptation apparatus according to the present disclosure;
FIG. 2 is a schematic diagram of a sub-graph engine module;
FIG. 3 is a schematic diagram of a hardware adaptation module;
FIG. 4 is a flow diagram of one embodiment of a deep learning based hardware adaptation method according to the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows a schematic diagram of an embodiment of a deep learning based hardware adaptation apparatus to which the present disclosure may be applied.
As shown in fig. 1, the hardware adaptation apparatus based on deep learning may include a deep learning inference framework module 101, a sub-graph engine module 102 and a hardware adaptation module 103, wherein:
the deep learning inference framework module 101 is used for obtaining the intermediate representation of the graph structure based on the input target model file; the sub-graph engine module 102 is configured to fuse the configuration file and the intermediate representation to obtain a sub-graph operator, where the configuration file is used to define the data attributes and hardware type of the sub-graph operator; and the hardware adaptation module 103 is configured to convert the sub-graph operator into instruction code executed on target hardware corresponding to the hardware type.
In this embodiment, the deep learning inference framework module 101 is specifically configured to: parse the input target model file to acquire the topological structure information of the neural network; and use the feature map information and computation operation information in the topological structure information as nodes and edges, respectively, to generate an intermediate representation of the graph structure.
In one example, the deep learning inference framework module 101 may include parsing sub-modules, each corresponding to one type of model file and used to parse model files of that type. For example, a PaddlePaddle parser and a TensorFlow parser are used to parse model files produced by the PaddlePaddle and TensorFlow deep learning frameworks, respectively. Supported frameworks include PaddlePaddle, Caffe, TensorFlow, MXNet, PyTorch and the like. The deep learning inference framework module can build different computational graph models using the framework's DSL and API to implement specific tasks, such as face recognition, image detection and speech recognition.
It should be noted that, for a new deep learning inference framework, a parser for its model files may be added accordingly.
Here, the generated intermediate representation is a computation graph in which nodes represent feature maps and edges represent computation operations, and both nodes and edges carry attributes. The attributes of a node may include the dimension information and/or height-width-channel information of the feature map. The computation operation represented by an edge includes at least one of: convolution, pooling, dimension transformation, element-wise addition (eltwise), deconvolution, rearrangement, non-linearity, batch normalization (BatchNorm) and scaling (scale). The attributes of an edge include the parameters of the computation operation, for example at least one of: convolution kernel size, padding (pad), stride, grouping and dilation.
In one example, the intermediate representation is an IR in the form of a graph. Each node of the graph represents either an Op (operator), including but not limited to convolution, pooling, dimension transformation, element-wise addition, deconvolution, normalization and non-linearity, or a tensor, which represents the input or output data of an operator. Each edge of the graph represents a dependency between an operator and a tensor, and an operator node is adjacent only to tensor nodes.
In this embodiment, the deep learning inference framework module can parse neural network models (i.e., target model files) developed on different deep learning frameworks into a framework-independent Intermediate Representation (IR), thereby decoupling the deep learning framework from the hardware adaptation module and converting the graph structures of various deep learning inference frameworks, which have different granularities, into the fixed-granularity intermediate representation of the present disclosure.
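The following C++ sketch illustrates one possible shape of such a graph-form IR, with operator nodes, tensor nodes and the edges between them; the type and field names are illustrative assumptions and are not part of the disclosure.

#include <deque>
#include <map>
#include <string>
#include <vector>

struct TensorNode {                          // node representing an operator's input or output data
  std::string name;
  std::vector<int> dims;                     // dimension / height-width-channel information of the feature map
};

struct OperatorNode {                        // node representing a computation operation (Op)
  std::string type;                          // e.g. "conv2d", "pool2d", "softmax"
  std::map<std::string, std::string> attrs;  // e.g. kernel size, pad, stride, group, dilation
  std::vector<TensorNode*> inputs;           // edges: the operator depends on these tensors
  std::vector<TensorNode*> outputs;          // edges: these tensors are produced by the operator
};

struct GraphIR {                             // framework-independent intermediate representation
  std::deque<TensorNode> tensors;            // deque keeps pointers to tensors stable as nodes are added
  std::vector<OperatorNode> operators;       // kept in the topological order of the target model file
};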
In this embodiment, the sub-graph engine module 102 is specifically configured to merge the computation operations, keeping the feature maps as nodes and using the merged computation operations as the sub-graph operators on the edges.
Merging the computation operations includes at least one of: removing operations that are unnecessary or have no influence on the computation result; fusing several adjacent computation operations; and decomposing a computation operation so that the decomposed operations can be fused with, or processed together with, the preceding or subsequent computation operations.
Here, the merging refers to merging a plurality of adjacent computation operations.
In one example, the fusion may fuse a computation operation into the access operations and/or computation operations that precede and follow it.
Correspondingly, in this example, obtaining the operator subgraph comprises the following steps:
Step 1: operator marking.
Each operator in the graph is traversed in turn according to the topological order of the target model file, and the operators that can be converted into hardware IR are marked according to the registered Paddle operator -> hardware IR conversion table.
For example, the graph includes 10 operators Op1 to Op10. Assuming that Op1, Op3 and Op10 cannot be converted to hardware IR, Op1, Op3 and Op10 are labeled as one class of operators (e.g., labeled yellow), while Op2, Op4, Op5, Op6, Op7, Op8 and Op9 are labeled as another class of operators (e.g., labeled red), indicating that the Op2, Op4, Op5, Op6, Op7, Op8 and Op9 operators can be converted into hardware IR.
Step 2: subgraph detection.
A reverse Depth-First Search (DFS) algorithm is used to group the operators marked in the first step into subgraphs, marking connected operators as belonging to the same subgraph.
For example, Op2 is divided separately into subgraph 1, while Op4, Op5, Op6, Op7, Op8 and Op9 are divided into subgraph 2. That is, there is no dependency between Op2 and Op4, Op5, Op6, Op7, Op8 and Op9; the operators that do have dependencies on each other are marked as the same subgraph.
Step 3: subgraph fusion.
The subgraphs obtained in the second step are fused; for example, subgraphs containing fewer operators than a preset number are deleted.
In one example, if a subgraph has too few operators, the subgraph is deleted and the remaining subgraphs then undergo operator fusion.
Specifically, a subgraph operator is used to represent all the operators contained in a subgraph; all the operators contained in the subgraph are stored in the program (ProgramDesc) as a new block (BlockDesc), and the block index is stored in the subgraph operator as an attribute.
It should be noted that the preset number of operators may be 1. It can be set according to the required inference precision and inference speed, or by the relevant personnel. Whether an operator is supported by the hardware is determined by the hardware platform.
In this embodiment, subgraphs with few operators are deleted, which reduces the overhead caused by excessive data copying between the hardware and the host. A code sketch of these three steps is given below.
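As a rough illustration of the three steps above, the following C++ sketch marks supported operators, groups connected supported operators into subgraphs with a reverse depth-first search, and drops subgraphs smaller than the preset size. It builds on the GraphIR sketch shown earlier; the helper names and the use of a set of supported operator types (in place of the registered conversion table) are assumptions for illustration only.

#include <set>
#include <string>
#include <vector>

// Two operators are connected if one consumes a tensor the other produces.
static bool SharesTensor(const OperatorNode& a, const OperatorNode& b) {
  for (const TensorNode* t : a.outputs)
    for (const TensorNode* u : b.inputs) if (t == u) return true;
  for (const TensorNode* t : a.inputs)
    for (const TensorNode* u : b.outputs) if (t == u) return true;
  return false;
}

// Returns the subgraph index of every operator; -1 marks operators left outside any hardware subgraph.
std::vector<int> DetectSubgraphs(const GraphIR& graph,
                                 const std::set<std::string>& supported_op_types,
                                 size_t min_ops) {
  const size_t n = graph.operators.size();
  std::vector<bool> supported(n);
  for (size_t i = 0; i < n; ++i)                                  // step 1: operator marking
    supported[i] = supported_op_types.count(graph.operators[i].type) > 0;

  std::vector<int> subgraph_id(n, -1);
  int current = 0;
  std::vector<size_t> stack;
  for (size_t s = n; s-- > 0; ) {                                 // step 2: reverse DFS subgraph detection
    if (!supported[s] || subgraph_id[s] >= 0) continue;
    stack.push_back(s);
    while (!stack.empty()) {
      size_t i = stack.back(); stack.pop_back();
      if (subgraph_id[i] >= 0) continue;
      subgraph_id[i] = current;
      for (size_t j = 0; j < n; ++j)
        if (supported[j] && subgraph_id[j] < 0 && SharesTensor(graph.operators[i], graph.operators[j]))
          stack.push_back(j);
    }
    ++current;
  }

  for (int s = 0; s < current; ++s) {                             // step 3: subgraph fusion / pruning
    size_t count = 0;
    for (int id : subgraph_id) if (id == s) ++count;
    if (count < min_ops)
      for (int& id : subgraph_id) if (id == s) id = -1;           // too few operators: keep them on the host
  }
  return subgraph_id;
}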
In this embodiment, the hardware adaptation module 103 is specifically configured to: the sub-graph operator is converted into instruction code that is executed on target hardware corresponding to the hardware type.
In one example, the subgraph operator can be mapped to the instruction code of the target hardware. The instruction code may consist of instructions specific to the target hardware. The target hardware may be at least one of: target hardware based on an FPGA or ASIC; target hardware based on a GPU; target hardware based on a CPU.
In this embodiment, in order to deploy the target model file, the sub-graph operator needs to be compiled into binary instruction code that can be executed by the target hardware. For example, the sub-graph operator can be compiled through a DNNC (Deep Neural Network Compiler) into the instruction code of a DPU (Deep Learning Processor Unit) platform (i.e., the target hardware), or, similarly, into a serialized model generated by Nvidia TensorRT, so as to perform the inference computation of the corresponding target model file.
According to the hardware adaptation device based on deep learning provided by this embodiment of the present disclosure, the intermediate representation of the graph structure corresponding to the input target model file is obtained through the deep learning inference framework module; the configuration file and the intermediate representation are fused through the sub-graph engine module to obtain a sub-graph operator, where the configuration file is used to define the data attributes and hardware type of the sub-graph operator; and the sub-graph operator is converted into instruction code executed on the target hardware by the hardware adaptation module. A hardware adaptation module can thus be established between the deep learning inference framework module and the target hardware, which decouples hardware adaptation from the deep learning inference framework, lowers the learning threshold of the framework, and allows the data attributes and hardware type to be defined through the configuration file, so that the usage scenarios of the data attributes can be customized and the precision of operator execution can be improved. Meanwhile, any change of the deep learning inference framework is absorbed by the hardware adaptation layer, so the instruction code has stronger robustness and maintainability.
In some optional implementations of this embodiment, the hardware adaptation module 103 is specifically configured to: convert the sub-graph operator, through the interface corresponding to the sub-graph operator, into instruction code executed on the target hardware corresponding to that interface.
In this implementation, the hardware adaptation module 103 may convert the sub-graph operator into an instruction code executed on a target hardware corresponding to the interface through the interface corresponding to the sub-graph operator.
In some optional implementations of this embodiment, the interface includes at least one of: the system comprises a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compiling interface and a model executing interface.
In this implementation, the hardware adaptation framework module includes an inference framework adaptation layer interface, a runtime, a hardware abstraction layer standard interface, and a standard operator definition (as shown in fig. 2).
In one example, a framework adaptation layer standard interface, which adapts different deep learning inference frameworks to achieve complete decoupling from the deep learning inference framework, may include at least one of: a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compilation interface and a model execution interface.
The hardware management interface is used for querying basic hardware information, including the hardware name, vendor name, accelerator card type and hardware abstraction layer library version, and for hardware acquisition and initialization.
The multi-hardware unified context interface is used for creating unified contexts for multiple pieces of hardware. Preferably, the multi-hardware unified context interface provides parameters such as hardware operation, model compilation and execution for each hardware configuration by means of key-value strings.
The model networking interface is used for decoupling from the model expression of the deep learning inference framework, so as to establish a unified, hardware-independent intermediate representation and transform the operator and tensor objects in the inference framework's model into an internal unified expression.
The model compilation interface and the model execution interface are used for converting the intermediate representation of the model into target hardware code (i.e., instruction code) by calling the hardware vendor's software stack in the hardware abstraction layer library, and for returning the result to the deep learning inference framework after execution.
It should be noted that, in order to reduce the overhead caused by online networking (i.e., building the above graph structure) and model compilation, the built hardware model (i.e., the instruction code) may be read from a cache through the model compilation interface. The model compilation interface is, for example, of the following form:
create_program(Model* online_model, void* cache_model, ...);  // compile the incoming model intermediate representation online_model or the model cache cache_model to generate the hardware model
In this implementation, in order to create an intermediate representation that is independent of the deep learning inference framework, independent of the hardware, and shared by the runtime and the hardware abstraction layer, the operator types and parameter lists are standardized in addition to defining the data structures of the model and the operators and tensors it contains.
In this implementation, the runtime acts as a bridge between the framework adaptation layer and the hardware abstraction layer: it not only translates calls to the framework adaptation layer interface into the intermediate representation of models, operators and tensors and into calls to the hardware abstraction layer interface, but is also responsible for registering the hardware abstraction layer library and for serializing and deserializing the model cache.
It should be noted that, when the hardware model is run, the hardware model cache format and the serialization and deserialization processes need to be unified: the model cache data set through the framework adaptation layer interface is deserialized and then passed to the hardware abstraction layer, and the instruction code is restored by each hardware abstraction layer library. In this way, a unified model cache parsing and restoring process can be established for different hardware (i.e., target hardware).
In one example, a hardware abstraction layer standard interface is used to shield hardware details and provide a uniform device access interface to a runtime.
It should be noted that a hardware abstraction layer is established between the runtime and the vendor software stack. The hardware abstraction layer is composed of data structures such as the unified device interface description and the intermediate representations of the model, operators and tensors, implemented as C structs, for example as follows:
(The original publication shows the C struct definition of the hardware abstraction layer device interface here.)
The above interface definition mainly involves basic hardware-related information, such as the hardware name, vendor name, accelerator type and hardware abstraction layer interface version, and, more importantly, the definition of the hardware function call interfaces.
In one example, the open_device interface is called when the hardware is initialized at runtime, create_context is called when the hardware context is created, the create_program interface is called when a model intermediate representation or model cache data needs to be compiled to generate a hardware model, and finally execute_program is called when the hardware model is executed. Hardware vendors only need to implement these interfaces to complete the adaptation of their hardware; they do not need to understand the implementation principles of the deep learning inference framework, which can greatly reduce learning and development costs.
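A hypothetical sketch of such a device description struct is shown below. The four function pointers correspond to the open_device, create_context, create_program and execute_program calls named above; all field names, parameter lists and types are assumptions made for illustration and are not the actual interface of the disclosure or of any vendor library.

#include <stddef.h>
#include <stdint.h>

typedef struct Device {
  // Basic hardware-related information
  const char* name;        // hardware name
  const char* vendor;      // vendor name
  const char* type;        // accelerator type
  int32_t version;         // hardware abstraction layer interface version

  // Hardware function call interfaces implemented by each vendor's HAL library
  int (*open_device)(void** device_ctx);                                   // acquire and initialize the hardware
  int (*create_context)(void* device_ctx, const char* key_value_params,    // per-hardware context configured
                        void** context);                                   // with key-value strings
  int (*create_program)(void* context, const void* model_ir,               // compile a model IR, or a deserialized
                        const void* cache_buffer, size_t cache_length,     // model cache, into a hardware model
                        void** program);
  int (*execute_program)(void* program, void* input_tensors,               // run the hardware model and fill
                         void* output_tensors);                            // the output tensors
} Device;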
In some optional implementations of this embodiment, the data attribute includes: data type and/or data structure.
In this implementation, the data attributes may include: data type and/or data structure.
In one example, fp32, int8 or int16 data types can be set on a per-operator basis in the target model file.
In one example, the data structure is translated according to standard types, e.g., to normalize operator types and parameter lists.
In some optional implementations of this embodiment, the data type includes at least one of: fp32, int8, int16.
In this implementation, the data type includes at least one of: fp32, int8, int16.
In one example, the subgraph engine module includes mixed precision processing, subgraph detection/fusion, and subgraph transformation/execution to transform the intermediate representation into subgraph operators (as shown in fig. 3).
The mixed precision processing is used for setting the quantization data type, such as fp32, int8 or int16, on a per-operator basis in the target model file according to user-defined mixed precision configuration information, with the following formats as examples:
op_type:input_name_list:output_name_list:precision_type  // operator type : list of input tensor names : list of output tensor names : precision type
op_type:input_name_list::precision_type
op_type::output_name_list:precision_type
op_type:::precision_type
For example:
conv2d:in_var0,in_var1:out_var0:int8  // tensor data types of the listed conv2d inputs and output are int8
softmax:::fp32  // meaning that all softmax operators run at fp32 precision
In this implementation, quantization or inverse quantization operators are automatically inserted between operators of different precision in the intermediate representation.
In one example, for already quantized parameters, such as the filter of conv2d, if conv2d is forced to run on fp32 precision calculations, the filter is also restored from the quantized type to the floating point type.
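The following C++ sketch, reusing the GraphIR types from the earlier sketch, shows one way such a pass could insert quantize/dequantize operators wherever a producer and a consumer are configured with different precisions; the operator names, the per-operator precision map and the fp32 default are assumptions for illustration, not the framework's actual implementation.

#include <map>
#include <string>
#include <vector>

static const OperatorNode* ProducerOf(const GraphIR& graph, const TensorNode* t) {
  for (const OperatorNode& op : graph.operators)
    for (const TensorNode* out : op.outputs) if (out == t) return &op;
  return nullptr;                                            // t is a model input with no producer
}

void InsertQuantDequant(GraphIR& graph, const std::map<std::string, std::string>& precision_by_op) {
  auto precision_of = [&](const std::string& op_type) {
    auto it = precision_by_op.find(op_type);
    return it != precision_by_op.end() ? it->second : std::string("fp32");
  };
  std::vector<OperatorNode> inserted;
  for (OperatorNode& consumer : graph.operators) {
    const std::string to = precision_of(consumer.type);
    for (TensorNode*& input : consumer.inputs) {
      const OperatorNode* producer = ProducerOf(graph, input);
      if (producer == nullptr || precision_of(producer->type) == to) continue;
      // Producer and consumer run at different precisions: bridge them with a conversion operator.
      graph.tensors.push_back(TensorNode{input->name + "/" + to, input->dims});
      TensorNode* converted = &graph.tensors.back();         // deque keeps this pointer stable
      OperatorNode convert;
      convert.type = (to == "fp32") ? "dequantize" : "quantize";
      convert.inputs = {input};
      convert.outputs = {converted};
      inserted.push_back(convert);
      input = converted;                                     // rewire the consumer to the converted tensor
    }
  }
  for (OperatorNode& op : inserted) graph.operators.push_back(op);
}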
In one example, subgraph detection and fusion are used for setting the device type on a per-operator basis in the target model file, according to user-defined subgraph detection configuration information.
For example, the device type configuration formats are:
op_type:input_name_list:output_name_list:device_type  // operator type : list of input tensor names : list of output tensor names : device type
op_type:input_name_list::device_type
op_type::output_name_list:device_type
op_type:::device_type
Several cases are supported, for example:
transpose:in_var0:out_var0:cpu  // means that all the transpose operators run on the CPU
softmax:::cpu  // means that all softmax operators run on the CPU
In the sub-graph detection process, this makes it possible to prevent specific operators from being drawn into a hardware sub-graph, which is often used in detection-type target model files: the backbone structure in the first half of the target model file runs on the hardware accelerator, while the post-processing operators run on the CPU. A parsing sketch for this configuration format follows.
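As an illustration of how such colon-separated configuration lines might be read, the C++ sketch below parses one rule of the form op_type:input_name_list:output_name_list:device_type (the precision rules above share the same shape); the struct and function names are assumptions, not part of the disclosure.

#include <sstream>
#include <string>
#include <vector>

struct ConfigRule {
  std::string op_type;                 // operator type, e.g. "transpose" or "softmax"
  std::vector<std::string> inputs;     // input tensor names; empty means "match any"
  std::vector<std::string> outputs;    // output tensor names; empty means "match any"
  std::string target;                  // device type here, or precision type in the mixed precision file
};

static std::vector<std::string> SplitNames(const std::string& s) {
  std::vector<std::string> names;
  std::stringstream ss(s);
  std::string item;
  while (std::getline(ss, item, ',')) if (!item.empty()) names.push_back(item);
  return names;
}

// Parses a line such as "transpose:in_var0:out_var0:cpu" or "softmax:::cpu".
ConfigRule ParseRule(const std::string& line) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ':')) fields.push_back(field);
  fields.resize(4);                    // missing trailing fields are treated as empty
  return ConfigRule{fields[0], SplitNames(fields[1]), SplitNames(fields[2]), fields[3]};
}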
In one example, sub-graph transformation and execution convert each operator in the sub-graph into calls to the hardware networking API.
With further reference to fig. 4, fig. 4 illustrates a flow 400 of one embodiment of a deep learning based hardware adaptation method according to the present disclosure. The hardware adaptation method based on deep learning can comprise the following steps:
step 401, obtaining a configuration file and a target model file.
In this embodiment, the execution subject (e.g., server) of the deep learning based hardware adaptation method may obtain the configuration file and the target model file locally or externally. The configuration file may be used to define the data type of the sub-graph operator and the target hardware on which the sub-graph operator is run. The target model file may be a model file to be deployed on the target hardware.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.
Step 402, obtaining an intermediate representation of the graph structure by using a deep learning inference framework.
In this embodiment, the execution subject may parse the target model file through parsing sub-modules, each corresponding to one type of model file and configured to parse model files of that type. For example, a PaddlePaddle parser and a TensorFlow parser are used to parse model files produced by the PaddlePaddle and TensorFlow deep learning frameworks, respectively. Supported frameworks include PaddlePaddle, Caffe, TensorFlow, MXNet, PyTorch and the like. The deep learning inference framework module can build different computational graph models using the framework's DSL and Application Programming Interface (API) to implement specific tasks, such as face recognition, image detection and speech recognition.
It should be noted that, for a new deep learning inference framework, a parser for its model files may be added accordingly.
Here, the generated intermediate representation is a computation graph in which nodes represent feature maps and edges represent computation operations, and both nodes and edges carry attributes. The attributes of a node may include the dimension information and/or height-width-channel information of the feature map. The computation operation represented by an edge includes at least one of: convolution, pooling, dimension transformation, element-wise addition (eltwise), deconvolution, rearrangement, non-linearity, batch normalization (BatchNorm) and scaling (scale). The attributes of an edge include the parameters of the computation operation, for example at least one of: convolution kernel size, padding (pad), stride, grouping and dilation.
In one example, the intermediate representation is an IR in the form of a graph. Each node of the graph represents either an Op (operator), including but not limited to convolution, pooling, dimension transformation, element-wise addition, deconvolution, normalization and non-linearity, or a tensor, which represents the input or output data of an operator. Each edge of the graph represents a dependency between an operator and a tensor, and an operator node is adjacent only to tensor nodes.
In this embodiment, the deep learning inference framework can parse neural network models (i.e., target model files) developed on different deep learning frameworks into a framework-independent Intermediate Representation (IR), thereby decoupling the deep learning framework from the hardware adaptation module and converting the graph structures of various deep learning inference frameworks, which have different granularities, into the fixed-granularity intermediate representation of the present disclosure.
Step 403, fusing the intermediate representation by using the configuration file and the subgraph engine to obtain a subgraph operator, where the configuration file is used for defining the data type and the hardware type of the subgraph operator.
In this embodiment, the execution subject may merge the computation operations, keeping the feature maps as nodes and using the merged computation operations as the sub-graph operators on the edges.
Merging the computation operations includes at least one of: removing operations that are unnecessary or have no influence on the computation result; fusing several adjacent computation operations; and decomposing a computation operation so that the decomposed operations can be fused with, or processed together with, the preceding or subsequent computation operations.
Here, the merging refers to merging a plurality of adjacent computation operations.
In one example, the fusion may fuse a computation operation into the access operations and/or computation operations that precede and follow it.
Correspondingly, in this example, obtaining the operator subgraph comprises the following steps:
Step 1: operator marking.
Each operator in the graph is traversed in turn according to the topological order of the target model file, and the operators that can be converted into hardware IR are marked according to the registered Paddle operator -> hardware IR conversion table.
For example, the graph includes 10 operators Op1 to Op10. Assuming that Op1, Op3 and Op10 cannot be converted to hardware IR, Op1, Op3 and Op10 are labeled as one class of operators (e.g., labeled yellow), while Op2, Op4, Op5, Op6, Op7, Op8 and Op9 are labeled as another class of operators (e.g., labeled red), indicating that the Op2, Op4, Op5, Op6, Op7, Op8 and Op9 operators can be converted into hardware IR.
Step 2: subgraph detection.
A reverse Depth-First Search (DFS) algorithm is used to group the operators marked in the first step into subgraphs, marking connected operators as belonging to the same subgraph.
For example, Op2 is divided separately into subgraph 1, while Op4, Op5, Op6, Op7, Op8 and Op9 are divided into subgraph 2. That is, there is no dependency between Op2 and Op4, Op5, Op6, Op7, Op8 and Op9; the operators that do have dependencies on each other are marked as the same subgraph.
Step 3: subgraph fusion.
The subgraphs obtained in the second step are fused; for example, subgraphs containing fewer operators than a preset number are deleted.
In one example, if a subgraph has too few operators, the subgraph is deleted and the remaining subgraphs then undergo operator fusion.
Specifically, a subgraph operator is used to represent all the operators contained in a subgraph; all the operators contained in the subgraph are stored in the program (ProgramDesc) as a new block (BlockDesc), and the block index is stored in the subgraph operator as an attribute.
It should be noted that the preset number of operators may be 1. It can be set according to the required inference precision and inference speed, or by the relevant personnel. Whether an operator is supported by the hardware is determined by the hardware platform.
In this embodiment, subgraphs with few operators are deleted, which reduces the overhead caused by excessive data copying between the hardware and the host.
Step 404, converting the sub-graph operator into an instruction code executed on the target hardware according to a hardware adaptation framework.
In this embodiment, the execution body may convert the sub-graph operator into an instruction code executed on the target hardware.
In one example, the subgraph operator can be mapped to the instruction code of the target hardware. The instruction code may consist of instructions specific to the target hardware. The target hardware may be at least one of: target hardware based on an FPGA or ASIC; target hardware based on a GPU; target hardware based on a CPU.
In this embodiment, in order to deploy the target model file, the sub-graph operator needs to be compiled into binary instruction code that can be executed by the target hardware. For example, the sub-graph operator can be compiled through a DNNC (Deep Neural Network Compiler) into the instruction code of a DPU (Deep Learning Processor Unit) platform (i.e., the target hardware), or, similarly, into a serialized model generated by Nvidia TensorRT, so as to perform the inference computation of the corresponding target model file.
According to the hardware adaptation method based on deep learning provided by this embodiment of the present disclosure, a configuration file and a target model file are first obtained; an intermediate representation of the graph structure is then obtained by using the deep learning inference framework; the intermediate representation is then fused by using the configuration file and the subgraph engine to obtain a subgraph operator, where the configuration file is used to define the data type and hardware type of the subgraph operator; and the subgraph operator is finally converted into instruction code executed on the target hardware according to the hardware adaptation framework. A hardware adaptation framework can thus be established between the deep learning inference framework and the target hardware, which decouples the hardware adaptation framework from the deep learning inference framework, lowers the learning threshold of the framework, and allows the data attributes and hardware type to be defined through the configuration file, so that the usage scenarios of the data attributes can be customized and the precision of operator execution can be improved. Meanwhile, any change of the deep learning inference framework is absorbed by the hardware adaptation framework, so the instruction code has stronger robustness and maintainability. A structural sketch of this flow is given below.
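Purely as a structural outline of steps 401 to 404 (declarations only, not runnable end to end), the C++ sketch below strings the stages together, reusing the GraphIR, InsertQuantDequant and DetectSubgraphs sketches from the device embodiment; every type and function name is an assumption introduced for illustration.

#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct Config {                                           // contents of the configuration file
  std::string device_type;                                // hardware type of the target hardware
  std::set<std::string> supported_op_types;               // operators the target hardware supports
  std::map<std::string, std::string> precision_by_op;     // per-operator data types (fp32/int8/int16)
  size_t min_subgraph_ops;                                // subgraphs smaller than this are dropped
};

Config LoadConfig(const std::string& config_file);                          // step 401
GraphIR ParseModel(const std::string& model_file);                          // step 402: framework parser -> IR
void InsertQuantDequant(GraphIR& graph,                                     // mixed precision pass (see above)
                        const std::map<std::string, std::string>& precision_by_op);
std::vector<int> DetectSubgraphs(const GraphIR& graph,                      // step 403: subgraph engine
                                 const std::set<std::string>& supported_op_types,
                                 size_t min_ops);
std::vector<uint8_t> CompileForTarget(const GraphIR& graph,                 // step 404: hardware adaptation
                                      const std::vector<int>& subgraph_id,  // framework emits instruction code
                                      const std::string& device_type);      // for the target hardware

std::vector<uint8_t> Deploy(const std::string& config_file, const std::string& model_file) {
  Config cfg = LoadConfig(config_file);
  GraphIR ir = ParseModel(model_file);
  InsertQuantDequant(ir, cfg.precision_by_op);
  std::vector<int> subgraph_id = DetectSubgraphs(ir, cfg.supported_op_types, cfg.min_subgraph_ops);
  return CompileForTarget(ir, subgraph_id, cfg.device_type);
}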
In some optional implementations of this embodiment, converting the sub-graph operator into instruction code executed on the target hardware corresponding to the hardware type according to the hardware adaptation framework includes: converting the sub-graph operator, through the interface corresponding to the sub-graph operator in the hardware adaptation framework, into instruction code executed on the target hardware corresponding to that interface.
In this implementation, the execution subject may convert the sub-graph operator, through the interface corresponding to the sub-graph operator in the hardware adaptation framework, into instruction code executed on the target hardware corresponding to that interface.
In some optional implementations of this embodiment, the interface includes at least one of: the system comprises a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compiling interface and a model executing interface.
In this implementation, the hardware adaptation framework module includes an inference framework adaptation layer interface, a runtime, a hardware abstraction layer standard interface, and a standard operator definition (as shown in fig. 2).
In one example, a framework adaptation layer standard interface, which adapts different deep learning inference frameworks to achieve complete decoupling from the deep learning inference framework, may include at least one of: a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compilation interface and a model execution interface.
The hardware management interface is used for querying basic hardware information, including the hardware name, vendor name, accelerator card type and hardware abstraction layer library version, and for hardware acquisition and initialization.
The multi-hardware unified context interface is used for creating unified contexts for multiple pieces of hardware. Preferably, the multi-hardware unified context interface provides parameters such as hardware operation, model compilation and execution for each hardware configuration by means of key-value strings.
The model networking interface is used for decoupling from the model expression of the deep learning inference framework, so as to establish a unified, hardware-independent intermediate representation and transform the operator and tensor objects in the inference framework's model into an internal unified expression.
The model compilation interface and the model execution interface are used for converting the intermediate representation of the model into target hardware code (i.e., instruction code) by calling the hardware vendor's software stack in the hardware abstraction layer library, and for returning the result to the deep learning inference framework after execution.
It should be noted that, in order to reduce the overhead caused by online networking (i.e., building the above graph structure) and model compilation, the built hardware model (i.e., the instruction code) may be read from a cache through the model compilation interface. The model compilation interface is, for example, of the following form:
create_program(Model* online_model, void* cache_model, ...);  // compile the incoming model intermediate representation online_model or the model cache cache_model to generate the hardware model
In this implementation, in order to create an intermediate representation that is independent of the deep learning inference framework, independent of the hardware, and shared by the runtime and the hardware abstraction layer, the operator types and parameter lists are standardized in addition to defining the data structures of the model and the operators and tensors it contains.
In this implementation, the runtime acts as a bridge between the framework adaptation layer and the hardware abstraction layer: it not only translates calls to the framework adaptation layer interface into the intermediate representation of models, operators and tensors and into calls to the hardware abstraction layer interface, but is also responsible for registering the hardware abstraction layer library and for serializing and deserializing the model cache.
It should be noted that, when the hardware model is run, the hardware model cache format and the serialization and deserialization processes need to be unified: the model cache data set through the framework adaptation layer interface is deserialized and then passed to the hardware abstraction layer, and the instruction code is restored by each hardware abstraction layer library. In this way, a unified model cache parsing and restoring process can be established for different hardware (i.e., target hardware).
In one example, a hardware abstraction layer standard interface is used to shield hardware details and provide a uniform device access interface to a runtime.
It should be noted that a hardware abstraction layer is established between the runtime and the vendor software stack. The hardware abstraction layer is composed of data structures such as the unified device interface description and the intermediate representations of the model, operators and tensors, implemented as C structs, for example as follows:
(The original publication shows the C struct definition of the hardware abstraction layer device interface here.)
The above interface definition mainly involves basic hardware-related information, such as the hardware name, vendor name, accelerator type and hardware abstraction layer interface version, and, more importantly, the definition of the hardware function call interfaces.
In one example, the open_device interface is called when the hardware is initialized at runtime, create_context is called when the hardware context is created, the create_program interface is called when a model intermediate representation or model cache data needs to be compiled to generate a hardware model, and finally execute_program is called when the hardware model is executed. Hardware vendors only need to implement these interfaces to complete the adaptation of their hardware; they do not need to understand the implementation principles of the deep learning inference framework, which can greatly reduce learning and development costs.
In some optional implementations of this embodiment, the data attribute includes: data type and/or data structure.
In this implementation, the data attributes may include: data type and/or data structure.
In one example, fp32, int8 or int16 data types can be set on a per-operator basis in the target model file.
In one example, the data structure is translated according to standard types, e.g., to normalize operator types and parameter lists.
It should be noted that the configuration file is also used to define the cache format of the target model file.
In some optional implementations of this embodiment, the data type includes at least one of: fp32, int8, int16.
In this implementation, the data type includes at least one of: fp32, int8, int16.
In one example, the subgraph engine module includes mixed precision processing, subgraph detection/fusion, and subgraph transformation/execution to transform the intermediate representation into subgraph operators.
The mixed precision processing is used for setting the quantization data type, such as fp32, int8 or int16, on a per-operator basis in the target model file according to user-defined mixed precision configuration information, with the following formats as examples:
op_type:input_name_list:output_name_list:precision_type  // operator type : list of input tensor names : list of output tensor names : precision type
op_type:input_name_list::precision_type
op_type::output_name_list:precision_type
op_type:::precision_type
For example:
conv2d:in_var0,in_var1:out_var0:int8
softmax:::fp32  // meaning that all softmax operators run at fp32 precision
In this implementation, quantization or inverse quantization operators are automatically inserted between operators of different precision in the intermediate representation.
In one example, for already quantized parameters, such as the filter of conv2d, if conv2d is forced to run on fp32 precision calculations, the filter is also restored from the quantized type to the floating point type.
In one example, subgraph detection and fusion are used for setting the device type on a per-operator basis in the target model file, according to user-defined subgraph detection configuration information.
For example, the device type configuration formats are:
op_type:input_name_list:output_name_list:device_type
op_type:input_name_list::device_type
op_type::output_name_list:device_type
op_type:::device_type
Several cases are supported, for example:
transpose:in_var0:out_var0:cpu
softmax:::cpu  // means that all softmax operators run on the CPU
In the sub-graph detection process, this makes it possible to prevent specific operators from being drawn into a hardware sub-graph, which is often used in detection-type target model files: the backbone structure in the first half of the target model file runs on the hardware accelerator, while the post-processing operators run on the CPU.
In one example, sub-graph transformation and execution convert each operator in the sub-graph into calls to the hardware networking API.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read-Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502 and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The calculation unit 501 performs the respective methods and processes described above, such as a hardware adaptation method based on deep learning. For example, in some embodiments, the deep learning based hardware adaptation method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the deep learning based hardware adaptation method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the deep learning based hardware adaptation method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, knowledge graph technology, and the like.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (13)

1. A hardware adaptation apparatus based on deep learning, comprising:
a deep learning inference framework module configured to obtain an intermediate representation of a graph structure based on an input target model file;
a sub-graph engine module configured to fuse a configuration file with the intermediate representation to obtain a sub-graph operator, wherein the configuration file defines a data attribute and a hardware type of the sub-graph operator;
and a hardware adaptation module configured to convert the sub-graph operator into instruction code executed on target hardware corresponding to the hardware type.
2. The apparatus according to claim 1, wherein the hardware adaptation module is specifically configured to:
and converting the sub-graph operator into instruction codes executed on target hardware corresponding to the interface through the interface corresponding to the sub-graph operator.
3. The apparatus of claim 1 or 2, wherein the interface comprises at least one of: a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compiling interface, and a model executing interface.
4. The apparatus of any of claims 1-3, wherein the data attribute comprises: a data type and/or a data structure.
5. The apparatus of claim 4, wherein the data type comprises at least one of: fp32, int8, int16.
6. A hardware adaptation method based on deep learning, comprising the following steps:
acquiring a configuration file and a target model file;
obtaining an intermediate representation of a graph structure based on the target model file by using a deep learning inference framework;
fusing the intermediate representation by using the configuration file and a sub-graph engine to obtain a sub-graph operator, wherein the configuration file defines a data attribute and a hardware type of the sub-graph operator;
and converting the sub-graph operator into instruction code executed on target hardware corresponding to the hardware type according to a hardware adaptation framework.
7. The method of claim 6, wherein said converting the sub-graph operator into instruction code executed on target hardware corresponding to the hardware type according to a hardware adaptation framework comprises:
converting the sub-graph operator, through an interface corresponding to the sub-graph operator in the hardware adaptation framework, into instruction code executed on target hardware corresponding to the interface.
8. The method of claim 6 or 7, wherein the interface comprises at least one of: a hardware management interface, a multi-hardware unified context interface, a model networking interface, a model compiling interface, and a model executing interface.
9. The method of any of claims 6-8, wherein the data attribute comprises: a data type and/or a data structure.
10. The method of claim 9, wherein the data type comprises at least one of: fp32, int8, int16.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 6-10.
12. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 6-10.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 6-10.
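
For orientation, the following is a minimal, hypothetical Python sketch of the flow recited in claims 6-10: a target model file is parsed into a graph-structured intermediate representation, a configuration file drives sub-graph fusion into a sub-graph operator, and a hardware adaptation layer converts the operator into instruction code for the configured hardware type through a per-hardware interface. All names (GraphIR, SubGraphOperator, HardwareAdapter, the "npu"/"gpu" targets, and the pseudo-instruction strings) are illustrative assumptions and are not defined by the patent.

# Minimal sketch of the claimed flow (claims 6-10); all names are assumptions.
import json
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class GraphIR:
    """Graph-structured intermediate representation of the target model."""
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class SubGraphOperator:
    """Fused sub-graph; data attribute and hardware type come from the config file."""
    nodes: List[str]
    data_type: str      # e.g. "fp32", "int8", "int16"
    hardware_type: str  # e.g. "npu", "gpu"


def load_model_ir(model_file: str) -> GraphIR:
    """Stand-in for the deep learning inference framework: parse the target
    model file into a graph-structured intermediate representation."""
    with open(model_file) as f:
        ops = json.load(f)["ops"]
    names = [op["name"] for op in ops]
    return GraphIR(nodes=names, edges=list(zip(names, names[1:])))


def fuse_subgraph(ir: GraphIR, config: Dict) -> SubGraphOperator:
    """Sub-graph engine: fuse the IR under the configuration file, which defines
    the data attribute (data type) and hardware type of the sub-graph operator."""
    return SubGraphOperator(nodes=list(ir.nodes),
                            data_type=config.get("data_type", "fp32"),
                            hardware_type=config["hardware_type"])


class HardwareAdapter:
    """Hardware adaptation layer: one conversion interface per hardware type."""

    def __init__(self) -> None:
        self._interfaces = {"npu": self._to_npu, "gpu": self._to_gpu}

    def convert(self, op: SubGraphOperator) -> List[str]:
        # Select the interface corresponding to the sub-graph operator's
        # hardware type and lower it to instruction code for that hardware.
        return self._interfaces[op.hardware_type](op)

    def _to_npu(self, op: SubGraphOperator) -> List[str]:
        return [f"NPU_EXEC {node} {op.data_type}" for node in op.nodes]

    def _to_gpu(self, op: SubGraphOperator) -> List[str]:
        return [f"GPU_KERNEL {node} {op.data_type}" for node in op.nodes]


if __name__ == "__main__":
    # Acquire a (toy) target model file and configuration file, then run the
    # obtain-IR / fuse / convert steps of claim 6.
    with open("model.json", "w") as f:
        json.dump({"ops": [{"name": "conv2d"}, {"name": "relu"}, {"name": "fc"}]}, f)
    config = {"hardware_type": "npu", "data_type": "int8"}
    op = fuse_subgraph(load_model_ir("model.json"), config)
    print(HardwareAdapter().convert(op))

Running this sketch prints pseudo-instructions such as "NPU_EXEC conv2d int8", mirroring the acquire-files / obtain-IR / fuse / convert sequence of claim 6; a production implementation would instead dispatch through the hardware management, context, model networking, model compiling, and model executing interfaces enumerated in claims 3 and 8.
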
CN202111504826.8A 2021-12-10 2021-12-10 Hardware adaptation device and method based on deep learning Active CN114186678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111504826.8A CN114186678B (en) 2021-12-10 2021-12-10 Hardware adaptation device and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111504826.8A CN114186678B (en) 2021-12-10 2021-12-10 Hardware adaptation device and method based on deep learning

Publications (2)

Publication Number Publication Date
CN114186678A true CN114186678A (en) 2022-03-15
CN114186678B CN114186678B (en) 2023-04-07

Family

ID=80604284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111504826.8A Active CN114186678B (en) 2021-12-10 2021-12-10 Hardware adaptation device and method based on deep learning

Country Status (1)

Country Link
CN (1) CN114186678B (en)

Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130166942A1 (en) * 2011-12-22 2013-06-27 International Business Machines Corporation Unfusing a failing part of an operator graph
CN106155755A (en) * 2015-06-03 2016-11-23 上海红神信息技术有限公司 Program compiling method and compiler
CN105912330A (en) * 2016-04-07 2016-08-31 北京北方微电子基地设备工艺研究中心有限责任公司 Hardware device control method and device
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
CN110321999A (en) * 2018-03-30 2019-10-11 北京深鉴智能科技有限公司 Neural computing figure optimization method
US20190377819A1 (en) * 2018-06-12 2019-12-12 Bank Of America Corporation Machine learning system to detect, label, and spread heat in a graph structure
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
US20200034710A1 (en) * 2018-07-26 2020-01-30 DeepScale, Inc. Optimizing neural network structures for embedded systems
CN111104120A (en) * 2018-10-29 2020-05-05 赛灵思公司 Neural network compiling method and system and corresponding heterogeneous computing platform
CN112149812A (en) * 2019-06-28 2020-12-29 英特尔公司 Hardware-independent deep neural network compiler
CN112182635A (en) * 2019-07-03 2021-01-05 北京百度网讯科技有限公司 Method, device, equipment and medium for realizing joint modeling
WO2021093654A1 (en) * 2019-11-15 2021-05-20 中兴通讯股份有限公司 Daughter card initialization method, electronic apparatus, and storage medium
CN111427845A (en) * 2020-02-28 2020-07-17 中国电子科技集团公司第十五研究所 Interactive modeling analysis operator data exchange method
CN113449858A (en) * 2020-03-27 2021-09-28 华为技术有限公司 Processing method of neural network model and related equipment
CN111782181A (en) * 2020-06-28 2020-10-16 北京百度网讯科技有限公司 Code generation method and device, electronic equipment and storage medium
CN112633502A (en) * 2020-12-29 2021-04-09 北京百度网讯科技有限公司 Cross-platform execution method and device of deep learning model and electronic equipment
CN113010469A (en) * 2021-03-18 2021-06-22 恒睿(重庆)人工智能技术研究院有限公司 Image feature extraction method, device and computer-readable storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
MINGZHEN LI et al.: "The Deep Learning Compiler: A Comprehensive Survey" *
TIANQI CHEN et al.: "TVM: An Automated End-to-End Optimizing Compiler for Deep Learning", Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation *
JIAO Run: "A Transformer Architecture Search Algorithm for Optimizing Hardware Latency", China Master's Theses Full-text Database (Information Science and Technology) *
JIAO Yuming et al.: "Design and Implementation of a Compiler Based on a Dedicated Convolutional Neural Network Accelerator", Computer Applications *
HU Yong et al.: "Design and Implementation of a Geometric Algebra GIS Computing Engine" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116149856A (en) * 2023-01-09 2023-05-23 中科驭数(北京)科技有限公司 Operator computing method, device, equipment and medium
CN116579400A (en) * 2023-05-19 2023-08-11 北京百度网讯科技有限公司 Quantization method, data processing method and device of deep learning model
CN116579400B (en) * 2023-05-19 2024-02-23 北京百度网讯科技有限公司 Quantization method, data processing method and device of deep learning model

Also Published As

Publication number Publication date
CN114186678B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN114186678B (en) Hardware adaptation device and method based on deep learning
Jin et al. Compiling onnx neural network models using mlir
US10269087B2 (en) Language translation using preprocessor macros
CN113449858A (en) Processing method of neural network model and related equipment
CN110262783B (en) Interface generation method and device and terminal equipment
CN113656590B (en) Industry map construction method and device, electronic equipment and storage medium
CN112527262B (en) Automatic vector optimization method for non-uniform width of deep learning framework compiler
CN112540767B (en) Program code generation method and device, electronic equipment and storage medium
CN115240048A (en) Deep learning operator positioning fusion method and device for image classification
Koranne et al. Boost c++ libraries
Hoffmann et al. Defining models-meta models versus graph grammars
CN111309332A (en) File content on-demand loading method and device, electronic equipment and storage medium
CN113705798A (en) Processing unit, computing device and computation graph optimization method of deep learning model
CN112559760B (en) CPS (cyber physical system) resource capacity knowledge graph construction method for text description
CN112633502B (en) Cross-platform execution method and device of deep learning model and electronic equipment
CN115469931B (en) Instruction optimization method, device, system, equipment and medium of loop program
CN109597611B (en) Front-end data flow control component development system, method, device and storage medium
CN116560666A (en) AI front end unified computing method, device and medium based on multi-level code generation
CN113360490B (en) Data processing method, device, apparatus, medium and program product
CN114168151A (en) Container-based program compiling method and device, electronic equipment and storage medium
CN111221532A (en) Method and device for generating dynamic link library
Maan et al. Parsing C and Fortran code to SymPy Expressions
Ammar et al. Visualizing a hierarchy of performance models for software systems
Park et al. Interworking technology of neural network and data among deep learning frameworks
Fayzrakhmanov WPPS: A novel and comprehensive framework for web page understanding and information extraction

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant