CN112947935A - Operation method and device, electronic device and storage medium - Google Patents

Operation method and device, electronic device and storage medium

Info

Publication number
CN112947935A
Application number
CN202110224062.0A
Authority
CN (China)
Prior art keywords
target, neural network, operator, network processor, model file
Legal status
Pending (the legal status is an assumption and is not a legal conclusion)
Other languages
Chinese (zh)
Inventor
江子山, 许志耿
Current Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110224062.0A
Publication of CN112947935A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/42 Syntactic analysis
    • G06F 8/427 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/065 Analogue means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The present disclosure relates to an operation method and apparatus, an electronic device, and a storage medium. The method includes: acquiring an initial model file for implementing a neural network operation; in response to a selection operation for multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor; and running the target model file in the target neural network processor to obtain an operation result. The embodiments of the present disclosure can reduce the cost of deploying and migrating an algorithm model across different neural network processors and improve the generality and flexibility of the operation.

Description

Operation method and device, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an operation method and apparatus, an electronic device, and a storage medium.
Background
The rapid development of the field of artificial intelligence is inseparable from deep learning and the rise of neural networks. As the design of neural networks becomes increasingly complex, the main computing platform for neural networks is gradually migrating to the Neural Network Processor (NPU). The NPU is a computing architecture specially designed for neural network computation; it can meet the computing power requirements of neural network computing tasks, provide higher computing performance, energy efficiency and real-time performance, and greatly reduce computing energy consumption.
However, because the NPU computing architecture is highly specialized, an NPU is generally bound to a dedicated model file format and back-end software stack for compiling and running, so the cost of deploying and migrating algorithms in the operation is too high.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure proposes an operation scheme.
According to an aspect of the present disclosure, there is provided an operation method including:
acquiring an initial model file for realizing neural network operation; in response to the selection operation aiming at the multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor; and operating the target model file in the target neural network processor to obtain an operation result.
In a possible implementation manner, the compiling the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor includes: constructing a first calculation graph according to the initial model file, wherein the first calculation graph is formed by one or more preset operators in a first preset operator library; acquiring a target operator library corresponding to the target neural network processor based on the selected target neural network processor; converting the first computational graph into a second computational graph matched with the target neural network processor according to a conversion relation between the first preset operator library and the target operator library, wherein the second computational graph is formed by one or more target operators in the target operator library; and generating a target model file matched with the target neural network processor based on the second computational graph.
In a possible implementation manner, the building a first computation graph according to the initial model file includes: analyzing the initial model file to obtain an analysis result; constructing a third computation graph based on the analysis result, wherein the third computation graph is formed by one or more preset operators in the first preset operator library; and optimizing the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
In a possible implementation manner, the optimizing the third computation graph based on the operation manner of the selected target neural network processor to obtain a first computation graph includes: and performing one or more of operator fusion, operator splitting and operator replacement on a preset operator in the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
In one possible implementation, the converting the first computation graph into a second computation graph matched with the target neural network processor according to a conversion relationship between the first preset operator library and the target operator library includes: respectively converting a plurality of preset operators in the first calculation graph according to a conversion relation between the first preset operator library and the target operator library to obtain a plurality of conversion results respectively corresponding to the plurality of preset operators in the first calculation graph, wherein the conversion results comprise target operators and/or target operator subgraphs in the target operator library, and the target operator subgraphs are formed by the plurality of target operators in the target operator library; and connecting the plurality of conversion results according to the connection relation between preset operators in the first calculation graph to obtain a second calculation graph matched with the target neural network processor.
In a possible implementation manner, before the connecting the plurality of conversion results according to the connection relationship between preset operators in the first computation graph to obtain a second computation graph matched with the target neural network processor, the method further includes: and fusing and converting part of preset operators in the first calculation graph into target operators in the target operator library as conversion results.
In one possible implementation manner, the obtaining, based on the selected target neural network processor, a target operator library corresponding to the target neural network processor includes: based on the selected target neural network processor, selecting a packaged operator library corresponding to the target neural network processor from a second preset operator library as a target operator library; and the second preset operator library comprises packaging operator libraries corresponding to the multiple types of neural network processors respectively.
In one possible implementation, the encapsulation operator library includes: and the original operator in the neural network processor corresponding to the packaging operator library and/or the custom operator developed by the neural network processor corresponding to the packaging operator library.
In a possible implementation manner, the compiling the initial model file further includes: and performing compilation corresponding to the format of the initial model file based on the format of the initial model file.
According to an aspect of the present disclosure, there is provided an operation device including:
the initial model file acquisition module is used for acquiring an initial model file for realizing neural network operation; the target model file generation module is used for responding to selection operation aiming at multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor and obtaining a target model file matched with the target neural network processor; and the operation module is used for operating the target model file in the target neural network processor to obtain an operation result.
In one possible implementation, the object model file generating module is configured to: constructing a first calculation graph according to the initial model file, wherein the first calculation graph is formed by one or more preset operators in a first preset operator library; acquiring a target operator library corresponding to the target neural network processor based on the selected target neural network processor; converting the first computational graph into a second computational graph matched with the target neural network processor according to a conversion relation between the first preset operator library and the target operator library, wherein the second computational graph is formed by one or more target operators in the target operator library; and generating a target model file matched with the target neural network processor based on the second computational graph.
In one possible implementation, the object model file generating module is further configured to: analyzing the initial model file to obtain an analysis result; constructing a third computation graph based on the analysis result, wherein the third computation graph is formed by one or more preset operators in the first preset operator library; and optimizing the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
In one possible implementation, the object model file generating module is further configured to: and performing one or more of operator fusion, operator splitting and operator replacement on a preset operator in the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
In one possible implementation, the object model file generating module is further configured to: respectively converting a plurality of preset operators in the first calculation graph according to a conversion relation between the first preset operator library and the target operator library to obtain a plurality of conversion results respectively corresponding to the plurality of preset operators in the first calculation graph, wherein the conversion results comprise target operators and/or target operator subgraphs in the target operator library, and the target operator subgraphs are formed by the plurality of target operators in the target operator library; and connecting the plurality of conversion results according to the connection relation between preset operators in the first calculation graph to obtain a second calculation graph matched with the target neural network processor.
In one possible implementation manner, the target model file generation module is further configured to: and fusing and converting part of preset operators in the first calculation graph into target operators in the target operator library as conversion results.
In one possible implementation, the object model file generating module is further configured to: based on the selected target neural network processor, selecting a packaged operator library corresponding to the target neural network processor from a second preset operator library as a target operator library; and the second preset operator library comprises packaging operator libraries corresponding to the multiple types of neural network processors respectively.
In one possible implementation, the encapsulation operator library includes: and the original operator in the neural network processor corresponding to the packaging operator library and/or the custom operator developed by the neural network processor corresponding to the packaging operator library.
In one possible implementation manner, the target model file generation module is further configured to: and performing compilation corresponding to the format of the initial model file based on the format of the initial model file.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the present disclosure, the initial model file may be obtained, and in response to a selection operation for multiple types of neural network processors, the initial model file is compiled based on the selected target neural network processor to obtain a target model file matched with the target neural network processor, so that the target model file is run in the target neural network processor to obtain an operation result. Through the process, according to the operation method and device, the electronic device and the storage medium provided by the embodiment of the disclosure, the initial model file can be converted into the target model file matched with the selected target neural network processor to realize the operation in the target neural network processor, so that the operation of the same algorithm on different neural network processors can be realized based on the same initial model file, the deployment and migration costs of the algorithm model on different neural network processors in the operation are reduced, and the universality and flexibility of the operation are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow chart of a method of operation according to an embodiment of the present disclosure.
Fig. 2 shows a flow chart of a method of operation according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of an operation device according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of an application example according to the present disclosure.
Fig. 5 shows a schematic diagram of an application example according to the present disclosure.
Fig. 6 shows a schematic diagram of an application example according to the present disclosure.
Fig. 7 shows a schematic diagram of an application example according to the present disclosure.
Fig. 8 shows a schematic diagram of an application example according to the present disclosure.
Fig. 9 shows a schematic diagram of an application example according to the present disclosure.
Fig. 10 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure.
Fig. 11 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
Fig. 1 shows a flowchart of an operation method according to an embodiment of the present disclosure, and the method may be applied to an operation device, which may be a terminal device, a server, or other processing device. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In one example, the operation method may be applied to a processor connected to a Neural Network Processor (NPU), where the processor may be a general-purpose processor such as a CPU and/or a GPU; in one example, the operation method may also be directly applied to the NPU.
In some possible implementations, the method of operation may also be implemented by a processor calling computer readable instructions stored in a memory.
As shown in fig. 1, in a possible implementation manner, the operation method may include:
and step S11, acquiring an initial model file for realizing the neural network operation.
In a possible implementation manner, the initial model file may be a file that contains the description information of the trained neural network model and each weight parameter. The initial model file may have different formats depending on the training framework used by the neural network model during training; the format is not limited in the embodiments of the present disclosure and is not limited to the following examples. In a possible implementation manner, the initial model file may be in the prototxt and caffemodel file formats corresponding to the Caffe framework (Convolutional Architecture for Fast Feature Embedding), and/or the pb file format corresponding to the TensorFlow training framework, and the like.
The obtaining mode of the initial model file is not limited in the embodiment of the disclosure, and can be flexibly determined according to the actual situation. In a possible implementation manner, the initial model file corresponding to the trained neural network model may be obtained by training the neural network model defined based on the code file for multiple times. In a possible implementation manner, a preset initial model file and the like may be directly obtained.
The data content contained in the initial model file is not limited in the embodiment of the present disclosure, and can be flexibly determined according to the actual situation. In some possible implementations, the initial model file may contain one or more of data shape, data type, and data arrangement of input data and output data of the neural network model.
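For illustration only, the following Python sketch shows one way such an initial model file could be represented and loaded; all identifiers are hypothetical, and detecting the training-framework format from the file extension is an assumption made for the sketch, not something specified by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InitialModelFile:
    """Hypothetical container for a trained model exported by a training framework."""
    format: str                      # e.g. "caffe" (prototxt + caffemodel) or "tensorflow" (pb)
    graph_description: bytes         # serialized description of the network structure
    weights: Dict[str, bytes]        # weight parameters keyed by layer/operator name
    input_shapes: Dict[str, List[int]] = field(default_factory=dict)
    output_shapes: Dict[str, List[int]] = field(default_factory=dict)


def load_initial_model(path: str) -> InitialModelFile:
    """Infer the format from the file extension (an assumption for illustration only)."""
    if path.endswith((".prototxt", ".caffemodel")):
        fmt = "caffe"
    elif path.endswith(".pb"):
        fmt = "tensorflow"
    else:
        raise ValueError(f"unsupported model file format: {path}")
    with open(path, "rb") as f:
        return InitialModelFile(format=fmt, graph_description=f.read(), weights={})
```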
Step S12: in response to a selection operation for the multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor.
The multiple types of neural network processors may be different types of neural network processors, such as NPUs developed by different manufacturers, or different models of NPUs developed by the same manufacturer. The specific types of neural network processors included can be flexibly configured according to actual situations and are not limited to the following embodiments; in one possible implementation, the multiple types of neural network processors may include one or more of an Ascend chip, a Cambricon MLU chip, and a Hanguang 800 chip.
The selecting operation for the multiple types of neural network processors may be an operation of selecting one or some of the multiple types of neural network processors to implement a subsequent operation. How to realize the selection operation for multiple types of neural network processors can be flexibly determined according to actual conditions, and the implementation form is not limited to the following disclosed embodiments. In one possible implementation, the selection operation for multiple types of neural network processors may be implemented by configuring a neural network processor for implementing operations during parameter configuration.
The target neural network processor may be a neural network processor selected from the multiple types of neural network processors, and the target model file may be a model file that the target processor can read and run. Because different neural network processors may support different model file formats, the initial model file cannot be directly compiled or run on each of them to realize the operation. Therefore, in one possible implementation, the initial model file may be compiled based on the selected target neural network processor to obtain a target model file that matches the target neural network processor and is readable by it.
How to compile the initial model file based on the selected target neural network processor can be flexibly determined according to actual conditions, and the compiling process is detailed in the following disclosed embodiments and is not expanded at first.
Step S13: running the target model file in the target neural network processor to obtain an operation result.
After the target model file readable by the target neural network processor is obtained, the target model file may be run in the target neural network processor. How the target neural network processor runs the target model file can be flexibly determined according to its actual situation, as detailed in the following disclosed embodiments. In one possible implementation manner, the target neural network processor may perform the inference operation of the neural network on the acquired input data based on the target model file to obtain an operation result.
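A minimal sketch of the overall flow of steps S11 to S13 is shown below. The three helper functions are hypothetical placeholders corresponding to the steps; they are elaborated in the sketches accompanying the later embodiments and are not interfaces defined by the disclosure.

```python
def run_on_selected_npu(initial_model_path: str, target_npu: str, input_data):
    """Hypothetical end-to-end flow mirroring steps S11-S13 of the disclosure."""
    # Step S11: acquire the initial model file produced by the training framework.
    initial_model = load_initial_model(initial_model_path)

    # Step S12: compile the initial model file for the selected target NPU,
    # producing a target model file the processor can read and run.
    target_model = compile_for_target(initial_model, target_npu)

    # Step S13: run the target model file on the target NPU to obtain the result.
    return run_on_target(target_model, target_npu, input_data)
```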
In the embodiment of the present disclosure, the initial model file may be obtained, and in response to a selection operation for multiple types of neural network processors, the initial model file is compiled based on the selected target neural network processor to obtain a target model file matched with the target neural network processor, so that the target model file is run in the target neural network processor to obtain an operation result. Through the process, according to the operation method and device, the electronic device and the storage medium provided by the embodiment of the disclosure, the initial model file can be converted into the target model file matched with the selected target neural network processor to realize the operation in the target neural network processor, so that the operation of the same algorithm on different neural network processors can be realized based on the same initial model file, the deployment and migration costs of the algorithm model on different neural network processors in the operation are reduced, and the universality and flexibility of the operation are improved.
Fig. 2 shows a flowchart of an operation method according to an embodiment of the present disclosure, and as shown in the figure, in one possible implementation, step S12 may include:
and step S121, constructing a first calculation graph according to the initial model file, wherein the first calculation graph is formed by one or more preset operators in a first preset operator library.
And step S122, acquiring a target operator library corresponding to the target neural network processor based on the selected target neural network processor.
And step S123, converting the first calculation graph into a second calculation graph matched with the target neural network processor according to the conversion relation between the first preset operator library and the target operator library, wherein the second calculation graph is formed by one or more target operators in the target operator library.
And step S124, generating a target model file matched with the target neural network processor based on the second calculation graph.
The operators may be algorithms commonly used in neural networks, such as convolution operators, fully-connected operators, pooling operators or activation operators, and different operators are packaged to obtain different operator libraries. For each neural network processor, a corresponding original operator library may be developed together with it during the design process.
A computation graph is a language used to describe computation; it is a way of formalizing computation. When the operators in a computation graph have code implementations, the computation graph may be serialized into a model file readable by a neural network processor, so that the neural network processor runs the model file to implement the operation. Because the original operator libraries corresponding to different neural network processors are different, the computation graphs that match different neural network processors, formed by those operators, naturally differ.
In a possible implementation manner, the first preset operator library may be a self-defined operator library, which may include a plurality of self-defined preset operators, and specifically includes which preset operators may be set by themselves according to an actual situation, which is not limited in the embodiment of the present disclosure. The first preset operator library may provide basic definitions, parameter checks, shape inference and other functions of the preset operators, and the arithmetic device may generate the first computation graph by using at least some preset operators in the first preset operator library through the above functions of the first preset operator library, in which case, each operator included in the first computation graph belongs to the first preset operator library.
The target operator library may be an operator library corresponding to the neural network processor and may include a plurality of target operators corresponding to the neural network processor; which target operators are included may also be flexibly determined according to actual conditions, which is not limited in the embodiments of the present disclosure. In a possible implementation manner, the target operator library may be the original operator library developed for the neural network processor during its design; in another possible implementation manner, the target operator library may be an operator library obtained by developing or encapsulating the original operator library. The implementation of the target operator library and the way of obtaining the target operator library corresponding to the target neural network processor are detailed in the following disclosed embodiments.
After the first computation graph is obtained, because the computation graphs matched with different neural network processors have differences, each preset operator in the first computation graph, which belongs to the first preset operator library, can be converted into a target operator, which belongs to the target operator library, according to the conversion relation between the first preset operator library and the target operator library, so that a second computation graph formed by one or more converted target operators is obtained.
After the second computation graph is obtained, the second computation graph may be serialized into a target model file readable by the target neural network processor, and a serialization manner and a serialization process may be flexibly determined according to a coding implementation manner of an operator in a target operator library, which is not limited in the embodiment of the present disclosure.
It should be noted that, the number of the steps in the embodiment of the present disclosure does not limit the execution order of the steps, for example, the steps S121 and S122 may be implemented simultaneously, or implemented sequentially according to a certain order, which is not limited in the embodiment of the present disclosure.
In the embodiment of the present disclosure, a first computation graph formed by preset operators in the first preset operator library is constructed according to the initial model file, and the conversion relation between the first preset operator library and the target operator library corresponding to the target neural network processor is used to convert the first computation graph into a second computation graph formed by target operators in the target operator library, so that a target model file matched with the target neural network processor is generated based on the second computation graph. Through the above process, the binding between the front-end model file and the back-end neural network processor can be decoupled: whatever type of front-end initial model file or back-end neural network processor is involved, a unified operator representation is constructed through the customized first preset operator library. With this unified operator representation, compiling for different neural network processors becomes a conversion between computation graphs, so that the same algorithm can conveniently run on different neural network processors based on the same initial model file, and the generality and flexibility of the operation are improved.
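The compilation pipeline of steps S121 to S124 could be orchestrated roughly as follows; the helper functions are hypothetical placeholders corresponding to the sub-steps described in the embodiments below, not an implementation mandated by the disclosure.

```python
def compile_for_target(initial_model, target_npu: str):
    """Hypothetical compilation pipeline corresponding to steps S121-S124."""
    # S121: build the first computation graph from preset operators in the
    #       first preset operator library (via a parsed third graph, see below).
    first_graph = build_first_graph(initial_model, target_npu)

    # S122: look up the operator library packaged for the selected target NPU.
    target_op_library = get_target_operator_library(target_npu)

    # S123: convert the first graph into a second graph whose nodes are target
    #       operators, using the conversion relation between the two libraries.
    second_graph = convert_graph(first_graph, target_op_library)

    # S124: serialize the second graph into a model file the target NPU can read.
    return serialize_to_target_model(second_graph, target_op_library)
```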
In one possible implementation, step S121 may include:
analyzing the initial model file to obtain an analysis result;
constructing a third computation graph based on the analysis result, wherein the third computation graph is formed by one or more preset operators in the first preset operator library;
and optimizing the third calculation graph based on the operation mode of the selected target neural network processor to obtain the first calculation graph.
For example, in a possible implementation manner, in the case that the initial model file is in a file format such as prototxt or caffemodel, the initial model file may be parsed in the manner of parsing a model file of the Caffe framework to obtain a parsing result; in one possible implementation, in the case that the initial model file is in the pb file format, the initial model file may be parsed in the manner of parsing a model file of the TensorFlow framework to obtain a parsing result.
The analysis result can be related data extracted from the initial model file, and the specific contained data content can be flexibly determined according to the actual situation. In a possible implementation manner, the analysis result may include one or more of parameter information, connection relation, and corresponding weight data of each operator defined in the model file.
In one possible implementation, the parsing result may be used to construct a third computation graph, and the third computation graph may be composed of at least some preset operators in the first preset operator library. The construction of the third computation graph can be flexibly determined according to the actual situation. In a possible implementation manner, the data in the parsing result may be organized as structured data corresponding to the first preset operator library; based on the structured data, the corresponding preset operators may be selected from the first preset operator library, assigned values according to the parameter information and weight data in the structured data, and connected by pointers according to the connection relations in the structured data, so as to obtain a plurality of instance nodes each containing operator parameters, weight data, and pointers to upstream and downstream operator nodes. The directed acyclic graph formed by these instance nodes may be used as the third computation graph.
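As a non-authoritative illustration, the instance nodes and the directed acyclic graph described above might be represented as follows; the field names and the layout of the parsing result are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class OperatorNode:
    """Hypothetical instance node built from a preset operator in the first preset operator library."""
    op_type: str                                              # name of the preset operator, e.g. "Conv2D"
    params: Dict[str, Any] = field(default_factory=dict)      # operator parameter information
    weights: Dict[str, Any] = field(default_factory=dict)     # weight data extracted from the model file
    inputs: List["OperatorNode"] = field(default_factory=list)   # pointers to upstream operator nodes
    outputs: List["OperatorNode"] = field(default_factory=list)  # pointers to downstream operator nodes


def build_third_graph(parse_result: Dict[str, Dict[str, Any]]) -> List[OperatorNode]:
    """Assemble a directed acyclic graph from parsed operator records (illustrative only)."""
    nodes = {name: OperatorNode(op_type=rec["type"], params=rec["params"], weights=rec["weights"])
             for name, rec in parse_result.items()}
    for name, rec in parse_result.items():
        for upstream in rec.get("inputs", []):
            nodes[name].inputs.append(nodes[upstream])
            nodes[upstream].outputs.append(nodes[name])
    return list(nodes.values())   # the node list represents the third computation graph
```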
In one possible implementation, the third computation graph may be directly used as the first computation graph to complete the construction of the first computation graph.
In a possible implementation manner, the third computation graph may be further optimized based on the operation manner of the selected target neural network processor, so as to obtain a first computation graph with higher operation efficiency. The optimization method can be flexibly determined according to the actual situation of the target neural network processor, as described in detail in the following disclosed embodiments.
In the embodiment of the present disclosure, the initial model file is analyzed to obtain an analysis result, and the third computation graph is constructed based on the analysis result, so that the third computation graph is optimized based on the operation mode of the selected target neural network processor to obtain the first computation graph. Through the process, the initially generated third computation graph can be optimized in a targeted mode to obtain the first computation graph, so that the computation performance of the target model file on the target neural network processor is improved, and the computation performance and the efficiency are improved.
In a possible implementation manner, optimizing the third computation graph based on the operation manner of the selected target neural network processor to obtain the first computation graph may include:
and based on the operation mode of the selected target neural network processor, performing one or more of operator fusion, operator splitting and operator replacement on a preset operator in the third calculation graph to obtain the first calculation graph.
The operator fusion can be to fuse some preset operators with connection relations in the third computation graph into one preset operator, and the fused preset operator can realize the calculation with the same function as the preset operator with connection relations, but can improve the overall operation efficiency of the computation graph after fusion; the operator splitting may be to split a certain preset operator in the third computation graph into a plurality of preset operators with a connection relationship, and the like, and the split plurality of preset operators with a connection relationship may implement the same function calculation as the preset operator before splitting, but may improve the overall operation efficiency of the computation graph after splitting; similarly, the operator replacement may be to replace one or some preset operators in the third computation graph with other single or multiple preset operators, and the computation function of the implementation after the replacement is the same, but the computation efficiency may be improved.
And executing which operation of operator fusion, operator splitting or operator replacement on which preset operators in the third computation graph, wherein an implementation manner of the operation can be flexibly determined according to the actual operation performance of the target neural network processor, and is not specifically limited in the embodiment of the present disclosure. Through the process, the third computation graph can be flexibly optimized in a plurality of modes with higher efficiency, so that the operational performance and the operational efficiency of the target neural network processor are effectively improved.
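A minimal sketch of one such optimization is given below, assuming the OperatorNode structure from the earlier sketch and assuming, purely for illustration, that the target neural network processor runs a fused convolution-plus-activation kernel more efficiently; the operator names are placeholders.

```python
def fuse_conv_relu(third_graph):
    """Illustrative operator-fusion pass: merge a Conv2D node followed only by a
    single-input ReLU node into one fused preset operator (a hypothetical choice)."""
    for node in list(third_graph):
        if node.op_type == "Conv2D" and len(node.outputs) == 1:
            successor = node.outputs[0]
            if successor.op_type == "ReLU" and len(successor.inputs) == 1:
                node.op_type = "Conv2DReLU"             # fused preset operator
                node.outputs = list(successor.outputs)  # bypass the ReLU node
                for downstream in successor.outputs:
                    downstream.inputs = [node if n is successor else n for n in downstream.inputs]
                third_graph.remove(successor)
    return third_graph   # the optimized graph serves as the first computation graph
```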
In one possible implementation, step S122 may include:
based on the selected target neural network processor, selecting a packaged operator library corresponding to the target neural network processor from a second preset operator library as a target operator library; and the second preset operator library comprises packaging operator libraries corresponding to the various neural network processors respectively.
The second preset operator library may store packaged operator libraries corresponding to multiple types of neural network processors, and the number of packaged operator libraries it contains can be flexibly determined according to the actual situations of the neural network processors, not limited to the following embodiments. For example, in a case where the operation method proposed by the embodiment of the present disclosure supports three neural network processors A, B and C at the same time, the second preset operator library may store the packaged operator library a corresponding to the neural network processor A, the packaged operator library b corresponding to the neural network processor B, and the packaged operator library c corresponding to the neural network processor C.
The implementation form of the encapsulation operator library corresponding to the neural network processor can also be flexibly determined according to the actual situation, and in a possible implementation form, the encapsulation operator library may include: and encapsulating the original operator in the neural network processor corresponding to the operator library, and/or encapsulating the custom operator developed by the neural network processor corresponding to the operator library.
In a possible implementation manner, the packaged operator library may be a packaged operator library with a uniform interface obtained by packaging the original operator library corresponding to the neural network processor according to a preset packaging template, where the operator in the packaged operator library is consistent with the original operator in the original operator library. In a possible implementation manner, the original operator library may further include, in addition to the operator definition (such as a basic definition, a parameter check, a shape inference, and the like of each original operator) of each original operator corresponding to the neural network processor, an operator implementation manner of each original operator, that is, a compiled code corresponding to the original operator, a compiled computation kernel, and the like.
The custom operator may be an operator corresponding to the neural network processor, which is obtained by further developing the neural network processor, in some possible implementation manners, under the condition that an original operator library corresponding to the neural network processor has a development function, the operator of the neural network processor may be developed and improved through an operator development tool chain to obtain the custom operator, and the original operator and the custom operator in the original operator library are encapsulated according to a preset encapsulation template to obtain an encapsulated operator library with a uniform interface. In some possible implementations, in the case that the encapsulation operator library includes the custom operator, in addition to the operator definition (such as the basic definition, parameter check, shape inference, and the like of the custom operator), the encapsulation operator library may also include the operator implementation of the respective defined operator, that is, the compiled code corresponding to the custom operator, the compiled compute kernel, and the like.
By the aid of the packaged operator library comprising the original operators and/or the custom operators, the custom operators which are developed by users and have higher operation efficiency can be integrated into the packaged operator library and then into the target operator library and the operation process, so that the object-oriented range of the method provided by the embodiment of the disclosure is further expanded, and the operation efficiency of the operation process is improved.
Since the second preset operator library includes the encapsulated operator libraries corresponding to the plurality of types of neural network processors, in one possible implementation, the encapsulated operator library corresponding to the selected target neural network processor may be selected from the second preset operator library as the target operator library.
In a possible implementation manner, when a new neural network processor appears (such as a neural network processor D), in order to make the method proposed by the embodiment of the present disclosure also applicable to the neural network processor D, the neural network processor D may be integrated into the multiple types of neural network processors through registration. The registration process may be flexibly determined according to actual situations and is not limited to the following disclosed embodiments. In a possible implementation manner, the original operator library corresponding to the neural network processor D may be packaged as a packaged operator library D, and the packaged operator library D is integrated into the second preset operator library, thereby registering the operator library of the neural network processor D.
In the embodiment of the disclosure, based on the selected target neural network processor, the packaged operator library corresponding to the target neural network processor is selected as the target operator library in the second preset operator library, and through the above process, the packaged operator libraries corresponding to multiple types of neural network processors can be integrated through the second preset operator library, so that on one hand, the types of neural network processors oriented to the operation method are expanded, and the universality of the method is improved; on the other hand, the subsequent newly added target neural network processor is convenient to register, and the expandability and the flexibility of the method are improved.
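The second preset operator library can be pictured as a simple registry keyed by processor type. The following sketch is illustrative only; the processor names, library placeholders, and method names are assumptions, not the disclosure's actual structures.

```python
class SecondPresetOperatorLibrary:
    """Hypothetical registry holding one packaged operator library per NPU type."""

    def __init__(self):
        self._packaged_libraries = {}   # NPU type name -> packaged operator library

    def register(self, npu_type: str, packaged_library):
        """Registering a newly added NPU simply adds its packaged operator library."""
        self._packaged_libraries[npu_type] = packaged_library

    def select(self, target_npu: str):
        """Select the packaged operator library of the chosen NPU as the target operator library."""
        return self._packaged_libraries[target_npu]


# Illustrative usage: NPUs A, B and C are registered up front; a new NPU D can be
# integrated later without touching the front end or the first preset operator library.
second_library = SecondPresetOperatorLibrary()
second_library.register("npu_a", "packaged operator library a")
second_library.register("npu_b", "packaged operator library b")
second_library.register("npu_c", "packaged operator library c")
target_operator_library = second_library.select("npu_a")
```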
In one possible implementation manner, step S123 may include:
respectively converting a plurality of preset operators in a first calculation graph according to a conversion relation between the first preset operator library and a target operator library to obtain a plurality of conversion results respectively corresponding to the plurality of preset operators in the first calculation graph, wherein the conversion results comprise target operators and/or target operator subgraphs in the target operator library, and the target operator subgraphs are formed by a plurality of target operators in the target operator library;
and connecting the plurality of conversion results according to the connection relation between preset operators in the first calculation graph to obtain a second calculation graph matched with the target neural network processor.
The conversion relation between the first preset operator library and the target operator library can be a corresponding relation between each operator in the target operator library and a certain preset operator in the first preset operator library or certain preset operators connected with each other, and the specific corresponding mode of the conversion relation can be flexibly determined according to the actual conditions of the first preset operator library and the target operator library. In a possible implementation manner, in the process of integrating the packaged operator library (i.e., the target operator library) corresponding to the target neural network processor into the second preset operator library, each target operator in the target operator library may be traversed, and a conversion logic between each traversed target operator and a preset operator in the first preset operator library is determined, so as to determine a conversion relationship between the first preset operator library and the target operator library.
Based on the determined conversion relationship, a plurality of preset operators in the first computation graph can be converted respectively to obtain a plurality of conversion results, wherein the plurality of conversion results correspond to the plurality of preset operators in the first computation graph. In a possible implementation manner, a certain preset operator 1 in the first computation graph may correspond to a certain target operator 1' in the target operator library, in which case, the conversion result of the preset operator may be the target operator in the target operator library; in a possible implementation manner, a certain preset operator 2 in the first computation graph may correspond to a target operator subgraph 2' formed by connecting a plurality of target operators in a target operator library, and in this case, the conversion result of the preset operator may be the target operator subgraph. The specific implementation form of the multiple conversion results can be flexibly determined according to the actual situation, and the embodiment of the disclosure does not limit this.
After obtaining the plurality of conversion results, the plurality of conversion results corresponding to the preset operators may be connected according to the connection relationship between the preset operators in the first computation graph, so as to obtain the second computation graph. In the embodiment of the present disclosure, through the above process, compiling the initial model file for the target neural network processor can be realized based on the conversion from the first computation graph to the second computation graph, which makes it convenient to control the conversion and compiling strategy of each operator as well as the overall compiling flow, thereby providing as much optimization space as possible for the operation process and improving the efficiency of the operation method and the range of processors it can target.
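A sketch of the graph conversion follows, assuming the target operator library exposes hypothetical convert and connect_after operations that encode the conversion relation described above; none of these names come from the disclosure.

```python
def convert_graph(first_graph, target_op_library):
    """Illustrative conversion of the first computation graph into the second one."""
    # Convert every preset operator independently; the conversion relation is assumed to
    # return either a single target operator or a small target-operator subgraph.
    results = {id(node): target_op_library.convert(node) for node in first_graph}

    # Reconnect the conversion results following the connections of the first graph.
    for node in first_graph:
        for upstream in node.inputs:
            results[id(node)].connect_after(results[id(upstream)])

    return list(results.values())   # the second computation graph
```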
In a possible implementation manner, before connecting the plurality of conversion results according to a connection relationship between preset operators in the first computation graph to obtain a second computation graph matched with the target neural network processor, the operation method provided in the embodiment of the present disclosure further includes:
and fusing and converting part of preset operators in the first calculation graph into target operators in a target operator library as a conversion result.
In some possible implementations, in addition to converting the preset operator in the first computation graph into the target operator in the target operator library or the target operator sub-graph, a plurality of preset operators having a connection relationship in the first computation graph may be fused and converted into a certain target operator in the target operator library to serve as a conversion result of the plurality of preset operators having a connection relationship in the first computation graph.
Through the process, a plurality of preset operators in the first calculation graph can be fused into one target operator, so that the calculation performance of the target model file in the target neural network processor is further improved, and the calculation performance and efficiency are further improved.
In a possible implementation manner, the compiling the initial model file in step S12 may further include:
based on the format of the initial model file, compiling corresponding to the format is performed on the initial model file.
The format of the initial model file has been described in the above embodiments and is not repeated here. Performing compiling corresponding to the format of the initial model file may mean parsing and extracting the content of the initial model file in different manners for different initial model file formats. As mentioned in the above-mentioned embodiments, when the initial model file is in the prototxt or caffemodel file format, the parsing result may be obtained in the manner of parsing a model file of the Caffe framework; or, when the initial model file is in the pb file format, the parsing result may be obtained in the manner of parsing a model file of the TensorFlow framework.
Through the above process, a user only needs to develop one model file for the same neural network algorithm or model, and compiling and running of the model can be realized on different neural network processors, which reduces the deployment and migration costs on different neural network processors and effectively improves the convenience of implementing the operation.
After the second computation graph is obtained by the methods proposed in the above disclosed embodiments, a target model file matched with the target neural network processor may be generated based on the second computation graph in step S124. The way of generating the target model file based on the second computation graph may be flexibly determined according to the actual situation of the target neural network processor and is not limited to the following embodiments. In a possible implementation manner, each target operator included in the second computation graph may be traversed, and each traversed target operator is compiled into a computation kernel corresponding to the target neural network processor by using the operator implementation provided in the target operator library; the call order of the target operators is determined through the connection relations between the target operators in the second computation graph; and the compiled computation kernels, the call order, and the computing resources to be allocated are serialized into a target model file matched with the target neural network processor.
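The serialization of the second computation graph could look roughly like the following; topological_order, estimate_resources, compile_kernel, the node attributes, and the returned dictionary layout are all assumptions made for the sketch.

```python
def serialize_to_target_model(second_graph, target_op_library):
    """Illustrative serialization of the second computation graph into a target model file."""
    compiled_kernels = []
    call_order = []
    for target_node in topological_order(second_graph):      # hypothetical helper
        # Compile each target operator into a computation kernel using the operator
        # implementation provided by the packaged target operator library.
        compiled_kernels.append(target_op_library.compile_kernel(target_node))
        call_order.append(target_node.name)

    return {
        "kernels": compiled_kernels,                        # compiled computation kernels
        "call_order": call_order,                           # invocation order from graph connections
        "resources": estimate_resources(second_graph),      # computing resources to allocate (hypothetical)
    }
```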
In one possible implementation, after obtaining the target model file obtained by the above disclosed embodiments, the target model file may be run in the target neural network processor to obtain the operation result through step S13. The implementation of step S13 can be flexibly changed according to the target neural network processor.
In some possible implementations, in the process implemented in step S13, the method may include:
configuring in a target neural network processor according to the target model file;
performing an inference operation in the configured target neural network processor according to the acquired input data to obtain an operation result;
and transmitting the operation result to the target device as output data.
The configuring in the target neural network processor may include: initializing the target neural network processor, allocating computing resources in the target neural network processor, and creating a model handle. Creating the model handle may mean loading the computation kernels, operator invocation information, weight data, and the like in the target model file onto the target neural network processor to obtain the model handle, so that the model handle can be used for subsequent inference operations.
After the target neural network processor is configured, inference operations can be performed in the configured target neural network processor based on the acquired input data. The input data may be derived from devices other than the target neural network processor, such as the computing device or other devices mentioned in the embodiments of the present disclosure, and the specific selection of which devices can be flexibly determined according to actual situations.
The acquired input data can be processed into a data format required by the target neural network processor through preprocessing operation, and the target neural network processor calls an inference interface to complete inference operation based on model handles according to the preprocessed input data, so that an operation result is obtained and serves as output data.
The obtained output data may be transmitted to other target devices, the implementation form of the target device is not limited in the embodiment of the present disclosure, in one possible implementation form, the target device may be the arithmetic device in the above disclosed embodiment, and in some possible implementation forms, the target device may also be other devices.
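Putting the running phase together, a hypothetical flow might look as follows; the runtime_library lookup and all backend method names are assumptions for the sketch, not interfaces defined by the disclosure.

```python
def run_on_target(target_model, target_npu: str, raw_input):
    """Hypothetical runtime flow: configure the target NPU, create a model handle,
    preprocess the input, and invoke the inference interface."""
    backend = runtime_library.get_backend(target_npu)   # per-NPU branch in the runtime library

    # Configuration: initialize the processor and allocate computing resources.
    backend.initialize()
    backend.allocate(target_model["resources"])

    # Model handle creation: load kernels, call order and weights onto the NPU.
    handle = backend.load_model(target_model)

    # Preprocess the acquired input data into the format the NPU expects,
    # then call the inference interface to obtain the operation result.
    input_tensor = backend.preprocess(raw_input)
    output = backend.infer(handle, input_tensor)
    return output   # transmitted back to the target device as output data
```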
In some possible implementation manners, all the types of neural network processors provided in the embodiments of the present disclosure may be integrated into the same inference framework runtime library. The integration form may be flexibly determined according to the actual situation, for example, with reference to the form in which the packaged operator libraries are integrated into the second preset operator library in the embodiments of the present disclosure, and is not limited in the embodiments of the present disclosure. In a possible implementation manner, when a neural network processor is newly added, a corresponding operation logic branch for that neural network processor may be added in the inference framework runtime library, providing packaged computing resource allocation and management interfaces (such as an initialization interface, a memory allocation interface, a memory transfer interface, a thread control interface, and a time synchronization management interface) and packaged inference operation interfaces (such as a model handle definition interface, a model query interface, and an inference operation interface) for subsequently performing inference operations in each target neural network processor.
By integrating the multiple types of neural network processors into the same inference framework runtime library, the target model file can be run in the selected target neural network processor to obtain the operation result.
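As an illustration of how such a runtime library might expose a uniform operation logic branch per NPU, the following Python sketch registers a new backend together with its packaged resource-management and inference interfaces; all names, including NpuRuntime and register_runtime, are hypothetical and not part of the disclosure.

```python
from abc import ABC, abstractmethod


class NpuRuntime(ABC):
    """Hypothetical base class that each NPU operation-logic branch implements."""

    # Computing resource allocation management interfaces.
    @abstractmethod
    def initialize(self) -> None:
        ...

    @abstractmethod
    def allocate(self, num_bytes: int) -> int:
        ...

    @abstractmethod
    def copy_to_device(self, blob: bytes) -> int:
        ...

    # Inference operation interfaces.
    @abstractmethod
    def create_model_handle(self, model_file: bytes):
        ...

    @abstractmethod
    def infer(self, handle, inputs):
        ...


RUNTIME_REGISTRY = {}


def register_runtime(name: str):
    """Add the operation-logic branch of a newly supported NPU to the runtime library."""
    def wrap(cls):
        RUNTIME_REGISTRY[name] = cls
        return cls
    return wrap


@register_runtime("npu1")
class Npu1Runtime(NpuRuntime):
    def initialize(self) -> None:
        pass

    def allocate(self, num_bytes: int) -> int:
        return 0

    def copy_to_device(self, blob: bytes) -> int:
        return 0

    def create_model_handle(self, model_file: bytes):
        return object()

    def infer(self, handle, inputs):
        return inputs


runtime = RUNTIME_REGISTRY["npu1"]()   # the branch selected for the target NPU
```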
Fig. 3 shows a block diagram of an arithmetic device 20 according to an embodiment of the present disclosure. As shown in fig. 3, the device comprises:
an initial model file obtaining module 21, configured to obtain an initial model file for implementing a neural network operation.
A target model file generation module 22, configured to, in response to a selection operation for multiple types of neural network processors, compile the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor.
An operation module 23, configured to operate the target model file in the target neural network processor to obtain an operation result.
In one possible implementation, the target model file generation module is configured to: construct a first computation graph according to the initial model file, wherein the first computation graph is formed by one or more preset operators in a first preset operator library; acquire a target operator library corresponding to the target neural network processor based on the selected target neural network processor; convert the first computation graph into a second computation graph matched with the target neural network processor according to the conversion relation between the first preset operator library and the target operator library, wherein the second computation graph is formed by one or more target operators in the target operator library; and generate a target model file matched with the target neural network processor based on the second computation graph.
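A minimal end-to-end sketch of this compile flow is given below, purely for illustration; the graph representation (a flat operator sequence), the operator names, and the libraries PRESET_OPERATORS and OPERATOR_LIBRARIES are all assumptions rather than the disclosed implementation.

```python
from typing import Dict, List

# Hypothetical first preset operator library: only operator names, for brevity.
PRESET_OPERATORS = {"conv", "relu", "fc"}

# Hypothetical conversion tables, one per supported NPU (the "target operator libraries").
OPERATOR_LIBRARIES: Dict[str, Dict[str, List[str]]] = {
    "npu1": {
        "conv": ["npu1_conv2d"],
        "relu": ["npu1_act"],
        "fc": ["npu1_matmul", "npu1_bias_add"],
    },
}


def build_first_graph(parsed_ops: List[str]) -> List[str]:
    """Keep only operators defined in the first preset operator library."""
    return [op for op in parsed_ops if op in PRESET_OPERATORS]


def convert_graph(first_graph: List[str], target_lib: Dict[str, List[str]]) -> List[str]:
    """Map each preset operator to a target operator or a small target-operator subgraph."""
    second_graph: List[str] = []
    for op in first_graph:
        second_graph.extend(target_lib[op])
    return second_graph


def serialize(second_graph: List[str]) -> bytes:
    """Stand-in for serializing kernels, call order, and resource info into a model file."""
    return "\n".join(second_graph).encode("utf-8")


def compile_model(parsed_ops: List[str], target_npu: str) -> bytes:
    first_graph = build_first_graph(parsed_ops)
    target_lib = OPERATOR_LIBRARIES[target_npu]
    second_graph = convert_graph(first_graph, target_lib)
    return serialize(second_graph)


target_model_file = compile_model(["conv", "relu", "fc"], "npu1")
```

A real compiler would carry operator parameters, weight data, and connection relations through each stage; the sketch keeps only the operator names to show the data flow.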
In one possible implementation, the target model file generation module is further configured to: analyze the initial model file to obtain an analysis result; construct a third computation graph based on the analysis result, wherein the third computation graph is formed by one or more preset operators in the first preset operator library; and optimize the third computation graph based on the operation mode of the selected target neural network processor to obtain the first computation graph.
In one possible implementation, the target model file generation module is further configured to: perform one or more of operator fusion, operator splitting, and operator replacement on the preset operators in the third computation graph, based on the operation mode of the selected target neural network processor, to obtain the first computation graph.
In one possible implementation, the target model file generation module is further configured to: convert the preset operators in the first computation graph respectively, according to the conversion relation between the first preset operator library and the target operator library, to obtain a plurality of conversion results respectively corresponding to the preset operators in the first computation graph, wherein a conversion result comprises a target operator and/or a target operator subgraph in the target operator library, and a target operator subgraph is formed by a plurality of target operators in the target operator library; and connect the plurality of conversion results according to the connection relations between the preset operators in the first computation graph to obtain the second computation graph matched with the target neural network processor.
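The following sketch illustrates, under assumed names and a toy two-node graph, how each preset operator might be converted into a single target operator or a small target operator subgraph, and how the results are reconnected along the original edges; it is not the disclosed implementation.

```python
from typing import Dict, List, Tuple

# A preset-operator graph: node name -> (operator type, list of upstream node names).
first_graph: Dict[str, Tuple[str, List[str]]] = {
    "n0": ("conv", []),
    "n1": ("fc", ["n0"]),
}

# Hypothetical conversion rules: each preset operator maps to one target operator
# or to a small target-operator subgraph (a list executed in order).
CONVERSION: Dict[str, List[str]] = {
    "conv": ["npu_conv2d"],
    "fc": ["npu_matmul", "npu_bias_add"],   # converted into a two-operator subgraph
}


def convert(first: Dict[str, Tuple[str, List[str]]]) -> Dict[str, Tuple[str, List[str]]]:
    """Convert node by node, then reconnect the results by the original graph edges."""
    second: Dict[str, Tuple[str, List[str]]] = {}
    tail: Dict[str, str] = {}                      # last target node produced per preset node
    for name, (op, inputs) in first.items():       # assumes topological insertion order
        upstream = [tail[i] for i in inputs]       # reconnect along the original edges
        for idx, target_op in enumerate(CONVERSION[op]):
            node = f"{name}_{idx}"
            second[node] = (target_op, upstream)
            upstream = [node]                      # chain the subgraph internally
        tail[name] = upstream[0]
    return second


second_graph = convert(first_graph)
```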
In one possible implementation, the target model file generation module is further configured to: fuse and convert some of the preset operators in the first computation graph into a target operator in the target operator library as a conversion result.
In one possible implementation, the target model file generation module is further configured to: select, based on the selected target neural network processor, the encapsulation operator library corresponding to the target neural network processor from a second preset operator library as the target operator library, wherein the second preset operator library comprises the encapsulation operator libraries respectively corresponding to the multiple types of neural network processors.
In one possible implementation, the encapsulation operator library includes: original operators of the neural network processor corresponding to the encapsulation operator library, and/or custom operators developed for the neural network processor corresponding to the encapsulation operator library.
In one possible implementation, the target model file generation module is further configured to: perform, on the initial model file, compilation corresponding to the format of the initial model file.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the description of those method embodiments, which is not repeated here for brevity.
Application scenario example
Fig. 4 to fig. 9 are schematic diagrams illustrating an application example according to the present disclosure. As shown in the drawings, this application example proposes an operation method that can implement an operation process, including compiling and running, for multiple types of neural network processors, based on the inference framework compilation library shown in fig. 4 and the inference framework runtime library shown in fig. 7.
As shown in fig. 4, the inference framework compilation library is composed of a plurality of modules:
A compile parameter configuration module, configured to configure various parameters of the current compiling process, such as the path of the initial model file, the structure of the initial model file, the selected neural network processor (NPU), and the input/output nodes of the neural network model. Based on the compile parameter configuration module, the processing logic branch for the initial model file at the front end, the NPU selected at the back end for the compiling process, and the quantized data used in operation can be determined.
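A hypothetical example of such a compile configuration is shown below; every key and value is a placeholder chosen for illustration and is not defined by the disclosure.

```python
# All keys and values below are hypothetical placeholders chosen for illustration.
compile_config = {
    "model_path": "model.caffemodel",       # path of the initial model file
    "model_format": "caffe",                # structure/format of the initial model file
    "target_npu": "npu1",                   # selected neural network processor
    "input_nodes": ["data"],                # input nodes of the neural network model
    "output_nodes": ["prob"],               # output nodes of the neural network model
    "quantization": {"dtype": "int8"},      # quantized data configuration, if any
}
```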
A model analysis module, which can be used to analyze trained initial model files from other common deep learning frameworks, such as initial model files trained with Caffe or Open Neural Network Exchange (ONNX). Through analysis, the parameter information, connection relations, and corresponding weight data of each operator defined in the initial model file are acquired and organized into the structured data format defined in the inference framework as the analysis result.
The first preset operator library, which may be an operator set composed of preset operators predefined by the inference framework. The first preset operator library may contain only operator definitions, for example the basic definition, parameter verification, and shape inference functions of each operator, so as to provide a unified operator representation.
A computation graph construction module, which can construct a computation graph of preset operators, represented by a directed acyclic graph data structure, according to the analysis results (such as operator parameters, connection modes, and weight data) provided by the model analysis module. During construction of the computation graph, specific instance nodes are generated, which include the current operator parameters, the weight data, pointers to the upstream and downstream operator nodes, and the like.
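As a sketch only, an instance node and the graph construction step might be represented as follows; the field names and the topologically ordered parse-result format are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np


@dataclass
class OperatorNode:
    """An instance node: parameters, weight data, and pointers to neighboring nodes."""
    name: str
    op_type: str
    params: Dict[str, object] = field(default_factory=dict)
    weights: Optional[np.ndarray] = None
    inputs: List["OperatorNode"] = field(default_factory=list)   # upstream operator nodes
    outputs: List["OperatorNode"] = field(default_factory=list)  # downstream operator nodes


def build_graph(parse_result: List[dict]) -> Dict[str, OperatorNode]:
    """Build a directed acyclic graph from the analysis result (params, connections, weights)."""
    nodes: Dict[str, OperatorNode] = {}
    for entry in parse_result:       # entries are assumed to be topologically ordered
        node = OperatorNode(entry["name"], entry["type"],
                            entry.get("params", {}), entry.get("weights"))
        for upstream_name in entry.get("inputs", []):
            upstream = nodes[upstream_name]
            node.inputs.append(upstream)
            upstream.outputs.append(node)
        nodes[node.name] = node
    return nodes


graph = build_graph([
    {"name": "conv1", "type": "conv", "params": {"kernel": 3}, "weights": np.ones((3, 3))},
    {"name": "relu1", "type": "relu", "inputs": ["conv1"]},
])
```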
A computation graph conversion module, which can convert the computation graph constructed by the computation graph construction module into a second computation graph corresponding to the selected target NPU. The computation graph constructed by the computation graph construction module is defined by the inference framework and is composed of preset operators, whereas the second computation graph is defined by the target operator library corresponding to the target NPU, is composed of target operators defined in that target operator library, and is related to the architecture of the target NPU.
The computation graph conversion module may first optimize the computation graph constructed by the computation graph construction module according to the architecture characteristics of the target NPU; for example, it may perform fusion, splitting, or subgraph replacement on the preset operators in the computation graph, so as to improve the computational performance on the target NPU as much as possible, and the optimized computation graph may be referred to as the first computation graph. The module then traverses all preset operators in the first computation graph and converts each preset operator into a single target operator in the target operator library, or into a target operator subgraph composed of a plurality of target operators in the target operator library; the conversion may include parameter conversion, weight data conversion, and the like. In some possible implementations, a plurality of preset operators can be fused into one target operator to provide higher computational performance. Finally, the converted target operators are connected according to the operator connection relations of the first computation graph to form the second computation graph.
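For illustration, a single fusion rule of the kind mentioned above might be applied to a linear operator sequence as in the sketch below; the pattern table and operator names are assumptions, and a real pass would match subgraphs rather than adjacent pairs.

```python
from typing import List

# Hypothetical fusion rule: a convolution immediately followed by an activation is
# replaced by one fused operator, which many NPU architectures can execute faster.
FUSE_PATTERNS = {("conv", "relu"): "conv_relu"}


def fuse(ops: List[str]) -> List[str]:
    """One simple pass over a linear operator sequence; real graphs need subgraph matching."""
    fused: List[str] = []
    i = 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSE_PATTERNS:
            fused.append(FUSE_PATTERNS[pair])
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused


print(fuse(["conv", "relu", "fc"]))   # -> ['conv_relu', 'fc']
```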
A second computation graph serialization module, which can serialize the second computation graph to obtain a target model file dedicated to the target NPU. The serialization process may be: traverse all target operators in the second computation graph, compile the traversed target operators in order into compute kernels using the corresponding code in the target operator library, generate the calling order of the target operators from their connection relations in the second computation graph, and finally serialize all the compute kernels, the calling order, and the computing resources to be allocated into the target model file.
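A toy serialization step under these assumptions could look like this; the JSON packaging, hex-encoded kernels, and single memory figure are illustrative choices, not the disclosed format.

```python
import json
from typing import Dict, List


def compile_kernel(op_name: str) -> bytes:
    """Stand-in for compiling one target operator into a compute kernel with its library code."""
    return f"kernel<{op_name}>".encode("utf-8")


def serialize_second_graph(call_order: List[str], memory_bytes: int) -> bytes:
    """Compile each target operator into a kernel, record the calling order and the
    computing resources to allocate, and pack everything into one model file."""
    kernels: Dict[str, str] = {op: compile_kernel(op).hex() for op in call_order}
    model_file = {
        "kernels": kernels,             # compiled compute kernels, hex-encoded for JSON
        "call_order": call_order,       # invocation order derived from graph connections
        "memory_bytes": memory_bytes,   # computing resources to be allocated at runtime
    }
    return json.dumps(model_file).encode("utf-8")


blob = serialize_second_graph(["npu_conv2d", "npu_matmul"], 1 << 20)
```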
Based on the inference framework compilation library shown in fig. 4, the process of compiling in the compilation library to obtain a target model file oriented to the selected target NPU may be as shown in fig. 5. As can be seen from fig. 5, in one example, the compiling process in the operation method proposed in this application example may include:
determining the selected target NPU (namely NPU1 in the figure) based on the parameters configured by the compile parameter configuration module, and analyzing the initial model file in the format of framework 1 to obtain an analysis result;
constructing, by the computation graph construction module, a third computation graph formed by preset operators in the first preset operator library based on the analysis result;
optimizing the third computation graph in the computation graph conversion module according to the architecture of the target NPU to obtain a first computation graph;
converting, in the computation graph conversion module, the first computation graph into a second computation graph corresponding to the target NPU according to the conversion relation between the first preset operator library and the target operator library corresponding to the target NPU;
and serializing the second computation graph through the second computation graph serialization module to obtain the target model file.
Based on the process of fig. 5, it can be seen that the inference framework compilation library proposed in this application example can be oriented to multiple NPUs. The process of integrating these NPUs into the inference framework compilation library can be shown in fig. 6; as can be seen from fig. 6, in one example, the integration process can include:
the NPU architecture registration is that a compiling logic branch corresponding to the NPU is added in the compiling logic of the reasoning framework compiling library, and for each NPU architecture, respective functions such as computational graph conversion logic, serialization logic and the like can be realized through various logics.
Construction of the encapsulation operator library corresponding to the NPU: each NPU has an original operator library in its own format; in one possible implementation, the different NPU operator libraries can be encapsulated according to a unified operator library template to obtain encapsulation operator libraries that provide a unified interface, which facilitates acquisition, management, and independent operator testing. An encapsulation operator library can contain the original operator library provided by the NPU vendor, and in some possible implementations, when the NPU vendor opens up operator customization, self-developed custom NPU operators and high-performance NPU operators can also be introduced into the encapsulation operator library.
Development of operator conversion logic: for each preset operator in the first preset operator library, an implementation scheme based on the encapsulation operator library corresponding to the NPU is designed. Depending on its operator parameters, each preset operator can correspond to one operator in the encapsulation operator library or to an operator subgraph formed by a plurality of operators in the encapsulation operator library, and the conversion of the corresponding weight data is completed at the same time (a sketch of one such conversion registration follows this list).
Development of the optimization strategy for the third computation graph: once the conversion logic of each operator in the first preset operator library has been developed, various graph optimization strategies required by the computation graph conversion logic, such as subgraph replacement and compute node fusion, can be implemented on the third computation graph according to the different NPU computing architectures.
Development of the first computation graph conversion logic: the conversion logic from the first computation graph to the second computation graph is implemented.
Development of the second computation graph serialization logic: the logic that serializes the second computation graph into the target model file is implemented.
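As referenced in the operator conversion step above, the registration of per-NPU, per-operator conversion logic might be organized as in the following sketch; the registry, decorator, and operator names are hypothetical.

```python
from typing import Callable, Dict, List

# Registry mapping (npu_name, preset_operator) to a conversion function that returns the
# equivalent target operator(s); the names used below are hypothetical.
CONVERTERS: Dict[str, Dict[str, Callable[[dict], List[str]]]] = {}


def register_converter(npu: str, preset_op: str):
    """Register the conversion logic of one preset operator for one NPU backend."""
    def wrap(fn: Callable[[dict], List[str]]):
        CONVERTERS.setdefault(npu, {})[preset_op] = fn
        return fn
    return wrap


@register_converter("npu1", "fc")
def convert_fc(params: dict) -> List[str]:
    # Depending on the operator parameters, the result is a single target operator
    # or a small subgraph of target operators.
    if params.get("with_bias", True):
        return ["npu1_matmul", "npu1_bias_add"]
    return ["npu1_matmul"]


print(CONVERTERS["npu1"]["fc"]({"with_bias": True}))
```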
As shown in fig. 7, the inference framework runtime library is composed of a plurality of modules:
and operating a parameter configuration module, which is used for configuring various parameters of the current compiling process, such as a path of the initial model file, a structure of the initial model file, the selected neural network processor NPU, an input and output node of the neural network model, and the like, and based on the parameter configuration module, determining the processing logic branches and quantized data of the initial model file at the front end and the NPU at the back end selected in the compiling process in the operation process.
An input and output data module, which can interact with other modules or devices during running, provide the input data required for the operation of the target NPU, and acquire from the target NPU the output data formed from the operation result for subsequent processing.
A target NPU management module, which can read the target model file and acquire information such as the corresponding operator compute kernels, operator calling order, and computing resource allocation information; initialize the target NPU and complete the allocation of computing resources, including memory and thread management; carry the operator compute kernels, operator calling information, and weight data to the target NPU to obtain a model handle for subsequent inference operations; and release the computing resources and the model handle after the inference operations are completed.
An inference operation module, which can carry data into and out of the target NPU device, for example to a target device (such as a host device), and complete the invocation and synchronization of the operator compute kernels according to the model handle.
Based on the inference framework runtime library illustrated in fig. 7, the process of obtaining an operation result by running in the runtime library may be as illustrated in fig. 8. As can be seen from fig. 8, in one example, the running process in the operation method provided in this application example may include the following steps (a sketch of this loop follows the list):
acquiring the operation logic branch of the target NPU according to the selected target NPU, and configuring the necessary configuration information for the running process, such as input and output data information and the number of iterations;
acquiring a target model file, and finishing initialization, calculation resource allocation and model handle creation of a target NPU according to information in the target model file;
acquiring input data from other sources and completing preprocessing to obtain input data in a format required by a target NPU;
carrying input data to a target NPU, calling an inference interface based on the model handle to complete inference operation to obtain an operation result, and carrying the operation result as output data from the target NPU to a target device;
post-processing the output data and passing it on for subsequent use;
and repeating the process, from acquiring the input data to post-processing the output data, until the inference operations are completed.
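The loop referenced above might be sketched as follows; the preprocessing, post-processing, and data source are placeholders, and infer_on_npu stands in for carrying data to the NPU and calling its inference interface.

```python
import numpy as np


def preprocess(raw: np.ndarray) -> np.ndarray:
    """Stand-in preprocessing into the format assumed to be required by the target NPU."""
    return raw.astype(np.float32) / 255.0


def postprocess(out: np.ndarray) -> float:
    """Stand-in post-processing of the output data before it is passed on."""
    return float(out.mean())


def infer_on_npu(handle, data: np.ndarray) -> np.ndarray:
    """Stand-in for carrying the input to the NPU and calling the inference interface."""
    return data


def run(handle, data_source, iterations: int = 3):
    """Repeat acquisition, preprocessing, inference, and post-processing until done."""
    results = []
    for _ in range(iterations):
        raw = next(data_source)                 # acquire input data from another source
        output = infer_on_npu(handle, preprocess(raw))
        results.append(postprocess(output))     # output carried back and post-processed
    return results


frames = iter(np.zeros((3, 224, 224, 3), dtype=np.uint8))
print(run(handle=None, data_source=frames))
```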
Based on the process of fig. 8, it can be seen that the inference framework runtime library proposed in this application example may also be oriented to multiple NPUs. The process of integrating these NPUs into the runtime library may be as shown in fig. 9; as can be seen from fig. 9, in one example, the integration process may include:
the NPU architecture registration is to add a running logic branch of a corresponding NPU in an inference framework running library, and for each NPU architecture, basic steps can be inherited to realize respective functions, such as steps of device initialization, resource allocation and the like.
Packaging of the computing resource allocation management interfaces, which may include, for example, the packaging of interfaces for device initialization, memory allocation, memory handling, thread control, and event synchronization management.
Packaging of the inference operation interfaces, which may include, for example, the packaging of interfaces such as the model handle definition interface, the model query interface, and the model inference interface, used to complete the operations related to inference.
The operation method provided in this application example can be realized by combining the compiling process of the inference framework compilation library and the running process of the inference framework runtime library described above. With the same code, the same algorithm model can be compiled for and run on different NPUs; moreover, the compiling flow and the operator compiling strategy can be flexibly controlled and set, providing as much room for computational optimization as possible, and the development and integration of custom operators can be completed conveniently. Therefore, the method provided in this application example can reduce the deployment and migration costs of an algorithm model across different NPUs while offering as much computational optimization space and freedom as possible.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic described herein; due to space limitations, the details are not repeated in the present disclosure.
It will be understood by those skilled in the art that, in the methods of the above embodiments, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The disclosed embodiments also provide a computer program product comprising computer readable code which, when run on a device, executes instructions for implementing a method as provided by any of the above embodiments.
Embodiments of the present disclosure also provide another computer program product for storing computer readable instructions, which when executed, cause a computer to perform the operations of the method provided by any of the above embodiments.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 10 illustrates a block diagram of an electronic device 800 in accordance with an embodiment of the disclosure. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like terminal.
Referring to fig. 10, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a Complementary Metal Oxide Semiconductor (CMOS) or Charge Coupled Device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as a wireless network (WiFi), a second generation mobile communication technology (2G) or a third generation mobile communication technology (3G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described methods.
Fig. 11 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 11, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system from Apple Inc. (Mac OS X™), the multi-user, multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can be personalized with the state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A method of operation, the method comprising:
acquiring an initial model file for realizing neural network operation;
in response to the selection operation aiming at the multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor to obtain a target model file matched with the target neural network processor;
and operating the target model file in the target neural network processor to obtain an operation result.
2. The method of claim 1, wherein compiling the initial model file based on the selected target neural network processor to obtain a target model file matching the target neural network processor comprises:
constructing a first calculation graph according to the initial model file, wherein the first calculation graph is formed by one or more preset operators in a first preset operator library;
acquiring a target operator library corresponding to the target neural network processor based on the selected target neural network processor;
converting the first computational graph into a second computational graph matched with the target neural network processor according to a conversion relation between the first preset operator library and the target operator library, wherein the second computational graph is formed by one or more target operators in the target operator library;
and generating a target model file matched with the target neural network processor based on the second computational graph.
3. The method of claim 2, wherein constructing a first computational graph from the initial model file comprises:
analyzing the initial model file to obtain an analysis result;
constructing a third computation graph based on the analysis result, wherein the third computation graph is formed by one or more preset operators in the first preset operator library;
and optimizing the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
4. The method of claim 3, wherein optimizing the third computation graph based on the operation mode of the selected target neural network processor to obtain a first computation graph comprises:
and performing one or more of operator fusion, operator splitting and operator replacement on a preset operator in the third calculation graph based on the operation mode of the selected target neural network processor to obtain a first calculation graph.
5. The method of any one of claims 2 to 4, wherein the converting the first computational graph into a second computational graph matched to the target neural network processor according to a conversion relationship between the first preset operator library and the target operator library comprises:
respectively converting a plurality of preset operators in the first calculation graph according to a conversion relation between the first preset operator library and the target operator library to obtain a plurality of conversion results respectively corresponding to the plurality of preset operators in the first calculation graph, wherein the conversion results comprise target operators and/or target operator subgraphs in the target operator library, and the target operator subgraphs are formed by the plurality of target operators in the target operator library;
and connecting the plurality of conversion results according to the connection relation between preset operators in the first calculation graph to obtain a second calculation graph matched with the target neural network processor.
6. The method according to claim 5, wherein before the connecting the plurality of conversion results according to the connection relationship between preset operators in the first computation graph to obtain a second computation graph matched with the target neural network processor, the method further comprises:
and fusing and converting part of preset operators in the first calculation graph into target operators in the target operator library as conversion results.
7. The method according to any one of claims 2 to 6, wherein the obtaining a target operator library corresponding to the target neural network processor based on the selected target neural network processor comprises:
based on the selected target neural network processor, selecting a packaged operator library corresponding to the target neural network processor from a second preset operator library as a target operator library; and the second preset operator library comprises packaging operator libraries corresponding to the multiple types of neural network processors respectively.
8. The method of claim 7, wherein the encapsulation operator library comprises: and the original operator in the neural network processor corresponding to the packaging operator library and/or the custom operator developed by the neural network processor corresponding to the packaging operator library.
9. The method according to any one of claims 1 to 8, wherein compiling the initial model file further comprises:
and performing compilation corresponding to the format of the initial model file based on the format of the initial model file.
10. An arithmetic device, the device comprising:
the initial model file acquisition module is used for acquiring an initial model file for realizing neural network operation;
the target model file generation module is used for responding to selection operation aiming at multiple types of neural network processors, compiling the initial model file based on the selected target neural network processor and obtaining a target model file matched with the target neural network processor;
and the operation module is used for operating the target model file in the target neural network processor to obtain an operation result.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 9.
12. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 9.
CN202110224062.0A 2021-02-26 2021-02-26 Operation method and device, electronic device and storage medium Pending CN112947935A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110224062.0A CN112947935A (en) 2021-02-26 2021-02-26 Operation method and device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110224062.0A CN112947935A (en) 2021-02-26 2021-02-26 Operation method and device, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN112947935A true CN112947935A (en) 2021-06-11

Family

ID=76246767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110224062.0A Pending CN112947935A (en) 2021-02-26 2021-02-26 Operation method and device, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN112947935A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180114117A1 (en) * 2016-10-21 2018-04-26 International Business Machines Corporation Accelerate deep neural network in an fpga
CN109376852A (en) * 2017-04-21 2019-02-22 上海寒武纪信息科技有限公司 Arithmetic unit and operation method
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
CN112015424A (en) * 2019-05-31 2020-12-01 苹果公司 Compiling code for a machine learning model for execution on a special purpose processor
CN111275199A (en) * 2020-01-17 2020-06-12 深圳壹账通智能科技有限公司 Conversion method and system of deep learning model file, computer equipment and computer readable storage medium
CN111443917A (en) * 2020-03-26 2020-07-24 上海寒武纪信息科技有限公司 Neural network operation optimization method and device and related products
CN111723935A (en) * 2020-06-24 2020-09-29 湖北亿咖通科技有限公司 Neural network computation graph processing method, computer storage medium and electronic device
CN111967568A (en) * 2020-06-29 2020-11-20 北京百度网讯科技有限公司 Deep learning model adaptation method and device and electronic equipment
CN111783985A (en) * 2020-06-30 2020-10-16 Oppo广东移动通信有限公司 Information processing method, information processing device, model processing method, model processing device, and model processing medium
CN111783974A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Model construction and image processing method and device, hardware platform and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FOWERS, J. et al.: "A Configurable Cloud-Scale DNN Processor for Real-Time AI", 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA) *
ZHANG Wei; XU Tao; LI Zhiwen: "A master-slave cooperative processing model for content-security-oriented hardware acceleration", Journal of Beijing Information Science and Technology University (Natural Science Edition), no. 06

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113918163A (en) * 2021-10-08 2022-01-11 苏州浪潮智能科技有限公司 Compiling method, system, equipment and storage medium of neural network model
CN113901315A (en) * 2021-10-08 2022-01-07 北京字节跳动网络技术有限公司 Data service method, device and related product
CN113961267A (en) * 2021-10-15 2022-01-21 杭州海康威视数字技术股份有限公司 Service processing method, device and equipment
CN113961267B (en) * 2021-10-15 2023-08-25 杭州海康威视数字技术股份有限公司 Service processing method, device and equipment
WO2023071509A1 (en) * 2021-10-25 2023-05-04 深圳鲲云信息科技有限公司 Model compilation method and apparatus, and model running system
CN114116051B (en) * 2021-11-17 2024-03-22 招联消费金融股份有限公司 Processing method, device, equipment and storage medium based on neural network model
CN114116051A (en) * 2021-11-17 2022-03-01 招联消费金融有限公司 Processing method, device, equipment and storage medium based on neural network model
WO2023115776A1 (en) * 2021-12-24 2023-06-29 上海商汤智能科技有限公司 Neural network reasoning method and apparatus, and computer device, computer-readable storage medium and computer program product
CN114021707B (en) * 2022-01-06 2022-04-05 中兴通讯股份有限公司 Model acquisition method, system, electronic device and storage medium
CN114021707A (en) * 2022-01-06 2022-02-08 中兴通讯股份有限公司 Model acquisition method, system, electronic device and storage medium
WO2023134453A1 (en) * 2022-01-17 2023-07-20 华为技术有限公司 Operator processing method and computer device
CN114092313A (en) * 2022-01-19 2022-02-25 北京华品博睿网络技术有限公司 Model reasoning acceleration method and system based on GPU (graphics processing Unit) equipment
CN114091674A (en) * 2022-01-19 2022-02-25 北京华品博睿网络技术有限公司 Model reasoning acceleration method and system based on CPU equipment

Similar Documents

Publication Publication Date Title
CN112947935A (en) Operation method and device, electronic device and storage medium
CN111222637B (en) Neural network model deployment method and device, electronic equipment and storage medium
US10055204B2 (en) Generating communication firmware and a program plug-in based on product information and a program template
CN111275199A (en) Conversion method and system of deep learning model file, computer equipment and computer readable storage medium
CN110188871B (en) Operation method, device and related product
WO2023087751A1 (en) Application program development platform and method, electronic device, storage medium, and computer program product
CN111767058A (en) Program compiling method and device, electronic equipment and storage medium
CN114546460A (en) Firmware upgrading method and device, electronic equipment and storage medium
CN113298091A (en) Image processing method and device, electronic equipment and storage medium
CN114020369A (en) Programming education experiment method and device, electronic equipment and storage medium
CN110188879B (en) Operation method, device and related product
CN110163372B (en) Operation method, device and related product
CN112988194B (en) Program optimization method and device based on equipment information, electronic equipment and storage medium
CN111694571B (en) Compiling method and device
CN113138796A (en) Code generation method and device, electronic equipment and storage medium
CN111552688A (en) Data export method and device and electronic equipment
CN107832058B (en) Data processing method and device
CN114020264A (en) Operator processing method and device, electronic equipment and storage medium
CN111626398B (en) Operation method, device and related product
CN114035804A (en) Code conversion method, device, medium and electronic equipment
CN113095485B (en) Performance index determination method, device, system, electronic equipment and storage medium
CN113706367A (en) Node arrangement mode determining method and device, electronic equipment and storage medium
CN115098107B (en) Code generation method and device for neural network task
CN116991380B (en) Application program construction method and device, electronic equipment and storage medium
CN111694557B (en) Data processing method and device, image processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination