CN117170686B - Method and computing device for neural network compilation optimization - Google Patents

Method and computing device for neural network compilation optimization

Info

Publication number
CN117170686B
CN117170686B
Authority
CN
China
Prior art keywords
network model
similarity
target
feature
feature set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311453932.7A
Other languages
Chinese (zh)
Other versions
CN117170686A (en)
Inventor
梁栋
蔡权雄
牛昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to CN202311453932.7A priority Critical patent/CN117170686B/en
Publication of CN117170686A publication Critical patent/CN117170686A/en
Application granted
Publication of CN117170686B publication Critical patent/CN117170686B/en
Active legal status
Anticipated expiration legal status

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a computing device for neural network compilation optimization. The method comprises: preprocessing at least one predetermined network model to obtain a first feature set and a second feature set; processing a target network model to obtain a first target feature and a second target feature; calculating and sorting the integrated similarity between the target network model and the at least one predetermined network model according to the first feature set, the second feature set, the first target feature and the second target feature; determining the network type of the target network model according to the integrated similarity; and performing the corresponding classified compilation on the target network model according to the determined network type. With this technical scheme, the target network model can be classified, so that the compilation optimization corresponding to the classification result is applied to the target network model to be compiled.

Description

Method and computing device for neural network compilation optimization
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a method and a computing device for neural network compilation optimization.
Background
In neural network applications, the common development flow is as follows: an algorithm engineer develops a corresponding algorithm according to the requirements; an application engineer compiles the algorithm to obtain an algorithm model that can run on the device; and the algorithm model is deployed to the designated device. Neural networks can be divided, according to their different uses, into categories such as image classification, image segmentation and semantic segmentation, and several typical networks exist for each use. At present, the compilation step in common neural network development does not distinguish between network types; all networks are compiled uniformly.
Disclosure of Invention
The invention aims to provide a method and a computing device for neural network compilation optimization, which can distinguish the type of a neural network model so as to perform the corresponding optimized compilation.
According to an aspect of the present invention, there is provided a method of neural network compilation optimization, the method comprising:
preprocessing at least one predetermined network model to obtain a first feature set and a second feature set;
processing a target network model to obtain a first target feature and a second target feature;
calculating and sorting the integrated similarity between the target network model and the at least one predetermined network model according to the first feature set, the second feature set, the first target feature and the second target feature;
determining the network type of the target network model according to the integrated similarity;
and performing the corresponding classified compilation on the target network model according to the determined network type.
According to some embodiments, the first feature set includes the topological sorting results of the directed acyclic graphs corresponding to the at least one predetermined network model, and the second feature set includes a summary of each of the topological sorting results obtained according to a summary algorithm;
the first target feature comprises the topological sorting result of the directed acyclic graph corresponding to the target network model, and the second target feature comprises the summary of that topological sorting result obtained according to the summary algorithm.
According to some embodiments, preprocessing at least one predetermined network model comprises:
parsing the at least one predetermined network model, so that each predetermined network model is parsed into a corresponding first directed acyclic graph.
According to some embodiments, preprocessing the at least one predetermined network model further comprises:
performing topological sorting on each first directed acyclic graph, and adding the obtained topological sorting result to the first feature set.
According to some embodiments, preprocessing the at least one predetermined network model further comprises:
calculating a summary of each topological sorting result in the first feature set, and adding it to the second feature set.
According to some embodiments, the summarization algorithm comprises:
determining a weight for each node according to its node type in the topological sorting result, and encoding the type to form a type code-weight pair;
for each type code-weight pair, multiplying the type code by the weight;
for each topological sorting result, adding all the weight multiplication results column by column;
and reducing the dimension of the column-wise sum to obtain the summary.
According to some embodiments, processing a target network model includes:
parsing the target network model to obtain a second directed acyclic graph;
performing topological sorting on the second directed acyclic graph to obtain the first target feature;
and calculating the summary of the first target feature to obtain the second target feature.
According to some embodiments, calculating the integrated similarity of the target network model and the at least one predetermined network model comprises:
performing a first similarity distance calculation between the first target feature and the elements in the first feature set to obtain a first similarity;
and performing a second similarity distance calculation between the second target feature and the elements in the second feature set to obtain a second similarity.
According to some embodiments, calculating the integrated similarity of the target network model and the at least one predetermined network model further comprises:
multiplying the first similarity by a first predetermined weight coefficient to obtain a first weighted similarity;
multiplying the second similarity by a second predetermined weight coefficient to obtain a second weighted similarity;
and taking the sum of the first weighted similarity and the second weighted similarity as the integrated similarity.
According to some embodiments, determining the network type of the target network model from the integrated similarity comprises:
taking the predetermined network model whose integrated similarity is the largest and greater than a first threshold as the network type of the target network model.
According to another aspect of the present invention, there is provided a computing device comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform any of the methods described above.
According to another aspect of the invention there is provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method of any of the above.
The common compilation process does not distinguish between network types; all networks are compiled uniformly. According to embodiments of the invention, the neural network is first distinguished according to its network type during compilation, so that the compiler can apply the optimizations appropriate to that type for the hardware characteristics and compile a network model that runs faster on the hardware. To this end, the solution of the exemplary embodiments preprocesses the predetermined network models, then processes the target network model, and calculates the integrated similarity between the target network model and each predetermined network model from the preceding preprocessing and processing results. The network type of the target network model is then determined according to the integrated similarity, so that the classification of the network type can be completed well, the compilation is optimized, and the compiled network model runs faster on the hardware.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 illustrates a method flow diagram for neural network compilation optimization, according to an example embodiment.
FIG. 2 illustrates a method flow diagram for preprocessing a predetermined network model in accordance with an exemplary embodiment.
FIG. 3 illustrates a flow chart of a method of processing a target network model according to an example embodiment.
FIG. 4 illustrates a method flow diagram for integrated similarity calculation according to an example embodiment.
Fig. 5a shows a schematic structural diagram of a directed acyclic graph of a network model according to an example embodiment.
Fig. 5b shows a schematic diagram of a directed acyclic graph.
FIG. 6 illustrates a block diagram of a computing device in accordance with an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the present inventive concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present invention are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to grant or refuse authorization.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the invention and therefore should not be taken to limit the scope of the invention.
At present, the neural network is one of the common computational models in the field of artificial intelligence. Compiling a neural network model means translating the special model that describes the neural network algorithm: the algorithm is converted into a general computation graph, the computation graph is optimized, and the optimized computation graph is mapped into instructions and machine code executable by the back-end hardware platform, so that the neural network model is turned into code that executes on the computing platform. Neural network compilation optimization is a complex and time-consuming computational process. The common compilation process does not distinguish between network types; all networks are compiled uniformly. According to embodiments of the invention, the neural network is first distinguished according to its network type during compilation, so that the compiler can apply the optimizations appropriate to that type for the hardware characteristics and compile a network model that runs faster on the hardware.
Therefore, the invention provides a neural network compilation optimization method that applies the compilation optimization corresponding to the classification result of the target network model to be compiled, so that the compiled model runs faster on the hardware. According to the embodiment of the invention, the integrated similarity between the target network model to be compiled and each predetermined typical network model is calculated first, the network type of the target network model is then determined according to the integrated similarity, and the corresponding compilation optimization is carried out.
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 illustrates a method flow diagram for neural network compilation optimization, according to an example embodiment.
Referring to fig. 1, at S101, at least one predetermined network model is preprocessed to obtain a first feature set and a second feature set.
According to some embodiments, the first feature set includes the topological sorting results of the directed acyclic graphs corresponding to the at least one predetermined network model, and the second feature set includes a summary of each of the topological sorting results obtained according to a summary algorithm.
According to some embodiments, the predetermined network models are parsed, and each predetermined network model is parsed into a first directed acyclic graph (DAG).
Then, topological sorting is performed on each first directed acyclic graph, and the obtained topological sorting result is added to the first feature set.
According to some embodiments, the topological sorting process is as follows: select a node with no predecessor (i.e., in-degree 0) from the DAG and output it; delete that node and all directed edges starting from it from the graph; repeat the previous two steps until the current DAG is empty or no node without a predecessor remains in the current graph.
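The following Python sketch illustrates this sorting step (Kahn's algorithm); the node names, the edge list, and the priority mapping used to break ties between in-degree-0 operators are illustrative assumptions rather than part of the disclosed implementation.

```python
from collections import defaultdict

def topological_sort(nodes, edges, priority=None):
    """Kahn's algorithm: repeatedly output a node whose in-degree is 0.

    nodes    -- node names, e.g. operator names such as "Conv1", "Relu1"
    edges    -- list of (src, dst) directed edges of the DAG
    priority -- optional {name: rank} used to break ties when several nodes
                have in-degree 0 (assumed; e.g. Conv ranked highest)
    """
    indegree = {n: 0 for n in nodes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    order, ready = [], [n for n in nodes if indegree[n] == 0]
    while ready:
        # take the highest-priority ready node so the result is unique
        ready.sort(key=lambda n: (priority or {}).get(n, 0), reverse=True)
        node = ready.pop(0)
        order.append(node)
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return order
```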
Thereafter, a summary may be calculated for each topological sorting result in the first feature set and added to the second feature set.
According to some embodiments, the summary is calculated as follows: determine a weight for each node according to its node type in the topological sorting result, and encode the type to form a type code-weight pair; for each type code-weight pair, multiply the type code by the weight; for each topological sorting result, add all the weight multiplication results column by column; and reduce the dimension of the column-wise sum to obtain the summary.
At S103, the target network model is processed to obtain a first target feature and a second target feature.
With reference to the foregoing description, similarly, the first target feature includes the topological sorting result of the directed acyclic graph corresponding to the target network model, and the second target feature includes the summary of that topological sorting result obtained according to the summary algorithm.
According to some embodiments, the target network model is parsed to obtain a second directed acyclic graph. Topological sorting is performed on the second directed acyclic graph to obtain the first target feature, and the summary of the first target feature is calculated to obtain the second target feature. The topological sorting method and the summary algorithm are the same as described above and are not repeated here.
According to some embodiments, if the length of the second directed acyclic graph is zero, the target network model does not belong to any typical model, the similarity calculation is meaningless, and the process exits.
At S105, the integrated similarity between the target network model and the at least one predetermined network model is calculated and sorted based on the first and second feature sets and the first and second target features.
According to some embodiments, a first similarity distance calculation is performed between the first target feature and the elements in the first feature set to obtain a first similarity.
According to some embodiments, a second similarity distance calculation is performed between the second target feature and the elements in the second feature set to obtain a second similarity.
According to some embodiments, the first similarity is multiplied by a first predetermined weight coefficient to obtain a first weighted similarity, the second similarity is multiplied by a second predetermined weight coefficient to obtain a second weighted similarity, and the sum of the first weighted similarity and the second weighted similarity is used as the integrated similarity.
According to some embodiments, similarity calculation is performed between the features of the target network model and the features obtained by preprocessing each predetermined network model in the set. For example: F is the target network model, {A, B, C, D, E} is the set of predetermined network types, and the feature similarities of F with A, B, C, D, E, e.g. {0.3, 0.7, 0.5, 0.4, 0.1}, are calculated and sorted.
According to some embodiments, the integrated similarities are sorted after they are obtained. For example, in the example above, {0.3, 0.7, 0.5, 0.4, 0.1} is sorted.
At S107, a network type of the target network model is determined according to the integrated similarity.
According to some embodiments, the predetermined network model whose integrated similarity is the largest and greater than a first threshold is taken as the network type of the target network model. For example, in the above example, F has the highest similarity with type B (0.7), which is greater than the first threshold (e.g., 0.5), so F may be considered to be of type B.
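As an illustration only, this decision step can be sketched as follows; the similarity values and the 0.5 threshold are taken from the example above, and the helper name is an assumption.

```python
def classify(integrated_similarities, threshold=0.5):
    """Return the predetermined type with the largest integrated similarity,
    or None when no similarity exceeds the first threshold."""
    best = max(integrated_similarities, key=integrated_similarities.get)
    return best if integrated_similarities[best] > threshold else None

# Example from the text: F against {A, B, C, D, E} -> "B"
print(classify({"A": 0.3, "B": 0.7, "C": 0.5, "D": 0.4, "E": 0.1}))
```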
At S109, the corresponding classified compilation is performed on the target network model according to the determined network type.
According to some embodiments, after the network type of the target network model is determined, the corresponding classified compilation may be performed according to that network type, so as to complete the corresponding compilation optimization and make the compiled model run faster on the hardware.
FIG. 2 illustrates a method flow diagram for preprocessing a predetermined network model in accordance with an exemplary embodiment.
Referring to fig. 2, at S201, the at least one predetermined network model is parsed, and each predetermined network model is parsed into a corresponding first directed acyclic graph.
The preprocessed network models are neural network models in the ONNX format, an open file format designed for machine learning and used to store trained models. It allows different artificial intelligence frameworks (e.g., PyTorch, MXNet) to store and exchange model data in the same format, so that models can be transferred between frameworks and tools. The current mainstream deep learning and machine learning frameworks all support exporting ONNX models, which are suitable for storing trained models. In addition, an ONNX model can be converted for a specific software and hardware computing platform in order to deploy the AI application.
ONNX provides standard operators, methods, and data types for representing computation graph models. The network model can be represented as a directed acyclic graph (DAG) in which nodes represent operators and edges represent the flow of data. ONNX also supports operator extension for user-defined computation methods. During preprocessing, the network model in ONNX format is parsed to obtain the directed acyclic graph.
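A minimal parsing sketch using the onnx Python package is shown below; the helper name and the choice to label nodes by operator type are assumptions, not part of the disclosed implementation.

```python
import onnx

def parse_to_dag(model_path):
    """Parse an ONNX model into (nodes, edges) describing its DAG.

    Nodes are listed by operator type; an edge (i, j) is added whenever a
    tensor produced by node i is consumed by node j.
    """
    model = onnx.load(model_path)
    graph_nodes = list(model.graph.node)
    nodes = [node.op_type for node in graph_nodes]

    producer = {}                       # tensor name -> producing node index
    for i, node in enumerate(graph_nodes):
        for out in node.output:
            producer[out] = i

    edges = []
    for j, node in enumerate(graph_nodes):
        for tensor in node.input:
            if tensor in producer:      # skip graph inputs and initializers
                edges.append((producer[tensor], j))
    return nodes, edges
```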
In graph theory and computer science, a directed acyclic graph (DAG) is a directed graph without directed cycles. It consists of vertices (nodes) and edges, each edge pointing from one node to another, such that following the directions of the edges never forms a closed loop; see fig. 5b (which includes nodes 2, 3, 5, 7, 8, 9, 10, 11). "Directed" means that the edges connecting the nodes have a direction. When there is a path from a node u to another node v, node v is said to be reachable from node u. If a node can reach itself along such a path, that path is a cycle.
Referring to fig. 5a, taking a fragment of the ResNet network model as an example, the directed acyclic graph shown in fig. 5a can be obtained by parsing.
At S203, topological sorting is performed on each first directed acyclic graph, and the resulting topological sorting results are added to the first feature set.
According to an example embodiment, the topological sorting result in the first feature set is a linear sequence of all nodes in the first directed acyclic graph.
According to an example embodiment, the linear sequence satisfies two conditions: first, each node of the directed acyclic graph appears once and only once; second, if there is a path from a first node A to a second node B, the first node A appears before the second node B in the linear sequence.
Taking fig. 5a as an example, topological sorting of the directed acyclic graph shown in fig. 5a yields the following sequence:
TABLE 1
The uniqueness of the result needs to be guaranteed when sorting. For this reason, when nodes (operators) of the same type occur in the network, a sequence-number suffix is appended to the node name so that the name is unique, such as Relu1, Relu2, etc. in table 1. In addition, when two operators both have in-degree 0, the operators are taken according to their priority. For example, Conv operators are important and common in networks, so their priority is higher. Taking operators by priority makes the topological sorting result unique, whether in forward or reverse order.
At S205, a summary is calculated for each topological sorting result in the first feature set and added to the second feature set.
The summary algorithm according to an example embodiment is as described above. For example, for the topological sorting result (table 1) obtained from the directed acyclic graph of fig. 5a, the Conv type operator is given weight 5, the activation function type operator weight 1, the pooling type operator weight -4, and the normalization operator weight -3. The types are encoded, for example, according to the following table:
TABLE 2
The type code is multiplied digit by digit by the weight. For example, the type code-weight pair of the Conv operator is (00000001, 5), and the result is (0 0 0 0 0 0 0 5). The type code-weight pair of BatchNorm is (00000011, -3), and the multiplication output is (0 0 0 0 0 0 -3 -3). After the weight multiplication, the results are added column by column; for the Conv and BatchNorm operators this gives (0 0 0 0 0 0 0 5) + (0 0 0 0 0 0 -3 -3) = (0 0 0 0 0 0 -3 2). After the column-wise addition, the result is dimension-reduced: a column whose sum is 1 or more outputs 1, and a column whose sum is less than 1 outputs 0. Dimension reduction of (0 0 0 0 0 0 -3 2) therefore outputs (00000001), which is the summary.
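A sketch of this summary calculation is given below; the per-type codes and weights are passed in as assumed parameters (cf. table 2), and the bit width of 8 matches the example.

```python
def summary(topo_types, type_codes, type_weights, bits=8):
    """Compute the summary of one topological sorting result.

    topo_types   -- node types in topological order, e.g. ["Conv", "BatchNorm"]
    type_codes   -- {type: integer code} (assumed, cf. table 2)
    type_weights -- {type: weight}, e.g. Conv -> 5, BatchNorm -> -3
    """
    column_sum = [0] * bits
    for node_type in topo_types:
        code, weight = type_codes[node_type], type_weights[node_type]
        for pos in range(bits):                  # multiply the code digit by digit
            bit = (code >> pos) & 1
            column_sum[bits - 1 - pos] += bit * weight
    # dimension reduction: a column sum of 1 or more becomes 1, otherwise 0
    return "".join("1" if s >= 1 else "0" for s in column_sum)

# Worked example from the text: Conv then BatchNorm -> "00000001"
print(summary(["Conv", "BatchNorm"],
              {"Conv": 0b00000001, "BatchNorm": 0b00000011},
              {"Conv": 5, "BatchNorm": -3}))
```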
The summary algorithm according to the example embodiment differs from a conventional digest algorithm. A conventional digest algorithm is only responsible for mapping the original content to a signature value as uniformly and randomly as possible, and is in principle equivalent to a pseudo-random number generator. The summary algorithm provided by the embodiment of the invention is both efficient and accurate: it maps the high-dimensional feature vector to a low-dimensional summary value through dimension reduction, so that the calculation is efficient and suitable for similarity calculation over large-scale network data. Moreover, the summary algorithm measures the similarity between networks accurately; even small differences between networks are captured sensitively.
FIG. 3 illustrates a flowchart of processing a target network model according to an example embodiment.
The processing of the target network model is similar to the processing of the predetermined (typical) network model described above.
Referring to fig. 3, in S301, the target network model is parsed to obtain a second directed acyclic graph.
The method for parsing the target network model into the second directed acyclic graph is the same as described above and is not repeated here. After the second directed acyclic graph is obtained, topological sorting is performed.
At S303, the second directed acyclic graph is topologically ordered to obtain the first target feature.
As described above, when nodes of the same type exist in the network, a sequence-number suffix is appended to the node name to ensure that the name is unique, and when two operators both have in-degree 0, the operators are taken according to their priority.
At S305, the summary of the first target feature is calculated to obtain the second target feature.
The method for calculating the summary of the first target feature is the same as described above and is not repeated here. After the second target feature is obtained, the subsequent similarity calculation can be performed.
FIG. 4 illustrates a method flow diagram for integrated similarity calculation according to an example embodiment.
Referring to fig. 4, in S401, a first similarity distance is calculated between the first target feature and an element in the first feature set, so as to obtain a first similarity.
According to some embodiments, the first similarity distance calculation is a jaccard similarity calculation.
The Jaccard similarity (Jaccard similarity coefficient) is an index for measuring how similar two sets are; it is calculated as the proportion of elements shared by the two sets among all of their elements.
For example, if set A is (1, 2, 3) and set B is (1, 3, 4), the set of shared elements is (1, 3) and the set of all elements is (1, 2, 3, 4). There are 2 shared elements out of 4 elements in total, so the Jaccard similarity is 2/4 = 0.5.
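This first similarity distance calculation can be sketched as follows; returning 0.0 when both sets are empty is an added assumption.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity: shared elements divided by all elements."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

# Example from the text: A = {1, 2, 3}, B = {1, 3, 4} -> 2 / 4 = 0.5
print(jaccard_similarity({1, 2, 3}, {1, 3, 4}))
```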
And S403, performing second similarity distance calculation on the second target feature and the elements in the second feature set to obtain second similarity.
According to some embodiments, the second similarity distance calculation is a hamming distance calculation.
The Hamming distance is the number of positions at which the corresponding values differ after the numbers are converted to binary.
For example, for two decimal numbers a = 93 and b = 73, the binary representations are a = 1011101 and b = 1001001. The 3rd and 5th bits counted from the right (starting from 1) differ, so the Hamming distance of a and b is 2, and the similarity is the integer 1 divided by the Hamming distance.
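A sketch of this second similarity distance calculation on integer-valued summaries; returning 1.0 when the distance is 0 is an assumption added here to avoid division by zero.

```python
def hamming_similarity(a, b):
    """Hamming distance between two summaries given as integers, converted
    to a similarity of 1 / distance as described above."""
    distance = bin(a ^ b).count("1")
    return 1.0 if distance == 0 else 1.0 / distance

# Example from the text: 93 (1011101) and 73 (1001001) differ in 2 bits -> 0.5
print(hamming_similarity(93, 73))
```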
In S405, the first similarity is multiplied by a first predetermined weight coefficient to obtain a first weighted similarity, and the second similarity is multiplied by a second predetermined weight coefficient to obtain a second weighted similarity.
According to some embodiments, the first predetermined weight coefficient + the second predetermined weight coefficient = 1.
In S407, the sum of the first weighted similarity and the second weighted similarity is used as the integrated similarity.
According to some embodiments, the integrated similarity formula is: integrated similarity = first predetermined weight coefficient × first similarity + second predetermined weight coefficient × second similarity.
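The formula can be written directly as a small helper; the coefficient values 0.6 and 0.4 are illustrative defaults that sum to 1 as stated above.

```python
def integrated_similarity(first_sim, second_sim, w1=0.6, w2=0.4):
    """Weighted sum of the first and second similarities."""
    return w1 * first_sim + w2 * second_sim

print(integrated_similarity(0.5, 0.25))  # -> 0.4
```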
FIG. 6 illustrates a block diagram of a computing device according to an example embodiment of the invention.
As shown in fig. 6, computing device 30 includes processor 12 and memory 14. Computing device 30 may also include a bus 22, a network interface 16, and an I/O interface 18. The processor 12, memory 14, network interface 16, and I/O interface 18 may communicate with each other via a bus 22.
The processor 12 may include one or more general-purpose CPUs (central processing units), microprocessors, or application-specific integrated circuits, etc., for executing relevant program instructions. According to some embodiments, computing device 30 may also include a high-performance graphics processing unit (GPU) 20 to accelerate the processor 12.
Memory 14 may include machine-readable media in the form of volatile memory, such as random access memory (RAM), read-only memory (ROM), and/or cache memory. Memory 14 is used to store one or more programs, including instructions as well as data. The processor 12 may read the instructions stored in the memory 14 to perform the methods according to the embodiments of the invention described above.
Computing device 30 may also communicate with one or more networks through network interface 16. The network interface 16 may be a wireless network interface.
Bus 22 may be a bus including an address bus, a data bus, a control bus, etc. Bus 22 provides a path for exchanging information between the components.
It should be noted that, in the implementation, the computing device 30 may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method. The computer readable storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks, ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), network storage devices, cloud storage devices, or any type of media or device suitable for storing instructions and/or data.
Embodiments of the present invention also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It will be clear to a person skilled in the art that the solution according to the invention can be implemented by means of software and/or hardware. "Unit" and "module" in this specification refer to software and/or hardware capable of performing a specific function, either alone or in combination with other components, where the hardware may be, for example, a field programmable gate array, an integrated circuit, or the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some service interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present invention.
The exemplary embodiments of the present invention have been particularly shown and described above. It is to be understood that this invention is not limited to the precise arrangements and instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (8)

1. A method for neural network compilation optimization, comprising:
preprocessing at least one predetermined network model to obtain a first feature set and a second feature set, wherein the first feature set comprises the topological sorting results of the directed acyclic graphs corresponding to the at least one predetermined network model, and the second feature set comprises summaries of the topological sorting results obtained according to a summary algorithm;
processing a target network model to obtain a first target feature and a second target feature, wherein the first target feature comprises the topological sorting result of the directed acyclic graph corresponding to the target network model, and the second target feature comprises the summary of that topological sorting result obtained according to the summary algorithm;
calculating and sorting the integrated similarity between the target network model and the at least one predetermined network model according to the first feature set, the second feature set, the first target feature and the second target feature;
determining the network type of the target network model according to the integrated similarity;
performing the corresponding classified compilation on the target network model according to the determined network type,
wherein the summary algorithm maps the high-dimensional feature vector to a low-dimensional summary value by a dimension reduction technique,
wherein calculating the integrated similarity of the target network model and the at least one predetermined network model comprises:
performing a first similarity distance calculation between the first target feature and the elements in the first feature set to obtain a first similarity;
performing a second similarity distance calculation between the second target feature and the elements in the second feature set to obtain a second similarity;
multiplying the first similarity by a first predetermined weight coefficient to obtain a first weighted similarity;
multiplying the second similarity by a second predetermined weight coefficient to obtain a second weighted similarity;
and taking the sum of the first weighted similarity and the second weighted similarity as the integrated similarity.
2. The method of claim 1, wherein preprocessing at least one predetermined network model comprises:
parsing the at least one predetermined network model, so that each predetermined network model is parsed into a corresponding first directed acyclic graph.
3. The method of claim 2, wherein preprocessing at least one predetermined network model further comprises:
performing topological sorting on each first directed acyclic graph, and adding the obtained topological sorting result to the first feature set.
4. A method according to claim 3, wherein the preprocessing of at least one predetermined network model further comprises:
calculating a summary of each topological sorting result in the first feature set, and adding it to the second feature set.
5. The method of claim 1, wherein the summarization algorithm comprises:
determining a weight for each node according to its node type in the topological sorting result, and encoding the type to form a type code-weight pair;
for each type code-weight pair, multiplying the type code by the weight;
for each topological sorting result, adding all the weight multiplication results column by column;
and reducing the dimension of the column-wise sum to obtain the summary.
6. The method of claim 1, wherein processing the target network model comprises:
parsing the target network model to obtain a second directed acyclic graph;
performing topological sorting on the second directed acyclic graph to obtain the first target feature;
and calculating the summary of the first target feature to obtain the second target feature.
7. The method of claim 1, wherein determining the network type of the target network model based on the integrated similarity comprises:
taking the predetermined network model whose integrated similarity is the largest and greater than a first threshold as the network type of the target network model.
8. A computing device, comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-7.
CN202311453932.7A 2023-11-03 2023-11-03 Method and computing device for neural network compilation optimization Active CN117170686B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311453932.7A CN117170686B (en) 2023-11-03 2023-11-03 Method and computing device for neural network compilation optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311453932.7A CN117170686B (en) 2023-11-03 2023-11-03 Method and computing device for neural network compilation optimization

Publications (2)

Publication Number Publication Date
CN117170686A CN117170686A (en) 2023-12-05
CN117170686B true CN117170686B (en) 2024-03-12

Family

ID=88939939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311453932.7A Active CN117170686B (en) 2023-11-03 2023-11-03 Method and computing device for neural network compilation optimization

Country Status (1)

Country Link
CN (1) CN117170686B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
CN111814842A (en) * 2020-06-17 2020-10-23 北京邮电大学 Object classification method and device based on multi-pass graph convolution neural network
CN113703775A (en) * 2021-08-31 2021-11-26 上海阵量智能科技有限公司 Compiling method, device, equipment and storage medium
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
WO2022087788A1 (en) * 2020-10-26 2022-05-05 华为技术有限公司 Neural network compiling optimization method and related apparatus
CN114595751A (en) * 2022-02-28 2022-06-07 思创数码科技股份有限公司 Node classification method, system, readable storage medium and computer device
WO2022234209A1 (en) * 2021-05-05 2022-11-10 Centre National d'Études Spatiales Computer-implemented method for automatically determining a target architecture
WO2023185842A1 (en) * 2022-03-28 2023-10-05 中兴通讯股份有限公司 Method for determining compile optimization option, electronic device and readable storage medium
CN116932909A (en) * 2023-07-26 2023-10-24 中国工商银行股份有限公司 Model recommendation method and device, processor and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
CN111814842A (en) * 2020-06-17 2020-10-23 北京邮电大学 Object classification method and device based on multi-pass graph convolution neural network
WO2022087788A1 (en) * 2020-10-26 2022-05-05 华为技术有限公司 Neural network compiling optimization method and related apparatus
CN116368494A (en) * 2020-10-26 2023-06-30 华为技术有限公司 Neural network compiling optimization method and related device
WO2022234209A1 (en) * 2021-05-05 2022-11-10 Centre National d'Études Spatiales Computer-implemented method for automatically determining a target architecture
CN113703775A (en) * 2021-08-31 2021-11-26 上海阵量智能科技有限公司 Compiling method, device, equipment and storage medium
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
CN114595751A (en) * 2022-02-28 2022-06-07 思创数码科技股份有限公司 Node classification method, system, readable storage medium and computer device
WO2023185842A1 (en) * 2022-03-28 2023-10-05 中兴通讯股份有限公司 Method for determining compile optimization option, electronic device and readable storage medium
CN116932909A (en) * 2023-07-26 2023-10-24 中国工商银行股份有限公司 Model recommendation method and device, processor and electronic equipment

Also Published As

Publication number Publication date
CN117170686A (en) 2023-12-05

Similar Documents

Publication Publication Date Title
Ediger et al. Massive streaming data analytics: A case study with clustering coefficients
US20100042964A1 (en) Reuse of circuit labels in subcircuit recognition
CN110110213A (en) Excavate method, apparatus, computer readable storage medium and the terminal device of user's occupation
CN111045670B (en) Method and device for identifying multiplexing relationship between binary code and source code
Solanki et al. Comparative study of software clone detection techniques
Aboy et al. Optimizations in cusnp simulator for spiking neural p systems on cuda gpus
CN113283675A (en) Index data analysis method, device, equipment and storage medium
CN115859302A (en) Source code vulnerability detection method, device, equipment and storage medium
Oliveira et al. An evaluation of four reordering algorithms to reduce the computational cost of the Jacobi-preconditioned conjugate gradient method using high-precision arithmetic
Putelli et al. The Impact of Self-Interaction Attention on the Extraction of Drug-Drug Interactions.
CN117170686B (en) Method and computing device for neural network compilation optimization
Bernard et al. Stochastic L-system inference from multiple string sequence inputs
Zhao Fast correlation function calculator-A high-performance pair-counting toolkit
CN111739646A (en) Data verification method and device, computer equipment and readable storage medium
CN113076089B (en) API (application program interface) completion method based on object type
CN110221838B (en) Method for carrying out automatic program design optimization based on genetic algorithm and directed acyclic graph
CN110968690B (en) Clustering division method and device for words, equipment and storage medium
Kloster et al. A nearly-sublinear method for approximating a column of the matrix exponential for matrices from large, sparse networks
CN111723247A (en) Graph-based hypothetical computation
Kang et al. Novel sampling method for the von Mises–Fisher distribution
CN114936220B (en) Search method and device for Boolean satisfiability problem solution, electronic equipment and medium
Onodera et al. Data on the solution and processing time reached when constructing a phylogenetic tree using a quantum-inspired computer
CN115469931B (en) Instruction optimization method, device, system, equipment and medium of loop program
Babalad et al. Tile Size and Loop Order Selection using Machine Learning for Multi-/Many-Core Architectures
Pham et al. A polynomial time parallel algorithm for graph isomorphism using a quasipolynomial number of processors

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant