CN117170686B - Method and computing device for neural network compilation optimization - Google Patents
- Publication number: CN117170686B (application number CN202311453932.7A)
- Authority: CN (China)
- Prior art keywords: network model, similarity, target, feature, feature set
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides a neural network compilation optimization method and a computing device. The method comprises the following steps: preprocessing at least one predetermined network model to obtain a first feature set and a second feature set; processing a target network model to obtain a first target feature and a second target feature; calculating and ranking the comprehensive similarity of the target network model and the at least one predetermined network model according to the first feature set and the second feature set and the first target feature and the second target feature; determining the network type of the target network model according to the comprehensive similarity; and performing corresponding classified compilation on the target network model according to the determined network type. With this technical solution, the target network model can be classified, so that the compilation optimization matching the classification result is applied to the target network model to be compiled.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a neural network compilation optimization method and a computing device.
Background
In neural network applications, the common development flow is as follows: an algorithm engineer develops a corresponding algorithm according to the requirements; an application engineer compiles the algorithm to obtain an algorithm model that can run on the device; and the algorithm model is deployed to the designated device. By purpose, neural networks can be divided into image classification, image segmentation, semantic segmentation, and so on, and several typical networks exist for each purpose. At present, the compilation process in common neural network development does not distinguish by network type; all networks are compiled uniformly.
Disclosure of Invention
The invention aims to provide a method and a computing device for neural network compilation optimization that can distinguish the type of a neural network model so as to perform the corresponding optimized compilation.
According to an aspect of the present invention, there is provided a method of neural network compilation optimization, the method comprising:
preprocessing at least one predetermined network model to obtain a first feature set and a second feature set;
processing a target network model to obtain a first target feature and a second target feature;
calculating and ranking the comprehensive similarity of the target network model and the at least one predetermined network model according to the first feature set and the second feature set and the first target feature and the second target feature;
determining the network type of the target network model according to the comprehensive similarity;
and performing corresponding classified compilation on the target network model according to the determined network type.
According to some embodiments, the first feature set includes a topological ordering result of the directed acyclic graph corresponding to the at least one predetermined network model, and the second feature set includes a summary of each topological ordering result obtained according to a summary algorithm;
the first target feature comprises a topological ordering result of the directed acyclic graph corresponding to the target network model, and the second target feature comprises a summary of that topological ordering result obtained according to the summary algorithm.
According to some embodiments, preprocessing at least one predetermined network model comprises:
parsing the at least one predetermined network model, and parsing each predetermined network model into a corresponding first directed acyclic graph.
According to some embodiments, preprocessing the at least one predetermined network model further comprises:
and carrying out topological sorting on each first directed acyclic graph, and adding the obtained topological sorting result into the first feature set.
According to some embodiments, preprocessing the at least one predetermined network model further comprises:
calculating a summary of each topological sorting result in the first feature set, and adding the summaries to the second feature set.
According to some embodiments, the summarization algorithm comprises:
determining a weight for the corresponding node according to the node type in the topological sorting result, and encoding to form type code-weight pairs;
multiplying the type code by the weight for each type code-weight pair;
for each topological sorting result, adding all the weight-multiplication results column-wise;
and reducing the dimension of the column-addition result to obtain the summary.
According to some embodiments, processing a target network model includes:
parsing the target network model to obtain a second directed acyclic graph;
performing topological sorting on the second directed acyclic graph to obtain the first target feature;
and calculating the summary of the first target feature to obtain the second target feature.
According to some embodiments, calculating the integrated similarity of the target network model and the at least one predetermined network model comprises:
performing first similarity distance calculation on the first target feature and elements in the first feature set to obtain first similarity;
and carrying out second similarity distance calculation on the second target feature and elements in the second feature set to obtain second similarity.
According to some embodiments, calculating the integrated similarity of the target network model and the at least one predetermined network model further comprises:
multiplying the first similarity by a first predetermined weight coefficient to obtain a first weight similarity;
multiplying the second similarity by a second predetermined weight coefficient to obtain a second weight similarity;
and adding the first weight similarity and the second weight similarity, taking the resulting sum as the integrated similarity.
According to some embodiments, determining the network type of the target network model from the integrated similarity comprises:
taking the predetermined network model whose integrated similarity is the largest, provided that this integrated similarity is greater than a first threshold value, as the network type of the target network model.
According to another aspect of the present invention, there is provided a computing device comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method of any one of the preceding claims.
According to another aspect of the invention there is provided a non-transitory computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, cause the processor to perform the method of any of the above.
The common compilation process does not distinguish by network type; all networks are compiled uniformly. According to embodiments of the invention, the neural network is first distinguished by network type during compilation, so that the compiler can apply optimizations matched to the hardware characteristics for that network type and compile a network model that runs faster on the hardware. To this end, the solution of the exemplary embodiments preprocesses the predetermined network models, processes the target network model, and calculates the integrated similarity between the target network model and each predetermined network model from the preprocessing and processing results. The network type of the target network model is then determined from the integrated similarity, so that classification of the network type is completed, compilation is optimized, and the compiled network model runs faster on hardware.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments will be briefly described below.
FIG. 1 illustrates a method flow diagram for neural network compilation optimization, according to an example embodiment.
FIG. 2 illustrates a method flow diagram for preprocessing a predetermined network model in accordance with an exemplary embodiment.
FIG. 3 illustrates a flow chart of a method of processing a target network model according to an example embodiment.
FIG. 4 illustrates a method flow diagram for integrated similarity calculation according to an example embodiment.
Fig. 5a shows a schematic structural diagram of a directed acyclic graph of a network model according to an example embodiment.
Fig. 5b shows a schematic diagram of a directed acyclic graph.
FIG. 6 illustrates a block diagram of a computing device in accordance with an exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms. These terms are used to distinguish one element from another element. Accordingly, a first component discussed below could be termed a second component without departing from the teachings of the present inventive concept. As used herein, the term "and/or" includes any one of the associated listed items and all combinations of one or more.
The user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present invention are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of related data is required to comply with the relevant laws and regulations and standards of the relevant country and region, and is provided with corresponding operation entries for the user to select authorization or rejection.
Those skilled in the art will appreciate that the drawings are schematic representations of example embodiments and that the modules or flows in the drawings are not necessarily required to practice the invention and therefore should not be taken to limit the scope of the invention.
At present, the neural network is one of the common computational models in the field of artificial intelligence. Compiling a neural network model means converting the specialized description of the neural network algorithm into a general computational graph, optimizing that computational graph, and mapping the optimized computational graph into instructions and machine code executable by the back-end hardware platform, so that the neural network model is converted into code that executes on the computing platform. Neural network compilation optimization is a complex and time-consuming computational process. The common compilation process does not distinguish by network type; all networks are compiled uniformly. According to embodiments of the invention, the neural network is first distinguished by network type during compilation, so that the compiler can apply optimizations matched to the hardware characteristics for that network type and compile a network model that runs faster on the hardware.
Therefore, the invention provides a neural network compilation optimization method that applies the compilation optimization corresponding to each classification result to the target network model to be compiled, so that the compiled model runs faster on hardware. According to embodiments of the invention, the integrated similarity between the target network model to be compiled and each predetermined typical network model is calculated first, the network type of the target network model is then determined according to the integrated similarity, and the corresponding compilation optimization is performed.
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 illustrates a method flow diagram for neural network compilation optimization, according to an example embodiment.
Referring to fig. 1, at S101, at least one predetermined network model is preprocessed to obtain a first feature set and a second feature set.
According to some embodiments, the first feature set includes a topology ordering result of the directed acyclic graph corresponding to the at least one predetermined network model, and the second feature set includes a summary of each of the topology ordering results obtained according to a summary algorithm.
According to some embodiments, the predetermined network models are parsed, and each predetermined network model is parsed into a corresponding first directed acyclic graph (DAG).
And then, carrying out topological sorting on each first directed acyclic graph, and adding the obtained topological sorting result into the first feature set.
According to some embodiments, the process of topological sorting includes: selecting a node with no predecessor (i.e., in-degree 0) from the DAG and outputting it; deleting that node and all directed edges starting from it from the graph; and repeating the previous two steps until the current DAG is empty or no node without a predecessor remains in the current graph.
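The repeated predecessor-removal procedure above is Kahn's algorithm. A minimal Python sketch follows; the list-of-nodes and edge-pair representation is an illustrative assumption, not the patent's own data structure:

```python
from collections import deque

def topological_sort(nodes, edges):
    """Kahn's algorithm: repeatedly output a node with in-degree 0
    and delete it together with its outgoing edges."""
    indegree = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indegree[v] += 1
    # start with all nodes that have no predecessor
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a directed cycle")
    return order
```

For a chain Conv → Relu → Pool, this yields the sequence ["Conv", "Relu", "Pool"].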
Thereafter, a summary may be calculated for each topological sorting result in the first feature set and added to the second feature set.
According to some embodiments, the summary is calculated as follows: a weight is determined for the corresponding node according to the node type in the topological sorting result, and encoded to form type code-weight pairs; for each type code-weight pair, the type code is multiplied by the weight; for each topological sorting result, all the weight-multiplication results are added column-wise; and the column-addition result is dimension-reduced to obtain the summary.
At S103, the target network model is processed to obtain a first target feature and a second target feature.
With reference to the foregoing description, similarly, the first target feature includes the topological ordering result of the directed acyclic graph corresponding to the target network model, and the second target feature includes the summary of that topological ordering result obtained according to the summary algorithm.
According to some embodiments, the target network model is parsed to obtain a second directed acyclic graph. The second directed acyclic graph is topologically sorted to obtain the first target feature, and the summary of the first target feature is calculated to obtain the second target feature. The topological ordering method and the summary algorithm are the same as described above and are not repeated.
According to some embodiments, if the length of the second directed acyclic graph is zero, the target network model does not belong to any typical model, the computation is meaningless, and the process exits.
At S105, the integrated similarity of the target network model and the at least one predetermined network model is calculated and ranked based on the first and second feature sets and the first and second target features.
According to some embodiments, performing a first similarity distance calculation on the first target feature and an element in the first feature set to obtain a first similarity;
according to some embodiments, a second similarity distance calculation is performed between the second target feature and an element in the second feature set, so as to obtain a second similarity.
According to some embodiments, the first similarity is multiplied by a first predetermined weight coefficient to obtain a first weight similarity, the second similarity is multiplied by a second predetermined weight coefficient to obtain a second weight similarity, and the sum obtained by adding the first weight similarity and the second weight similarity is used as the integrated similarity.
According to some embodiments, similarity calculation is performed between the features of the target network model and the preprocessed features of each predetermined network model in the set. For example: F is the target network model and {A, B, C, D, E} is the set of predetermined network types; the feature similarities of F with A, B, C, D and E, e.g., {0.3, 0.7, 0.5, 0.4, 0.1}, are calculated respectively and then sorted.
According to some embodiments, the integrated similarities are sorted after they are obtained. For example, in the example above, {0.3, 0.7, 0.5, 0.4, 0.1} is sorted.
At S107, a network type of the target network model is determined according to the integrated similarity.
According to some embodiments, the predetermined network model for which the integrated similarity is greatest, provided that this integrated similarity is greater than a first threshold, is taken as the network type of the target network model. For example, in the above example, F has the highest similarity with type B and that similarity is greater than the first threshold (e.g., 0.5), so F may be considered to be of type B.
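This selection rule can be sketched as follows; the function name and the 0.5 default threshold are illustrative assumptions:

```python
def classify(similarities, threshold=0.5):
    """Pick the predetermined model with the highest integrated
    similarity, provided it exceeds the threshold; otherwise report
    no matching network type."""
    best = max(similarities, key=similarities.get)
    return best if similarities[best] > threshold else None
```

With the example similarities {A: 0.3, B: 0.7, C: 0.5, D: 0.4, E: 0.1}, the function returns "B"; if no similarity exceeds the threshold, it returns None and the target model matches no typical network.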
And in S109, performing corresponding classification compiling on the target network model according to the determined network type.
According to some embodiments, after determining the network type of the target network model, corresponding classification compiling may be performed according to the network type, so as to complete corresponding compiling optimization, and make the compiled model run faster on hardware.
FIG. 2 illustrates a method flow diagram for preprocessing a predetermined network model in accordance with an exemplary embodiment.
Referring to fig. 2, at S201, the at least one predetermined network model is parsed, and each predetermined network model is parsed into a corresponding first directed acyclic graph.
The preprocessed network models are neural network models in the ONNX format. ONNX is an open file format designed for machine learning and used to store trained models. It allows different artificial-intelligence frameworks (e.g., PyTorch, MXNet) to store and exchange model data in the same format, so that models can be transferred between frameworks and interactions between multiple frameworks and tools become possible. The current mainstream deep-learning and machine-learning frameworks all support exporting ONNX models. An ONNX model can also be converted for a specific software and hardware computing platform to deploy an AI application.
ONNX provides standard operators, methods, and data types for representing computational-graph models. A network model may be represented as a directed acyclic graph (DAG), in which nodes represent operators and edges represent the flow direction of data. ONNX also supports operator extension for user-defined calculation methods. During preprocessing, the network model in ONNX format is parsed to obtain the directed acyclic graph.
In graph theory and computer science, a directed acyclic graph (DAG) is a directed graph without directed cycles. It consists of vertices and edges, with each edge pointing from one node to another such that following the edge directions never forms a closed loop; see Fig. 5b (comprising nodes 2, 3, 5, 7, 8, 9, 10, 11). The edges connecting the nodes are directed. When there is a path from one node u to another node v, node v is said to be reachable from node u. A path that can reach its own starting node forms a cycle.
Referring to Fig. 5a, taking a fragment of a ResNet network model as an example, parsing yields the directed acyclic graph shown in Fig. 5a.
At S203, each first directed acyclic graph is topologically ordered, and the resulting topologically ordered results are added to the first feature set.
According to an example embodiment, the topological ordering result in the first feature set is a linear sequence of all nodes in the first directed acyclic graph.
According to an example embodiment, the linear sequence satisfies two conditions: first, each node of the directed acyclic graph appears and only appears once; second, if there is a path from the first node a to the second node B, the first node a appears in front of the second node B in the linear sequence.
Taking Fig. 5a as an example, topologically ordering the directed acyclic graph shown in Fig. 5a yields the following sequence:
TABLE 1
The uniqueness of the ordering result needs to be guaranteed. To this end, when nodes (operators) of the same type exist in the network, a sequence-number suffix is added to the node name to keep it unique, such as Relu1 and Relu2 in Table 1. In addition, when two operators both have in-degree 0, the operator is selected according to priority. For example, the Conv operator is important and common in networks, so its priority is higher. Selecting operators by priority makes the result of the topological ordering unique, whether in forward or reverse order.
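The sequence-number suffixing can be sketched as follows; the helper name is hypothetical, and it assumes every occurrence of a type receives a suffix, as the Relu1 and Relu2 entries in Table 1 suggest:

```python
from collections import Counter

def uniquify(op_names):
    """Append a per-type sequence number so repeated operator types
    get distinct names, e.g. two Relu nodes become Relu1 and Relu2."""
    seen = Counter()
    out = []
    for name in op_names:
        seen[name] += 1
        out.append(f"{name}{seen[name]}")
    return out
```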
At S205, a summary is calculated for each topology sequencing result in the first feature set, and the second feature set is added.
The summarization algorithm according to an example embodiment is as described above. For example, for the topological ordering (Table 1) obtained from the directed acyclic graph of Fig. 5a, Conv-type operators are given weight 5, activation-function-type operators weight 1, pooling-type operators weight -4, and normalization operators weight -3. The type codes are, for example, assigned according to the following table:
TABLE 2
The type code is multiplied by the weight; for example, the type code-weight pair of the Conv operator is (00000001, 5), and the result is (0 0 0 0 0 0 0 5). The type code-weight pair of BatchNorm is (00000011, -3), and the multiplication output is (0 0 0 0 0 0 -3 -3). After the weight multiplication, the results are added column-wise; for the Conv and BatchNorm columns this gives (0 0 0 0 0 0 0 5) + (0 0 0 0 0 0 -3 -3) = (0 0 0 0 0 0 -3 2). After the column addition, the result is dimension-reduced: a column sum of 1 or more outputs 1, and a column sum of less than 1 outputs 0, so (0 0 0 0 0 0 -3 2) reduces to (0 0 0 0 0 0 0 1).
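The multiply, column-add, and dimension-reduce steps of the worked example can be sketched as follows, representing each type code as a bit string; the function name and representation are illustrative assumptions:

```python
def compute_summary(code_weight_pairs):
    """Multiply each type code by its weight, add the results
    column-wise, then reduce each column sum to one bit
    (1 if the sum is at least 1, else 0)."""
    width = len(code_weight_pairs[0][0])
    columns = [0] * width
    for code, weight in code_weight_pairs:
        for i, bit in enumerate(code):
            columns[i] += int(bit) * weight
    # dimension reduction of the column-addition result
    return "".join("1" if c >= 1 else "0" for c in columns)
```

For the Conv and BatchNorm pairs above, the column sums are (0 0 0 0 0 0 -3 2), which reduce to the summary "00000001", matching the worked example.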
The summary algorithm according to the example embodiment differs from a conventional digest algorithm. A conventional digest algorithm is only responsible for mapping the original content to a signature value as uniformly and randomly as possible, and is in principle equivalent to a pseudo-random number generator. The summary algorithm provided by embodiments of the invention is both efficient and accurate. It maps a high-dimensional feature vector to a low-dimensional summary value through dimension reduction, enabling efficient computation suited to similarity calculation over large-scale network data. Moreover, it measures the similarity between networks accurately, sensitively capturing even small differences between them.
FIG. 3 illustrates a flowchart of processing a target network model according to an example embodiment.
The processing of the target network model is similar to the processing of the predetermined (typical) network model described above.
Referring to fig. 3, in S301, the target network model is parsed to obtain a second directed acyclic graph.
The method for parsing the target network model into the second directed acyclic graph is the same as described above and is not repeated here. After the second directed acyclic graph is obtained, topological sorting is performed.
At S303, the second directed acyclic graph is topologically ordered to obtain the first target feature.
Referring to the previous description, when nodes of the same type exist in the network, a sequence-number suffix is added to the node name to ensure that it is unique; and when two operators both have in-degree 0, the operator is selected according to priority.
And in S305, calculating a summary of the first target feature to obtain the second target feature.
The method for calculating the summary value of the first target feature is the same as described above and is not repeated here. After the second target feature is obtained, the subsequent similarity calculation can be performed.
FIG. 4 illustrates a method flow diagram for integrated similarity calculation according to an example embodiment.
Referring to fig. 4, in S401, a first similarity distance is calculated between the first target feature and an element in the first feature set, so as to obtain a first similarity.
According to some embodiments, the first similarity distance calculation is a jaccard similarity calculation.
The Jaccard similarity (Jaccard similarity coefficient) is an index for measuring the difference between two sets; it is computed as the proportion of elements shared by the two sets among all elements of their union.
For example, if set A is {1, 2, 3} and set B is {1, 3, 4}, the set of shared elements is {1, 3} and the union of all elements is {1, 2, 3, 4}. The number of shared elements is 2 and the number of all elements is 4, so the Jaccard similarity is 2/4 = 0.5.
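A minimal sketch of this set-based Jaccard calculation:

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B|: shared elements over all
    distinct elements of the two sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With A = {1, 2, 3} and B = {1, 3, 4}, the result is 0.5, as in the example above.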
And S403, performing second similarity distance calculation on the second target feature and the elements in the second feature set to obtain second similarity.
According to some embodiments, the second similarity distance calculation is a hamming distance calculation.
The Hamming distance is the number of positions at which the corresponding bit values differ after the two values are converted to binary.
For example, assume two decimal numbers a = 93 and b = 73. Represented in binary, a = 1011101 and b = 1001001; the 3rd and 5th bits counting from the right (starting at 1) differ, so the Hamming distance of a and b is 2. Dividing 1 by the Hamming distance gives the similarity.
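A sketch of the bit-difference count and the 1/distance similarity described above; treating distance 0 (identical values) as similarity 1.0 is an added assumption, since the text does not cover that case:

```python
def hamming_distance(a, b):
    """Number of differing bit positions between two integers."""
    return bin(a ^ b).count("1")

def hamming_similarity(a, b):
    """1 divided by the Hamming distance, per the example; identical
    values (distance 0) are treated as similarity 1.0 (assumption)."""
    d = hamming_distance(a, b)
    return 1.0 if d == 0 else 1 / d
```

For a = 93 and b = 73 this gives distance 2 and similarity 0.5.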
In S405, the first similarity is multiplied by a first predetermined weight coefficient to obtain a first weight similarity, and the second similarity is multiplied by a second predetermined weight coefficient to obtain a second weight similarity.
According to some embodiments, the first predetermined weight coefficient + the second predetermined weight coefficient = 1.
In S407, the sum of the first weight similarity and the second weight similarity is used as the integrated similarity.
According to some embodiments, the integrated similarity formula is: integrated similarity = first predetermined weight coefficient × first similarity + second predetermined weight coefficient × second similarity.
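The weighted combination of S405 and S407 can be sketched as follows (the default weights of 0.5 each are an assumption; the patent only requires that the two coefficients sum to 1):

```python
def integrated_similarity(sim1: float, sim2: float,
                          w1: float = 0.5, w2: float = 0.5) -> float:
    """Weighted sum of the two similarities; the weight coefficients sum to 1."""
    assert abs(w1 + w2 - 1.0) < 1e-9, "weight coefficients must sum to 1"
    return w1 * sim1 + w2 * sim2

print(integrated_similarity(0.5, 0.5))           # → 0.5
print(integrated_similarity(0.8, 0.4, 0.7, 0.3))  # ≈ 0.68
```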
FIG. 6 illustrates a block diagram of a computing device according to an example embodiment of the invention.
As shown in fig. 6, computing device 30 includes processor 12 and memory 14. Computing device 30 may also include a bus 22, a network interface 16, and an I/O interface 18. The processor 12, memory 14, network interface 16, and I/O interface 18 may communicate with each other via a bus 22.
The processor 12 may include one or more general-purpose CPUs (Central Processing Units), microprocessors, or application-specific integrated circuits, among others, for executing relevant program instructions. According to some embodiments, computing device 30 may also include a high-performance graphics processing unit (GPU) 20 for accelerating the processor 12.
Memory 14 may include machine-readable media in the form of volatile memory, such as Random Access Memory (RAM), Read-Only Memory (ROM), and/or cache memory. Memory 14 is used to store one or more programs, including instructions, as well as data. The processor 12 may read the instructions stored in the memory 14 to perform the methods according to the embodiments of the invention described above.
Computing device 30 may also communicate with one or more networks through network interface 16. The network interface 16 may be a wireless network interface.
Bus 22 may include an address bus, a data bus, a control bus, and the like. Bus 22 provides a path for exchanging information between the components.
It should be noted that, in the implementation, the computing device 30 may further include other components necessary to achieve normal operation. Furthermore, it will be understood by those skilled in the art that the above-described apparatus may include only the components necessary to implement the embodiments of the present description, and not all the components shown in the drawings.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above method. The computer readable storage medium may include, but is not limited to, any type of disk (including floppy disks, optical disks, DVDs, CD-ROMs, micro-drives, and magneto-optical disks), ROM, RAM, EPROM, EEPROM, DRAM, VRAM, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), network storage devices, cloud storage devices, or any type of media or device suitable for storing instructions and/or data.
Embodiments of the present invention also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the methods described in the method embodiments above.
It will be clear to a person skilled in the art that the solution according to the invention can be implemented by means of software and/or hardware. "Unit" and "module" in this specification refer to software and/or hardware capable of performing a specific function, either alone or in combination with other components, where the hardware may be, for example, a field programmable gate array, an integrated circuit, or the like.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interface, device, or unit, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable memory. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in whole or in part in the form of a software product stored in a memory, comprising several instructions for causing a computer device (which may be a personal computer, a server or a network device, etc.) to perform all or part of the steps of the method of the various embodiments of the present invention.
The exemplary embodiments of the present invention have been particularly shown and described above. It is to be understood that this invention is not limited to the precise arrangements and instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Claims (8)
1. A method for neural network compilation optimization, comprising:
preprocessing at least one predetermined network model to obtain a first feature set and a second feature set, wherein the first feature set comprises topological sorting results of the directed acyclic graphs corresponding to the at least one predetermined network model, and the second feature set comprises digests of the topological sorting results obtained according to a digest algorithm;
processing a target network model to obtain a first target feature and a second target feature, wherein the first target feature comprises a topological sorting result of the directed acyclic graph corresponding to the target network model, and the second target feature comprises a digest of the topological sorting result obtained according to the digest algorithm;
calculating a comprehensive similarity between the target network model and the at least one predetermined network model according to the first feature set, the second feature set, the first target feature, and the second target feature;
determining the network type of the target network model according to the comprehensive similarity;
performing corresponding classified compilation of the target network model according to the determined network type,
wherein the digest algorithm maps the high-dimensional feature vector to a low-dimensional digest value by a dimension reduction technique,
wherein calculating the integrated similarity of the target network model and the at least one predetermined network model comprises:
performing first similarity distance calculation on the first target feature and elements in the first feature set to obtain first similarity;
performing second similarity distance calculation on the second target feature and elements in the second feature set to obtain second similarity;
multiplying the first similarity by a first preset weight coefficient to obtain first weight similarity;
multiplying the second similarity by a second preset weight coefficient to obtain a second weight similarity;
and adding the first weight similarity and the second weight similarity to obtain a sum which is used as the comprehensive similarity.
2. The method of claim 1, wherein preprocessing at least one predetermined network model comprises:
and analyzing the at least one preset network model, and analyzing each preset network model into a corresponding first directed acyclic graph.
3. The method of claim 2, wherein preprocessing at least one predetermined network model further comprises:
and carrying out topological sorting on each first directed acyclic graph, and adding the obtained topological sorting result into the first feature set.
4. A method according to claim 3, wherein the preprocessing of at least one predetermined network model further comprises:
and calculating a summary of each topological sorting result in the first feature set, and adding the second feature set.
5. The method of claim 1, wherein the summarization algorithm comprises:
determining a weight for each node according to its node type in the topological sorting result, and encoding to form type code-weight pairs;
multiplying the type code by the weight for each type code-weight pair;
for each topological sorting result, adding all weight multiplication results column-wise;
and reducing the dimension of the column-wise sum to obtain the digest.
6. The method of claim 1, wherein processing the target network model comprises:
analyzing the target network model to obtain a second directed acyclic graph;
performing topological sorting on the second directed acyclic graph to obtain the first target feature;
and calculating a digest of the first target feature to obtain the second target feature.
7. The method of claim 1, wherein determining the network type of the target network model based on the integrated similarity comprises:
and taking the corresponding preset network model with the maximum comprehensive similarity and the comprehensive similarity larger than a first threshold value as the network type of the target network model.
8. A computing device, comprising:
a processor; and
a memory storing a computer program which, when executed by the processor, causes the processor to perform the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311453932.7A CN117170686B (en) | 2023-11-03 | 2023-11-03 | Method and computing device for neural network compilation optimization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117170686A CN117170686A (en) | 2023-12-05 |
CN117170686B true CN117170686B (en) | 2024-03-12 |
Family
ID=88939939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311453932.7A Active CN117170686B (en) | 2023-11-03 | 2023-11-03 | Method and computing device for neural network compilation optimization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117170686B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110766147A (en) * | 2018-07-25 | 2020-02-07 | 赛灵思公司 | Neural network compiler architecture and compiling method |
CN111814842A (en) * | 2020-06-17 | 2020-10-23 | 北京邮电大学 | Object classification method and device based on multi-pass graph convolution neural network |
CN113703775A (en) * | 2021-08-31 | 2021-11-26 | 上海阵量智能科技有限公司 | Compiling method, device, equipment and storage medium |
CN113703741A (en) * | 2021-10-29 | 2021-11-26 | 深圳思谋信息科技有限公司 | Neural network compiler configuration method and device, computer equipment and storage medium |
WO2022087788A1 (en) * | 2020-10-26 | 2022-05-05 | 华为技术有限公司 | Neural network compiling optimization method and related apparatus |
CN114595751A (en) * | 2022-02-28 | 2022-06-07 | 思创数码科技股份有限公司 | Node classification method, system, readable storage medium and computer device |
WO2022234209A1 (en) * | 2021-05-05 | 2022-11-10 | Centre National d'Études Spatiales | Computer-implemented method for automatically determining a target architecture |
WO2023185842A1 (en) * | 2022-03-28 | 2023-10-05 | 中兴通讯股份有限公司 | Method for determining compile optimization option, electronic device and readable storage medium |
CN116932909A (en) * | 2023-07-26 | 2023-10-24 | 中国工商银行股份有限公司 | Model recommendation method and device, processor and electronic equipment |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110766147A (en) * | 2018-07-25 | 2020-02-07 | 赛灵思公司 | Neural network compiler architecture and compiling method |
CN111814842A (en) * | 2020-06-17 | 2020-10-23 | 北京邮电大学 | Object classification method and device based on multi-pass graph convolution neural network |
WO2022087788A1 (en) * | 2020-10-26 | 2022-05-05 | 华为技术有限公司 | Neural network compiling optimization method and related apparatus |
CN116368494A (en) * | 2020-10-26 | 2023-06-30 | 华为技术有限公司 | Neural network compiling optimization method and related device |
WO2022234209A1 (en) * | 2021-05-05 | 2022-11-10 | Centre National d'Études Spatiales | Computer-implemented method for automatically determining a target architecture |
CN113703775A (en) * | 2021-08-31 | 2021-11-26 | 上海阵量智能科技有限公司 | Compiling method, device, equipment and storage medium |
CN113703741A (en) * | 2021-10-29 | 2021-11-26 | 深圳思谋信息科技有限公司 | Neural network compiler configuration method and device, computer equipment and storage medium |
CN114595751A (en) * | 2022-02-28 | 2022-06-07 | 思创数码科技股份有限公司 | Node classification method, system, readable storage medium and computer device |
WO2023185842A1 (en) * | 2022-03-28 | 2023-10-05 | 中兴通讯股份有限公司 | Method for determining compile optimization option, electronic device and readable storage medium |
CN116932909A (en) * | 2023-07-26 | 2023-10-24 | 中国工商银行股份有限公司 | Model recommendation method and device, processor and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||