CN115904394B - Neural network increment compiling method and device for many-core architecture - Google Patents

Neural network increment compiling method and device for many-core architecture

Info

Publication number
CN115904394B
Authority
CN
China
Prior art keywords
increment
incremental
network
compiling
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310191337.4A
Other languages
Chinese (zh)
Other versions
CN115904394A (en)
Inventor
何煜坤
李莹
马德
章明
孙世春
金孝飞
邓水光
潘纲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Zhejiang Lab
Original Assignee
Zhejiang University ZJU
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Zhejiang Lab filed Critical Zhejiang University ZJU
Priority to CN202310191337.4A priority Critical patent/CN115904394B/en
Publication of CN115904394A publication Critical patent/CN115904394A/en
Application granted granted Critical
Publication of CN115904394B publication Critical patent/CN115904394B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Stored Programmes (AREA)

Abstract

The invention discloses a neural network incremental compiling method and device for a many-core architecture, wherein the method comprises the following steps: step one, obtaining the previous compiling result and restoring it into the intermediate representation structure corresponding to the compiler; step two, matching the previously compiled model and the currently compiled model layer by layer, and identifying the changes between them; step three, performing incremental analysis on the changed model network layers to obtain incremental data, and recursively performing dependency analysis on the layers connected to the changed network layers to confirm the recompilation minimum set; step four, parsing the restored intermediate representation structure, locating the nodes related to the incremental modification in it, and then performing incremental filling and recompilation; and step five, partially serializing the incrementally filled and recompiled intermediate representation structure to generate the compiling result file. The invention reduces the time consumed by repeated compiling optimization and improves compiling efficiency.

Description

Neural network increment compiling method and device for many-core architecture
Technical Field
The invention relates to the technical field of high-performance computer compiling, in particular to a neural network incremental compiling method and device for a many-core architecture.
Background
A neural network has a complex topological structure and a large number of parameters; its neurons are interconnected by synapses into a huge network structure. A neural computing chip for such networks consists of many homogeneous cores: the network is divided into several subgraphs distributed to the cores, and deploying the application on the target machine completes the compiling and mapping of the network model. To increase the computational efficiency of the machine, the occupied core coordinates must be adjusted according to the differing communication costs between neurons, placing communication-intensive connections on cores close to each other; this is compiling optimization. As the size of neural networks keeps growing, the core array grows as well, and the time and memory consumed by compiling optimization keep increasing, imposing a huge load on the server.
For the Loihi acceleration chip developed by company A, the LCompiler compiler was developed to solve the multi-core mapping optimization problem. The Loihi architecture contains a complete core array; LCompiler completes the segmentation and redistribution of the neural network, analyzes the mapping efficiency ratio of the input, reduces redundant bandwidth by adjusting core positions, and evaluates the maximum fan-out to determine core placement, thereby deploying the model onto the Loihi chip. However, the Loihi compiler provides no incremental compiling optimization: in principle any modification to the network must be compiled from scratch, and the previous compiling optimization result cannot be reused, producing redundant time and memory consumption.
When users actually train neural network models, in most scenarios the network structure is fixed after its design is finished; the user only completes multiple rounds of training with newly provided data, retraining and modifying only the network parameters. In such task scenarios the user modifies only a local network layer, adjusts its size, or adds a new network layer. These modifications generally affect only part of the compilation result; fully recompiling the complete model would introduce a significant number of redundant iterations. Meanwhile, the optimization result of the previous compilation cannot be reused in the new compilation and the optimization algorithm search must be rerun, greatly wasting computing resources, noticeably lengthening the user's compilation wait, and reducing the system's compilation concurrency. There is therefore a need for an improved technique that compiles in incremental mode, exploiting historical compiling results according to the local differences between successive versions of the model.
Disclosure of Invention
In order to solve the technical problems in the prior art, the invention provides a neural network incremental compiling method for a many-core architecture. The method determines the minimum set to be compiled through incremental analysis of two adjacent compiling results and efficiently recompiles only that set, thereby reducing the recompilation time after a user modifies the network, cutting the energy consumption and the redundant repetitive compiling work that compilation generates, and improving compiling efficiency. The specific technical scheme is as follows:
a neural network increment compiling method facing to a many-core architecture comprises the following steps:
step one, obtaining a previous compiling result and restoring the previous compiling result into an intermediate representation structure corresponding to a compiler;
step two, respectively carrying out layer-by-layer matching on the previous compiled model and the current compiled model, and identifying the change generated between the previous compiled model and the current compiled model;
thirdly, performing incremental analysis on the model network layer generating the change to obtain incremental data, and recursively performing dependency analysis on the layer connected with the network layer generating the change to confirm the recompilation minimum set;
analyzing the intermediate representation structure restored in the first step, positioning nodes related to incremental modification in the intermediate representation structure, and performing incremental filling and recompilation by utilizing the incremental data obtained in the third step and the recompilation minimum set;
and fifthly, carrying out partial serialization on the intermediate representation structure subjected to incremental filling and recompilation to generate a compiling result file, and completing the whole incremental compiling process.
Further, the first step specifically comprises: reading the previous compiling result and restoring it, through a parsing interface, into the compiler's standard low-level intermediate representation structure by means of a compiling result auxiliary file, the compiling result auxiliary file being a quickly decodable data file additionally generated at each compilation.
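For illustration, the auxiliary-file round trip described above might look like the following minimal Python sketch, where pickle stands in for the quickly decodable format and LayerIR is an assumed IR node class; neither name comes from the patent:

```python
import pickle

class LayerIR:
    """Illustrative low-level IR node: one network layer bound to cores."""
    def __init__(self, name, params, core_ids):
        self.name = name          # network layer name
        self.params = params      # parameter name -> values
        self.core_ids = core_ids  # cores the layer's neurons were mapped to

def save_aux_file(ir_nodes, path):
    # Additionally emit the quickly decodable auxiliary file at compile time.
    with open(path, "wb") as f:
        pickle.dump(ir_nodes, f)

def restore_ir(path):
    # Parsing interface: read the auxiliary file back into standard IR nodes.
    with open(path, "rb") as f:
        return pickle.load(f)

nodes = [LayerIR("conv1", {"w": [0.1, 0.2]}, core_ids=[0, 1])]
save_aux_file(nodes, "model.aux")
assert restore_ir("model.aux")[0].name == "conv1"
```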
Further, the content matched layer by layer in the second step includes: model structure, weights and configuration parameters.
Further, the third step specifically includes the following substeps:
step 3.1, recording the incremental information generated while matching the model's network layers and computing the incremental data from it, the incremental data comprising: the data to be newly filled and the data of the current change that must be updated and rewritten;
and step 3.2, when matching the model network layers, performing dependency analysis on the layers whose shapes are modified or the layers connected to newly added layers, recursively collecting the dependent network layers using breadth-first search, and confirming the recompilation minimum set.
Further, the step 3.1 specifically includes: using a differential analysis based on the Myers algorithm to recursively determine the increment mode type of each network layer, the types, in order of increment complexity from low to high, comprising: zero increment mode, parameter increment mode, dendrite increment mode, axon increment mode and shape increment mode;
collecting all network layers defined as parameter increment mode, extracting the parameter increment information, determining the distribution positions of the rewritten and newly added increments in the intermediate representation structure, and adding the network layer name strings as key indexes into a dictionary P, named the parameter increment dictionary; the parameter increment dictionary is used to complete the filling-data preparation and the rewriting-data preparation.
Furthermore, the Myers algorithm variant simultaneously introduces the three operations of deletion, insertion and addition, matches the network layers of the previously compiled model and the currently compiled model in order of increment mode complexity, greedily selects the path with the smallest increment, and marks each network layer with its increment mode; if a network layer conforms to several increment modes, the hierarchically most complex increment mode is selected to mark the layer.
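A simplified Python sketch of this "most complex mode wins" marking rule follows; the per-mode predicates are illustrative stand-ins rather than the patent's exact matching conditions:

```python
# Increment modes ordered from lowest to highest increment complexity.
MODES = ["zero", "parameter", "dendrite", "axon", "shape"]

def classify_layer(old, new):
    """Return the most complex increment mode a matched layer pair exhibits."""
    matched = ["zero"]
    if old["weights"] != new["weights"]:
        matched.append("parameter")       # synaptic weights changed
    if old["pred_neurons"] != new["pred_neurons"]:
        matched.append("dendrite")        # predecessor layer size changed
    if old["succ_neurons"] != new["succ_neurons"]:
        matched.append("axon")            # successor layer size changed
    if old["neurons"] != new["neurons"]:
        matched.append("shape")           # the layer's own size changed
    return max(matched, key=MODES.index)  # most complex mode wins

old = {"neurons": 64, "weights": [1, 2], "pred_neurons": 32, "succ_neurons": 10}
new = dict(old, weights=[1, 3])
assert classify_layer(old, new) == "parameter"
```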
Further, the filling-data preparation and the rewriting-data preparation are completed using the parameter increment dictionary, specifically as follows:
the filling-data preparation is completed for each network layer in the parameter increment dictionary, specifically: when a parameter of the network layer is converted from scalar type to vector type, additional parameter filling is carried out; the address of the original scalar configuration word is cleared, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an N×2 array, where column 0 is address information, column 1 is data information, and N is the data count length; at the same time, an address overlap check of the data of different logic areas in the memory is completed to confirm that the area to be filled and the data to be filled do not overlap;
the rewriting-data preparation is completed for each network layer in the parameter increment dictionary, specifically: when the parameter shape of the network layer is unchanged but its values change, additional parameter rewriting is performed; the addresses and new data of the rewritten parameters are recorded, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an S×2 array, where column 0 is address information, column 1 is data information, and S is the number of changed data.
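A small Python sketch of the N×2 pairing and the overlap check; the base address and the occupied-address set are assumptions for exposition:

```python
import numpy as np

def make_fill_pairs(base_addr, values):
    """Build the N x 2 pairing for a scalar-to-vector conversion:
    column 0 holds target addresses, column 1 holds the data words."""
    addrs = np.arange(base_addr, base_addr + len(values))
    return np.stack([addrs, np.asarray(values)], axis=1)

def check_no_overlap(pairs, occupied):
    """Confirm the area to fill does not collide with other logic areas."""
    return not (set(pairs[:, 0].tolist()) & occupied)

fill = make_fill_pairs(0x100, [7, 8, 9])      # N = 3, shape (3, 2)
assert fill.shape == (3, 2)
assert check_no_overlap(fill, occupied={0x080, 0x081})
```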
Further, the zero increment mode satisfies the following condition: the previously compiled model and the currently compiled model match into completely identical network layers;
the parameter increment mode satisfies the following conditions: the number of parameters of the matched network layer changes and the front and back synaptic connection weights of the network layer change, while the remaining characteristics are consistent with the zero increment mode;
the dendrite increment mode satisfies the following conditions: the number of neurons of the matched layer's predecessor network layer changes, and the dendritic connection structure with the predecessor network layer changes;
the axon increment mode satisfies the following conditions: the number of neurons of the matched layer's successor network layer changes, and the axon connection structure with the successor network layer changes;
the shape increment mode satisfies the following conditions: the matched layer's own number of neurons changes, and its synaptic connections with the predecessor and successor network layers change.
Further, the step 3.2 specifically includes: for the network layers defined as axon increment mode, dendrite increment mode and shape increment mode, adding the network layer names as keys into a dictionary P serving as the recompilation set, namely the recompilation minimum set M; after traversing all non-zero-increment network layers, continuing to determine, using a breadth-first search algorithm, the topological order relation of the zero-increment network layers connected to them, and pushing these zero-increment network layers one by one, in the breadth-first visit order, into a newly constructed priority queue, called the zero-increment priority queue Q.
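The construction of the set M and the queue Q might look like the following Python sketch, where the mode labels and the successor adjacency are assumed inputs produced by the earlier matching step:

```python
from collections import deque

def collect_sets(modes, succ):
    """modes: layer name -> increment mode; succ: successor adjacency.
    Returns the recompilation minimum set M and the zero-increment
    priority queue Q (zero-increment layers reached from M, in BFS order)."""
    M = {l for l, m in modes.items() if m in ("axon", "dendrite", "shape")}
    Q, seen, frontier = deque(), set(M), deque(M)
    while frontier:
        layer = frontier.popleft()
        for nxt in succ.get(layer, []):
            if nxt not in seen and modes.get(nxt, "zero") == "zero":
                seen.add(nxt)
                Q.append(nxt)        # may later prove pseudo-zero-increment
                frontier.append(nxt)
    return M, Q

M, Q = collect_sets({"A": "axon", "B": "zero", "C": "zero"},
                    {"A": ["B"], "B": ["C"]})
assert M == {"A"} and list(Q) == ["B", "C"]
```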
Further, in the fourth step, the intermediate representation structure restored in the first step is parsed, the nodes related to the incremental modification in it are located, and incremental filling is performed using the incremental data obtained in the third step, specifically: the restored intermediate representation structure provides an index over all weights and neuron numbers, with weight parameters recorded in matrix format; the data addresses in the cores where each network layer is placed are quickly looked up, and the index serial numbers of the data serve as labels to extract the specific address data within the weights, yielding an address-data mask sequence; the mask sequence is then used to scan, in batches, the matched address-data mapping pairs in the intermediate representation structure and overwrite the data in place, completing the incremental filling.
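A minimal Python sketch of the batch data coverage driven by the address-data mask sequence; the flat per-core memory array is an assumed simplification of the real core memory layout:

```python
import numpy as np

def incremental_fill(core_memory, mask_pairs):
    """Overlay new data onto a core's parameter area in place.
    mask_pairs is the address-data mask sequence: each entry is
    (address, new data word), addresses indexing the core's memory."""
    for addr, value in mask_pairs:
        core_memory[addr] = value   # batch coverage at the matched address
    return core_memory

mem = np.zeros(16)
incremental_fill(mem, [(3, 7.0), (5, 9.0)])
assert mem[3] == 7.0 and mem[5] == 9.0
```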
Further, the recompilation in the fourth step specifically includes the following steps (a sketch of the loop follows the list):
a. treating the recompilation minimum set as a sub-network to be compiled, reclaiming the core resources occupied by the network layers in the sub-network, and removing the bindings between those network layers and their cores;
b. iteratively selecting, according to the topological ordering, the next network layer l in the recompilation minimum set to recompile;
c. selecting a new core resource group from the core resource pool and binding the neurons to new computing cores to complete the compilation of network layer l; this step is repeated to iteratively complete the compilation of all network layers in the recompilation minimum set;
d. checking all axon connection relations of the network layers in the zero-increment priority queue: if a recompiled network layer produced a different core allocation scheme, i.e., neurons originally on one core are now allocated to different cores, a network layer that was originally zero-increment is considered to actually carry an additional increment and is defined as a pseudo-zero-increment network layer, requiring additional incremental compilation overhead and a readjustment of the data structure of its axon connection area;
e. after the adjustment of a pseudo-zero-increment network layer is finished, popping it from the zero-increment priority queue Q, and repeating until the remaining network layers in Q need no adjustment;
f. clearing the core resources corresponding to the recompiled network layers from all computing cores in the intermediate representation structure, and then running the mapping algorithm on all newly allocated virtual cores to match them with actual core resources;
g. completing the filling of the relative coordinates of successor cores in the axon connection data block.
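As referenced above, a compact Python sketch of this recompilation loop under simplified assumptions: core bindings are a plain dict, the core resource pool is a list, and connection_changed is an assumed hook standing in for the axon-connection check of step d; none of these names come from the patent:

```python
def recompile(M_sorted, Q, core_pool, bindings, connection_changed):
    """M_sorted: recompilation minimum set in topological order;
    Q: zero-increment priority queue; bindings: layer -> core."""
    for layer in M_sorted:                  # a. reclaim old core resources
        core_pool.append(bindings.pop(layer))
    for layer in M_sorted:                  # b./c. rebind layers to new cores
        bindings[layer] = core_pool.pop(0)
    for layer in list(Q):                   # d./e. pseudo-zero-increment check
        if connection_changed(layer):       # axon data must be readjusted
            core_pool.append(bindings.pop(layer))
            bindings[layer] = core_pool.pop(0)
    return bindings                         # f./g. mapping, coordinate fill follow

out = recompile(["B"], ["C"], core_pool=[3, 4],
                bindings={"A": 0, "B": 1, "C": 2},
                connection_changed=lambda l: l == "C")
assert out == {"A": 0, "B": 3, "C": 4}
```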
Further, the fifth step specifically comprises: first, extracting all network layers in the zero-increment priority queue and, using the core records allocated to each of them in the previous compilation, directly copying the corresponding generated files from the original compiling result; then, serializing the adjusted cores again to generate the corresponding compiled files, and completing a consistency verification of them against the structure and parameters of the new model, finishing the whole incremental compiling process.
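A Python sketch of this partial serialization, assuming one output file per core and pickle as the serialization format; both are assumptions, since the patent does not specify the file layout:

```python
import os
import pickle
import shutil

def partial_serialize(zero_layers, recompiled_cores, old_dir, new_dir):
    """zero_layers: layer -> core kept from the previous compilation;
    recompiled_cores: core -> payload that must be regenerated."""
    os.makedirs(new_dir, exist_ok=True)
    for layer, core in zero_layers.items():         # reuse: straight copy
        shutil.copy(os.path.join(old_dir, f"core_{core}.bin"),
                    os.path.join(new_dir, f"core_{core}.bin"))
    for core, payload in recompiled_cores.items():  # regenerate adjusted cores
        with open(os.path.join(new_dir, f"core_{core}.bin"), "wb") as f:
            pickle.dump(payload, f)

os.makedirs("old", exist_ok=True)
with open("old/core_0.bin", "wb") as f:
    f.write(b"unchanged core image")
partial_serialize({"A": 0}, {1: {"layer": "B"}}, "old", "new")
assert sorted(os.listdir("new")) == ["core_0.bin", "core_1.bin"]
```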
A neural network incremental compiling device for a many-core architecture comprises one or more processors configured to implement the above neural network incremental compiling method for a many-core architecture.
A computer readable storage medium, on which a program is stored, which when executed by a processor implements the neural network incremental compilation method for a many-core architecture.
The beneficial effects are that:
the invention uses the neural network on-chip core compiling mapping technology based on increment compiling, completes layer-by-layer matching of the front and back compiled neural network models, completes increment mode collection of each layer by using a multi-layer difference matching algorithm, further selects different compiling means, completes increment compiling of a new network with lower compiling cost, thereby reducing time consumption of repeated compiling optimization, improving compiling efficiency, further reducing time consumption in compiling stage, simultaneously guaranteeing compiling result quality, and still maintaining high computing performance on a target machine.
Drawings
FIG. 1 is a schematic flow diagram of a neural network incremental compiling method for a many-core architecture according to the present invention;
FIG. 2 is a schematic diagram of a flowchart of incremental compilation of a neural network for a many-core architecture according to an embodiment of the present invention;
FIGS. 3a and 3b are diagrams showing the comparison of the number of neurons in the network layer in the model according to the embodiment of the present invention;
FIG. 4 is a schematic illustration of incremental filling of a model network layer in accordance with an embodiment of the present invention;
FIGS. 5a to 5f are diagrams illustrating a network layer change process for performing recompilation of a neural network according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a system operation principle of incremental compiling of a neural network facing to a many-core architecture according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a neural network incremental compiling device facing to a many-core architecture according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and technical effects of the present invention more apparent, the present invention will be further described in detail with reference to the drawings and examples of the specification.
As shown in fig. 1 and fig. 2, a neural network incremental compiling method facing to a many-core architecture includes the following steps:
step one, the old compiling result, that is, the previous compiling result is obtained and restored to the intermediate representation structure IR (Intermediate Representation) corresponding to the compiler.
Because the dependency relationships among the complex data structures used in compiling may be lost, recovery is costly. To simplify the recovery process, a quickly decodable data file, which may be called the compiling result auxiliary file, is additionally generated at each compilation and contains all the information required by the compiling intermediate representation. Taking the compiling result auxiliary file as input, a dedicated parsing interface reads and restores it into the compiler's standard low-level intermediate representation structure IR, i.e., the IR result produced by the compiling, mapping and optimization of the old model.
Step two, the previously compiled model and the currently compiled model are input respectively and matched layer by layer, including matching of the model structure, the weights and the configuration parameters respectively, identifying the changes between the models input to the two adjacent compilations.
The compiler's default front-end interface is used to parse each model, reading it into an analyzable high-level IR and converting the network model description into a graph structure expression in the IR.
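As a toy illustration of such a graph structure expression, the following Python sketch builds predecessor and successor adjacency from a layer list; the dict-based representation is an assumption for exposition:

```python
from collections import defaultdict

def build_graph_ir(layers, edges):
    """layers: {name: config}; edges: (src, dst) synaptic connections."""
    succ, pred = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)   # axon side: successors of src
        pred[dst].append(src)   # dendrite side: predecessors of dst
    return {"layers": layers, "succ": succ, "pred": pred}

ir = build_graph_ir({"A": {"neurons": 64}, "B": {"neurons": 64},
                     "C": {"neurons": 10}},
                    [("A", "B"), ("B", "C")])
assert ir["succ"]["A"] == ["B"] and ir["pred"]["C"] == ["B"]
```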
Step three, incremental analysis is completed on the changed model network layers to obtain incremental data, and dependency analysis is recursively completed on the layers connected to the changed network layers to confirm the recompilation minimum set, specifically comprising the following substeps:
Step 3.1, recording the incremental information generated while matching the model's network layers and computing the incremental data from it, the incremental data comprising: the data to be newly filled and the data of the current change that must be updated and rewritten.
According to the increment information, the increment modes are obtained and classified; in order of relative increment complexity from low to high, they are the zero increment mode, the parameter increment mode, the dendrite increment mode, the axon increment mode and the shape increment mode.
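The complexity ordering can be written down directly; a minimal sketch using Python's IntEnum, where the numeric values are arbitrary and only the ordering matters:

```python
from enum import IntEnum

class IncrementMode(IntEnum):
    """Increment modes ordered by relative increment complexity, low to high."""
    ZERO = 0
    PARAMETER = 1
    DENDRITE = 2
    AXON = 3
    SHAPE = 4

# When a layer matches several modes, the hierarchically most complex wins:
assert max(IncrementMode.PARAMETER, IncrementMode.AXON) is IncrementMode.AXON
```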
Zero increment mode: a network layer matched between the previously compiled model and the currently compiled model, i.e., between the new model and the old model, is defined as zero increment if the following conditions are all met:
the numbers of neurons of the network layers connected before and after it remain unchanged;
its own number of neurons remains unchanged;
the synapses with its predecessor and successor network layers are full connections, or are convolution-kernel connections whose convolution shape parameters remain unchanged;
all of its own parameters and the weights of its front and back synapses remain unchanged.
Parameter increment mode: network layers matched between the new and old models are defined as parameter increment if the following conditions are met:
its own parameter quantities change, and its front and back synaptic connection weights change;
the remaining characteristics remain consistent with the zero increment mode.
Axon increment mode: network layers matched between the new and old models are defined as axon increment if the following condition is met:
the number of neurons of its successor network layer changes, together with the axon connection structure to that successor layer.
Dendrite increment mode: network layers matched between the new and old models are defined as dendrite increment if the following condition is met:
the number of neurons of its predecessor network layer changes, together with the dendritic connection structure to that predecessor layer.
Shape increment mode: network layers matched between the new and old models are called shape increment if the following condition is met:
its own number of neurons changes, together with its synaptic connections to the predecessor and successor network layers.
The same network layer may carry multiple increment modes. As shown in figs. 3a and 3b, the number of neurons in layer C changes, yielding C'; at the same time the dendrites A→C' and B→C' linked to C' and the axon C'→D change, while none of the three layers A, B and D themselves change. The increments contained in each network layer are as follows:
A: axon increment, axon parameter increment;
B: axon increment, axon parameter increment;
C: axon increment, dendrite increment, shape increment, parameter increment;
D: dendrite increment, dendrite parameter increment;
the remaining network layers: zero increment.
It can be seen that the more complex an increment mode is, the greater the variation it contains and the more likely several increment modes occur simultaneously. The increment modes have inclusion relations, for example an axon increment includes the parameter increment on the axon. Conversely, the simpler the increment mode, the stricter its matching conditions, the less increment information it generates, and the lower the additional cost of completing the incremental compilation.
The increment mode type of each network layer is determined recursively using a difference analysis based on the Myers algorithm. The Myers algorithm generates a shortest edit script (SES) between two text sequences: for two input files W and Y it produces instructions using only delete and insert operations, while greedily selecting the "shortest and most intuitive" path that converts file W into file Y.
In an embodiment of the present invention, a variant of the Myers algorithm is applied to the old network W and the new network Y, simultaneously introducing the three operations of deletion, insertion and addition. When a network layer of W matched to Y carries one of the above increments, it is marked as an increment layer. With deletion and insertion introduced together, a hierarchical matching strategy based on the Myers matching degree is used: the layers are matched in order of increment mode complexity, the path with the smallest increment is selected greedily, and each network layer is marked with its increment mode. If a network layer conforms to several increment modes, the hierarchically most complex increment mode is selected to mark the layer.
All network layers defined as parameter increment mode are collected, the parameter increment information is extracted, the distribution positions of the rewritten and newly added increments in the IR structure are determined, and the network layer name strings are added as key indexes into a dictionary P, called the parameter increment dictionary.
The filling-data and rewriting-data preparation is completed using the parameter increment dictionary.
Filling-data preparation is completed for the network layers in the parameter increment dictionary. When a parameter of a network layer changes from scalar type to vector type, additional parameter filling is required: the address of the original scalar configuration word is cleared, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an N×2 array, where column 0 is address information, column 1 is data information, and N is the data count length. At the same time, an address overlap check of the data of the different logic areas in memory is completed to ensure that the area to be filled and the data to be filled do not overlap.
Rewriting-data preparation is completed for each network layer in the parameter increment dictionary P. When the parameter shape of a network layer is unchanged but its values change, additional parameter rewriting is required: the addresses and new data of the rewritten parameters are recorded, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an S×2 array, where column 0 is address information, column 1 is data information, and S is the number of changed data. In this case the addresses of the parameters remain consistent with the original compiling result, and only the corresponding replacement is needed.
And step 3.2, when matching the model network layers, performing dependency analysis on the layers whose shapes are modified or the layers connected to newly added layers, recursively collecting the dependent network layers using breadth-first search, and confirming the recompilation minimum set.
For the network layers producing axon increments, dendrite increments and shape increments, i.e., the layers defined as axon increment mode, dendrite increment mode and shape increment mode, the network layer names are added as keys into a dictionary P serving as the recompilation set, called the recompilation minimum set M. After all non-zero-increment network layers have been traversed, the zero-increment network layers connected to them are processed next: a breadth-first search algorithm determines their topological order relation, and they are pushed one by one, in the breadth-first visit order, into a newly constructed priority queue, called the zero-increment priority queue Q.
And fourthly, analyzing the intermediate representation structure restored in the first step, positioning nodes related to incremental modification in the intermediate representation structure, and performing incremental filling and recompilation by utilizing the incremental data obtained in the third step and the recompilation minimum set.
The restored intermediate representation structure IR provides an index over all weights and neuron numbers, with parameters such as weights recorded in matrix format. The data addresses in the cores where each network layer is placed are quickly looked up, and the index serial numbers of the data serve as labels to extract the specific address data within the weights, yielding an address-data mask sequence. The mask sequence is used to scan, in batches, the matched address-data mapping pairs in the IR and overwrite the data in the IR.
An embodiment of incremental filling is shown in fig. 4. The network layer contains 4 convolution kernels; after the user adjusts the network, the weights of one convolution kernel are modified while the weight parameters of the remaining three convolution kernels are unchanged, so the layer is determined to be in parameter increment mode once the incremental analysis is completed. Examining the cores allocated by the previous compilation in the IR shows that the network layer was divided into two parts: two of the convolution kernels are allocated on core 0, and the other two, including the modified kernel, are allocated on core 1; indexing the modified kernel shows that the area where the parameter increment occurs lies in the address section corresponding to core 1. The new weight parameters of the modified convolution kernel are overlaid in batch onto the original addresses, completing the incremental filling of the network layer and thus the incremental compiling process.
The recompilation is specifically as follows: the model network layers in the confirmed recompilation minimum set are recompiled in turn, recursively completing the recompilation of the corresponding intermediate representation structure.
The algorithm flow for recompiling part of the cores is similar to the complete compiling flow; the recompilation minimum set is treated as a sub-network to be compiled, and the recompilation algorithm comprises the following steps:
a. recovering all core resources where the network layers in the recompilation minimum set are located, and removing the bindings between those network layers and the cores;
b. iteratively selecting, according to the topological ordering, the next network layer l in the recompilation minimum set to recompile;
c. selecting a new core resource group from the core resource pool and binding the neurons to new computing cores to complete the compilation of network layer l; this step is repeated to iteratively complete the compilation of all network layers in the recompilation minimum set M;
d. checking all axon connection relations of the network layers in the zero-increment priority queue: if a recompiled network layer produced a different core allocation scheme, i.e., neurons originally on one core are now allocated to different cores, a network layer that was originally zero-increment is considered to actually carry an additional increment and is defined as a pseudo-zero-increment network layer, requiring additional incremental compilation overhead and a readjustment of the data structure of its axon connection area;
e. after the adjustment of a pseudo-zero-increment network layer is finished, popping it from the zero-increment priority queue Q, and repeating until the remaining network layers in Q need no adjustment;
f. clearing the core resources corresponding to the recompiled network layers from all computing cores in the IR, and then running the mapping algorithm on all newly allocated virtual cores to match them with actual core resources;
g. completing the filling of the relative coordinates of successor cores in the axon connection data block.
At this point all recompilation of the IR is completed, and the information contained in the IR has been updated to be consistent with the new model.
One particular embodiment of recursively completing recompilation for the recompilation minimum set is shown in figs. 5a to 5f.
After the user has compiled the neural network shown in fig. 5a and, after evaluation, modified it, a new network layer is inserted as shown in fig. 5b; an axon increment and a dendrite increment are introduced into the network layers before and after the insertion position respectively, so the three network layers A, B and C are added to the recompilation minimum set. In fig. 5c, recompilation is completed for the recompilation minimum set, the corresponding compiling results are generated, and the old compiling result nodes are deleted from the graph. Next the remaining network layers originally in the zero-increment set are checked: as shown in fig. 5d, network layers D and E are found to be connected to B; examining the compiling results previously obtained for D and E shows that the connection between E and B is unchanged by the recompilation, while the connection between D and B was modified. D is therefore a pseudo-zero-increment layer, and recompilation is completed on D, as shown in fig. 5e; the network layer F connected to it is then checked in turn, and since the connection between F and D is unchanged, no recompilation is needed. After the old compiling result nodes are deleted from the graph, the recompilation process is finally complete and the final compiling result is obtained, as shown in fig. 5f.
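A Python sketch of this pseudo-zero-increment check from the walkthrough; edge_changed is an assumed hook reporting whether recompilation moved the core targets of a given connection:

```python
def find_pseudo_zero_layers(zero_queue, pred, edge_changed):
    """A zero-increment layer is pseudo-zero-increment when recompiling a
    neighbour moved the cores its connection targets, so its axon
    connection data block must be rebuilt."""
    return [layer for layer in zero_queue
            if any(edge_changed(p, layer) for p in pred.get(layer, []))]

# Both D and E connect to the recompiled layer B, but only D's connection
# targets moved cores, so only D needs recompiling (as in figs. 5d and 5e).
changed = {("B", "D")}
pseudo = find_pseudo_zero_layers(
    ["D", "E"], pred={"D": ["B"], "E": ["B"]},
    edge_changed=lambda src, dst: (src, dst) in changed)
assert pseudo == ["D"]
```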
Step five, the incrementally filled and recompiled intermediate representation structure IR is partially serialized, i.e., code generation is performed on the incrementally modified parts of the intermediate representation, generating the compiling result file and completing the whole incremental compiling process.
Specifically, all network layers in the zero-increment priority queue are first extracted and, using the core records allocated to each of them in the previous compilation, the corresponding generated files are copied directly from the original compiling result; then the adjusted cores are serialized again to generate the corresponding compiled files, and a consistency verification against the structure and parameters of the new model is completed on them to ensure the compiling information is error-free, finishing the whole incremental compiling process.
As shown in fig. 6, the system for incremental compiling of a neural network facing to a many-core architecture provided by the invention comprises:
restoration intermediate representation module: the method is used for restoring the compiling result into an intermediate representation structure corresponding to the compiler;
layer matching checking module: the method comprises the steps of performing layer-by-layer matching on a previously compiled model and a currently compiled model respectively, and identifying changes generated between the previously compiled model and the currently compiled model; the module comprises: the structure modification matching module, the weight modification matching module and the parameter modification matching module are respectively used for matching and checking the model structure, the weight and the configuration parameters;
layer analysis module: performing incremental analysis on the model network layer generating the change to obtain incremental data, and recursively performing dependency analysis on the layer connected with the network layer generating the change to confirm the recompilation minimum set;
the intermediate representation populates the compilation module: the method is used for analyzing the restored intermediate representation structure, positioning nodes related to incremental modification in the intermediate representation structure, performing incremental filling and recompilation by utilizing the incremental data and the recompilation minimum set, and outputting a compiling result.
Corresponding to the foregoing embodiments of the neural network incremental compiling method for a many-core architecture, the invention also provides embodiments of a neural network incremental compiling device for a many-core architecture.
Referring to fig. 7, a neural network incremental compiling device facing to a many-core architecture provided in an embodiment of the present invention includes one or more processors configured to implement a neural network incremental compiling method facing to a many-core architecture in the above embodiment.
The embodiment of the neural network incremental compiling device for a many-core architecture can be applied to any device with data processing capability, for example a computer. The device embodiment may be implemented by software, or by hardware or a combination of hardware and software. Taking software implementation as an example, the device in the logical sense is formed by the processor of the hosting device reading the corresponding computer program instructions from nonvolatile memory into memory and running them. In terms of hardware, fig. 7 shows a hardware structure diagram of the device hosting the neural network incremental compiling device of the present invention; besides the processor, memory, network interface and nonvolatile memory shown in fig. 7, the hosting device in an embodiment generally includes other hardware according to its actual functions, which is not described here again.
The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.
For the device embodiments, reference is made to the description of the method embodiments for the relevant points, since they essentially correspond to the method embodiments. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the invention without undue burden.
The embodiment of the invention also provides a computer readable storage medium, and a program is stored on the computer readable storage medium, and when the program is executed by a processor, the neural network incremental compiling method facing the many-core architecture in the embodiment is realized.
The computer readable storage medium may be an internal storage unit of any device with data processing capability described in any of the previous embodiments, such as a hard disk or a memory. It may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card or a flash card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the device. It is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been output or is to be output.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Although the invention has been described in detail above, those skilled in the art may still modify the technical solutions recorded in the foregoing embodiments or substitute equivalents for some of their features. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (9)

1. A neural network incremental compiling method for a many-core architecture, characterized by comprising the following steps:
step one, obtaining a previous compiling result and restoring the previous compiling result into an intermediate representation structure corresponding to a compiler;
step two, matching the previously compiled model and the currently compiled model layer by layer, and identifying the changes between them;
step three, completing incremental analysis on the changed model network layers to obtain incremental data, and recursively completing dependency analysis on the layers connected to the changed network layers to confirm the recompilation minimum set, specifically comprising the following substeps:
step 3.1, recording the incremental information generated while matching the model's network layers and computing the incremental data from it, the incremental data comprising: the data to be newly filled and the data of the current change that must be updated and rewritten;
the step 3.1 specifically comprises: using a differential analysis based on the Myers algorithm to recursively determine the increment mode type of each network layer, the types, in order of increment complexity from low to high, comprising: zero increment mode, parameter increment mode, dendrite increment mode, axon increment mode and shape increment mode;
collecting all network layers defined as parameter increment mode, extracting the parameter increment information, determining the distribution positions of the rewritten and newly added increments in the intermediate representation structure, adding the network layer name strings as key indexes into a dictionary P, named the parameter increment dictionary, and completing the filling-data preparation and the rewriting-data preparation using the parameter increment dictionary;
step 3.2, when matching the model network layers, carrying out dependency analysis on the layers whose shapes are modified or the layers connected to newly added layers, recursively collecting the dependent network layers using breadth-first search, and confirming the recompilation minimum set;
the step 3.2 specifically comprises: for the network layers defined as axon increment mode, dendrite increment mode and shape increment mode, adding the network layer names as keys into a dictionary P serving as the recompilation set, namely the recompilation minimum set M; after traversing all non-zero-increment network layers, continuing to determine, using a breadth-first search algorithm, the topological order relation of the zero-increment network layers connected to them, and pushing these zero-increment network layers one by one, in the breadth-first visit order, into a newly constructed priority queue, called the zero-increment priority queue Q;
step four, analyzing the intermediate representation structure restored in step one, locating the nodes related to the incremental modification in it, and performing incremental filling and recompilation using the incremental data obtained in step three and the recompilation minimum set;
analyzing the intermediate representation structure restored in step one, locating the nodes related to the incremental modification, and performing incremental filling with the incremental data obtained in step three specifically comprises: the restored intermediate representation structure provides an index over all weights and neuron numbers, with weight parameters recorded in matrix format; the data addresses in the cores where each network layer is placed are quickly looked up, and the index serial numbers of the data serve as labels to extract the specific address data within the weights, yielding an address-data mask sequence; the mask sequence is used to scan, in batches, the matched address-data mapping pairs in the intermediate representation structure and overwrite the data in place, completing the incremental filling;
the recompilation in step four specifically comprises the following steps:
a. treating the recompilation minimum set as a sub-network to be compiled, reclaiming the core resources occupied by the network layers in the sub-network, and removing the bindings between those network layers and their cores;
b. iteratively selecting, according to the topological ordering, the next network layer l in the recompilation minimum set to recompile;
c. selecting a new core resource group from the core resource pool and binding the neurons to new computing cores to complete the compilation of network layer l; this step is repeated to iteratively complete the compilation of all network layers in the recompilation minimum set;
d. checking all axon connection relations of the network layers in the zero-increment priority queue: if a recompiled network layer produced a different core allocation scheme, i.e., neurons originally on one core are now allocated to different cores, a network layer that was originally zero-increment is considered to actually carry an additional increment and is defined as a pseudo-zero-increment network layer, requiring additional incremental compilation overhead and a readjustment of the data structure of its axon connection area;
e. after the adjustment of a pseudo-zero-increment network layer is finished, popping it from the zero-increment priority queue Q, and repeating until the remaining network layers in Q need no adjustment;
f. clearing the core resources corresponding to the recompiled network layers from all computing cores in the intermediate representation structure, and then running the mapping algorithm on all newly allocated virtual cores to match them with actual core resources;
g. completing the filling of the relative coordinates of successor cores in the axon connection data block;
and fifthly, carrying out partial serialization on the intermediate representation structure subjected to incremental filling and recompilation to generate a compiling result file, and completing the whole incremental compiling process.
2. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the first step is specifically: reading the previous compiling result and restoring it, through a parsing interface, into the compiler's standard low-level intermediate representation structure by means of a compiling result auxiliary file, the compiling result auxiliary file being a quickly decodable data file additionally generated at each compilation.
3. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the content matched layer by layer in the second step comprises: model structure, weights and configuration parameters.
4. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the Myers algorithm variant simultaneously introduces the three operations of deletion, insertion and addition, matches the network layers of the previously compiled model and the currently compiled model in order of increment mode complexity, greedily selects the path with the smallest increment, and marks each network layer with its increment mode; if a network layer conforms to several increment modes, the hierarchically most complex increment mode is selected to mark the layer.
5. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the filling data preparation and the rewriting data preparation are completed by using a parameter incremental dictionary, and specifically comprises:
the filling-data preparation is completed for each network layer in the parameter increment dictionary, specifically: when a parameter of the network layer is converted from scalar type to vector type, additional parameter filling is carried out; the address of the original scalar configuration word is cleared, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an N×2 array, where column 0 is address information, column 1 is data information, and N is the data count length; at the same time, an address overlap check of the data of different logic areas in the memory is completed to confirm that the area to be filled and the data to be filled do not overlap;
the rewriting-data preparation is completed for each network layer in the parameter increment dictionary, specifically: when the parameter shape of the network layer is unchanged but its values change, additional parameter rewriting is performed; the addresses and new data of the rewritten parameters are recorded, and the address-data pairing of the vector configuration area corresponding to the parameter is generated as an S×2 array, where column 0 is address information, column 1 is data information, and S is the number of changed data.
6. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the zero increment mode satisfies the following condition: the previously compiled model and the currently compiled model match into completely identical network layers;
the parameter increment mode satisfies the following conditions: the number of parameters of the matched network layer changes and the front and back synaptic connection weights of the network layer change, while the remaining characteristics are consistent with the zero increment mode;
the dendrite increment mode satisfies the following conditions: the number of neurons of the matched layer's predecessor network layer changes, and the dendritic connection structure with the predecessor network layer changes;
the axon increment mode satisfies the following conditions: the number of neurons of the matched layer's successor network layer changes, and the axon connection structure with the successor network layer changes;
the shape increment mode satisfies the following conditions: the matched layer's own number of neurons changes, and its synaptic connections with the predecessor and successor network layers change.
7. The neural network incremental compiling method for a many-core architecture according to claim 1, wherein the fifth step is specifically: first, extracting all network layers in the zero-increment priority queue and, using the core records allocated to each of them in the previous compilation, directly copying the corresponding generated files from the original compiling result; then, serializing the adjusted cores again to generate the corresponding compiled files, and completing a consistency verification of them against the structure and parameters of the new model, completing the whole incremental compiling process.
8. A neural network incremental compiling device for a many-core architecture, comprising one or more processors configured to implement the neural network incremental compiling method for a many-core architecture according to any one of claims 1 to 7.
9. A computer readable storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the neural network incremental compiling method for a many-core architecture according to any one of claims 1 to 7.
CN202310191337.4A 2023-03-02 2023-03-02 Neural network increment compiling method and device for many-core architecture Active CN115904394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310191337.4A CN115904394B (en) 2023-03-02 2023-03-02 Neural network increment compiling method and device for many-core architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310191337.4A CN115904394B (en) 2023-03-02 2023-03-02 Neural network increment compiling method and device for many-core architecture

Publications (2)

Publication Number Publication Date
CN115904394A CN115904394A (en) 2023-04-04
CN115904394B true CN115904394B (en) 2023-07-04

Family

ID=86476984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310191337.4A Active CN115904394B (en) 2023-03-02 2023-03-02 Neural network increment compiling method and device for many-core architecture

Country Status (1)

Country Link
CN (1) CN115904394B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711422A (en) * 2020-12-31 2021-04-27 北京清微智能科技有限公司 Optimization method and system for neural network compiling
CN114492782A (en) * 2022-04-19 2022-05-13 之江实验室 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10326448B2 (en) * 2013-11-15 2019-06-18 Scientific Concepts International Corporation Code partitioning for the array of devices
CN112529175B (en) * 2020-11-05 2022-03-18 上海交通大学 Compiling method and system of neural network, computer storage medium and compiling device
CN114047919A (en) * 2021-11-04 2022-02-15 腾讯音乐娱乐科技(深圳)有限公司 Compiling method based on file difference, storage medium and electronic equipment
CN115113876A (en) * 2022-06-29 2022-09-27 武汉烽火技术服务有限公司 Incremental compiling method and device based on source code detection
CN115145588A (en) * 2022-07-29 2022-10-04 北极雄芯信息科技(西安)有限公司 Intermediate representation and data generation method based on TVM

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711422A (en) * 2020-12-31 2021-04-27 北京清微智能科技有限公司 Optimization method and system for neural network compiling
CN114492782A (en) * 2022-04-19 2022-05-13 之江实验室 On-chip core compiling and mapping method and device of neural network based on reinforcement learning

Also Published As

Publication number Publication date
CN115904394A (en) 2023-04-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant