CN117829242B - Model processing method and related equipment

Info

Publication number: CN117829242B
Application number: CN202410241105.XA
Authority: CN (China)
Prior art keywords: pruning, node, target, model, data processing
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN117829242A
Inventors: 余翀, 张银锋, 王雄
Assignee (current and original): Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.; priority application CN202410241105.XA

Abstract

An embodiment of the present application provides a model processing method and related equipment. The method includes: acquiring a data processing model to be processed, the data processing model comprising a plurality of network layers; analyzing the network topology structure of the data processing model to obtain a computation topology graph, the computation topology graph comprising operation nodes corresponding to the network layers and attribute information of the operation nodes, the attribute information including a pruning property; serializing the data processing model based on the attribute information of the operation nodes in the computation topology graph to obtain a serialization result, the serialization result indicating the network layers that need pruning in the data processing model; and pruning the data processing model according to the serialization result. The embodiment of the present application enables fully automatic pruning of a data processing model, thereby improving model pruning efficiency.

Description

Model processing method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a model processing method and related equipment.
Background
With the development of artificial intelligence technology, various intelligent data processing models (e.g., neural network models trained with deep learning techniques) have been applied in many fields, including but not limited to image recognition, game AI, speech recognition, and natural language processing. Owing to their accurate processing results, data processing models have brought great changes to production and daily life; for example, by deploying a large recommendation model in a shopping application, content that may interest a user can be recommended intelligently. However, a data processing model may suffer from problems such as parameter redundancy, which slows down its inference. Model pruning is a model optimization technique that reduces the redundant computation of a model by pruning it; at present, however, the pruning direction is mostly determined through manual analysis, and the network layers that need pruning cannot be identified automatically, which limits pruning efficiency.
Disclosure of Invention
The embodiments of the present application provide a model processing method and related equipment, which enable fully automatic pruning of a data processing model and thereby improve model pruning efficiency.
In one aspect, an embodiment of the present application provides a model processing method, the method including:
acquiring a data processing model to be processed, the data processing model comprising a plurality of network layers;
analyzing the network topology structure of the data processing model to obtain a computation topology graph, the computation topology graph comprising operation nodes corresponding to the network layers and attribute information of the operation nodes, the attribute information including a pruning property;
serializing the data processing model based on the attribute information of the operation nodes in the computation topology graph to obtain a serialization result, the serialization result indicating the network layers that need pruning in the data processing model; and
pruning the data processing model according to the serialization result.
In one aspect, an embodiment of the present application provides a model processing apparatus, including:
an acquisition unit, configured to acquire a data processing model to be processed, the data processing model comprising a plurality of network layers;
a processing unit, configured to analyze the network topology structure of the data processing model to obtain a computation topology graph, the computation topology graph comprising operation nodes corresponding to the network layers and attribute information of the operation nodes, the attribute information including a pruning property;
the processing unit being further configured to serialize the data processing model based on the attribute information of the operation nodes in the computation topology graph to obtain a serialization result, the serialization result indicating the network layers that need pruning in the data processing model; and
the processing unit being further configured to prune the data processing model according to the serialization result.
In one aspect, an embodiment of the present application provides a computer device, including:
a processor adapted to execute a computer program; and
a computer-readable storage medium in which a computer program is stored which, when executed by the processor, implements the model processing method described above.
In one aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, the computer program being loaded and executed by a processor to perform the model processing method described above.
In one aspect, an embodiment of the present application provides a computer program product comprising a computer program or computer instructions which, when executed by a processor, implement the model processing method described above.
With the model processing method provided by the embodiments of the present application, a data processing model to be processed can be acquired, the data processing model comprising a plurality of network layers. The network topology structure of the data processing model can then be analyzed to obtain a computation topology graph, the computation topology graph comprising operation nodes corresponding to the network layers and attribute information of the operation nodes, the attribute information including a pruning property. In this way, attribute information not originally present in the data processing model is obtained through automatic topology analysis, providing a reliable basis for model serialization. Further, based on the attribute information of the operation nodes in the computation topology graph, the data processing model can be serialized to obtain a serialization result that indicates the network layers to be pruned in the data processing model. Through this serialization, the method automatically determines which network layers of the data processing model need pruning, without any manual intervention; because topology analysis and serialization are automated, the pruning analysis is efficient, and the pruning property included in the attribute information ensures its accuracy. Finally, the data processing model can be pruned according to the serialization result: based on its indication, the model can be pruned precisely. The entire process, from acquiring the model to pruning it, is fully automatic; both the analysis of which network layers need pruning and the actual pruning of the data processing model follow a fixed, automated flow, which on the one hand lowers the usage threshold and labor cost of model pruning, and on the other hand improves pruning efficiency and thus speeds up model optimization.
Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1a is a block diagram of a model processing system provided in accordance with an exemplary embodiment of the present application;
FIG. 1b is a schematic view of a model pruning scenario provided by an exemplary embodiment of the present application;
FIG. 2 is a flow chart of a model processing method according to an exemplary embodiment of the present application;
FIG. 3a is a schematic diagram of a data processing model according to an exemplary embodiment of the present application;
FIG. 3b is a schematic representation of the results of a model topology analysis provided by an exemplary embodiment of the present application;
FIG. 4 is a flow chart of another model processing method according to an exemplary embodiment of the present application;
FIG. 5a is a schematic diagram of the grouping of operation groups provided by an exemplary embodiment of the present application;
FIG. 5b is a schematic diagram of the wrapping of an operation group provided by an exemplary embodiment of the present application;
FIG. 5c is a schematic illustration of a pruning simulation provided by an exemplary embodiment of the present application;
FIG. 5d is a schematic diagram of the transparent transmission of pruning marking information provided by an exemplary embodiment of the present application;
FIG. 6 is a flow chart of a model processing method provided by an exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of a model processing apparatus according to an exemplary embodiment of the present application;
FIG. 8 is a schematic structural diagram of a computer device according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without inventive effort shall fall within the protection scope of the present application.
The terms "first," "second," and the like in this disclosure are used for distinguishing between similar elements or items having substantially the same function and function, and it should be understood that there is no logical or chronological dependency between the terms "first," "second," and "n," and that there is no limitation on the amount and order of execution. The term "at least one" means one or more, and the meaning of "a plurality of" means two or more.
The term "module" or "unit" in the present application refers to a computer program or a part of a computer program having a predetermined function and working together with other relevant parts to achieve a predetermined object, and may be implemented in whole or in part by using software, hardware (such as a processing circuit or a memory), or a combination thereof. Also, a processor (or multiple processors or memories) may be used to implement one or more modules or units. Furthermore, each module or unit may be part of an overall module or unit that incorporates the functionality of the module or unit.
The present application provides a model processing scheme, which involves a model processing system, a model processing method, and related equipment. In the scheme, a data processing model to be processed is acquired, and its network topology structure is analyzed to obtain a computation topology graph containing operation nodes and their attribute information. The data processing model can then be serialized according to the attribute information of the operation nodes to obtain a serialization result indicating the network layers that need pruning, after which the data processing model is pruned according to the serialization result. The whole process is fully automatic and requires no manual operation or intervention; on the one hand, this lowers the usage threshold of model pruning and reduces labor cost, and on the other hand, it improves pruning efficiency and thus speeds up model optimization.
A model is a mathematical model describing the objective world. It is abstracted from data and, based on mathematical regularities found while analyzing large amounts of data, can predict a result from an input. Model optimization is a post-processing method for a model and involves adjusting the model's parameter set; it notably includes model pruning and model quantization. Model pruning judges the importance of the parameters in a model by means of an effective criterion, so that unimportant parameters (such as weight values) can be cut away, reducing the model's redundancy. The scheme provided by the present application is a serialization-based model pruning scheme: the network layers to be pruned in the data processing model are found through serialization, and the data processing model is pruned accordingly.
In the present application, a data processing model refers to a mathematical model with predictive capability. Depending on the data type, this predictive capability includes, but is not limited to: image recognition, speech recognition, text processing, video processing, and the like. The type of the data processing model can be divided along several dimensions, as described in (1)-(4) below:
(1) Divided by the type of data the model processes, the data processing model may be any of the following: an image processing model, an audio processing model, a natural language processing model, a multimodal data model, and the like. A multimodal data model is a model capable of processing data of multiple modalities, including but not limited to images, audio, and text; it can build feature representations across two or more data modalities. In the model processing scheme provided by the present application, if the data processing model is an image processing model, automatically analyzing and pruning it improves its pruning efficiency, and processing an input image with the pruned model is faster, enabling more efficient prediction. If the data processing model is an audio processing model, automatic pruning according to this scheme avoids redundant computation in practical applications and thus speeds up audio processing. If the data processing model is a natural language processing model, automatic pruning according to this scheme improves its efficiency on various natural language tasks.
(2) Divided by model size, the data processing model may be a large model or a small model. A large model is a machine learning model with a huge number of parameters and a complex structure; it can process massive data and complete various complex tasks such as natural language processing, computer vision, and speech recognition. Large models include large-scale language models and very-large-scale language models. A large-scale language model is a natural language processing model based on deep learning and a large-scale training dataset, such as GPT1 (the first-generation generative pre-trained model); it has large-scale parameters and strong computing capability, is pre-trained on large-scale text data, can learn the rich linguistic knowledge and semantic representations in human language, and performs excellently in tasks such as text generation, question answering, and machine translation. Very-large-scale language models include, but are not limited to, GPT4 (the fourth-generation generative pre-trained model) and Claude2 (a pre-trained model). A small model has fewer parameters and shallower layers; it is lightweight, efficient, and easy to deploy. Its scale is smaller than that of a large model, its structure is simpler, and the amount of data used to train it is smaller. Because large models often have many parameters with high redundancy, the model processing scheme of the present application also applies to large models: pruning a large model reduces its redundant parameters and effectively cuts its redundant computation. With this scheme, the pruning direction no longer has to be found through manual analysis; the network layers whose parameters should be pruned are identified automatically, so that the parameters to be pruned are found and pruned automatically, achieving fully automatic model pruning and improving pruning efficiency.
(3) Divided by the way the model is trained, the data processing model may be a deep learning model, a reinforcement learning model, or another machine learning model. A deep learning model is trained with deep learning techniques, and its network structure may be a multi-layer neural network; such a deep neural network lets a computer process data in a manner inspired by the human brain. A reinforcement learning model is trained with reinforcement learning techniques, and other machine learning models may be obtained through traditional machine learning, such as a traditional neural network. If the data processing model is a deep learning model, this scheme can judge whether its parameterized network layers (such as convolutional layers and fully connected layers) need pruning, determine as the pruning network those layers for which reducing parameters affects model accuracy within an acceptable range, and prune those layers, thereby reducing the parameter count of the deep learning model without harming its accuracy.
(4) Divided by the degree of completion of model training, the data processing model may be a trained model or an untrained model. If the data processing model is an untrained model, model pruning can be combined with this scheme during training, so that accuracy is improved through training while the model is simultaneously simplified through pruning, improving its inference speed. For example, after a preset number of training iterations (e.g., 100), the partially trained large model may be taken as the data processing model to be processed, analyzed, and pruned to obtain a pruned data processing model; training then continues on the pruned model, and after another preset number of iterations the model may be pruned again, and so on, until the accuracy and inference speed of the model meet expectations. The resulting target model is then applied to the corresponding service scenario; for example, if the data processing model is an image processing model, it may be applied to scenarios such as image recognition, image classification, and image search. If the data processing model is a trained model (e.g., a pre-trained model), pruning may likewise be performed according to the scheme provided by the present application, as a post-processing of the model; optionally, after pruning yields the pruned data processing model, the pruned model may be retrained to adjust the model parameters and improve accuracy.
With the model processing scheme provided by the present application, on the one hand, because model pruning is fully automatic, there is no need to analyze the model structure manually or to configure pruning parameters manually, which lowers the usage threshold of model pruning as well as the user's learning cost; given any model, any user can execute this scheme to prune it. On the other hand, the nodes to be pruned are not searched for on the basis of pruning configuration information, which avoids failures caused by configuration errors, guarantees pruning accuracy, prevents the large accuracy drops caused by unreasonable pruning configuration, and thus preserves the usability of the model.
The model processing scheme provided by the present application relates to model compression, which aims to reduce model size and accelerate model inference through compression techniques, thereby lowering the storage and computation costs of the model. Model compression typically includes pruning, low-rank decomposition, knowledge distillation, and the like; the present application concerns pruning, which reduces model parameters and thus accelerates model inference.
The architecture of the model processing system provided by the embodiment of the application will be described with reference to the accompanying drawings.
Referring to fig. 1a, fig. 1a is a schematic diagram of a model processing system according to an exemplary embodiment of the present application. As shown in fig. 1a, the model processing system comprises a computer device 101 and a model database 102, between which a communication connection is established in a wired or wireless manner. The computer device comprises one or more of a terminal device and a server; terminal devices include, but are not limited to: smart phones, tablet computers, smart wearable devices, smart voice interaction devices, smart home appliances, personal computers, vehicle terminals, smart cameras, and the like, and the present application places no limit on their number. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), big data, and artificial intelligence platforms, but is not limited thereto; the present application places no limit on the number of servers either. The model database stores at least one data processing model and may include, but is not limited to, one or more of the following categories: image processing models, natural language processing models, audio processing models, or multimodal processing models. Divided by deployment location, the model database 102 may be a local database of the computer device 101 or a cloud database to which the computer device 101 can connect. Divided by attribute, the model database 102 may be a public database, i.e., one open to all computer devices, or a private database, i.e., one open only to specific computer devices such as the computer device 101.
The computer device 101 may execute the model processing method provided by the embodiments of the present application; the specific flow of the method includes (1)-(4) below:
(1) A data processing model is acquired.
In one embodiment, the computer device 101 may look up a data processing model in the model database 102 in response to a model acquisition request. In a specific implementation, if a user wants to prune a certain data processing model, the user may select and confirm the model identifier of that model, and the computer device 101 may generate a model acquisition request according to the confirmation instruction. The model acquisition request includes the model identifier, which is, for example, a model name, a model ID, or a model code. The computer device 101 may look up a data processing model in the model database 102 according to the model identifier and take the found model as the data processing model to be processed (a minimal lookup sketch follows this paragraph). In another embodiment, the computer device 101 may also obtain the data processing model from another computer device; for example, given a storage address of the data processing model, the model may be obtained from the corresponding server based on that address.
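A minimal sketch of this lookup, in which the model database is represented by a plain dictionary and the request fields are illustrative assumptions rather than interfaces defined by the patent:

```python
model_database = {}  # model identifier -> stored data processing model

def handle_model_acquisition_request(request: dict):
    # The request carries a model identifier such as a model name, ID, or code.
    model_id = request["model_id"]
    model = model_database.get(model_id)
    if model is None:
        raise KeyError(f"no data processing model registered under {model_id!r}")
    return model  # becomes the "data processing model to be processed"
```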
(2) The network topology of the data processing model is analyzed.
In a specific implementation, since the data processing model includes multiple network layers and each network layer implements an algorithmic operation, the network layers can be understood as operators. The computer device 101 may extract the network topology structure of the data processing model to obtain a computation topology graph. The computation topology graph includes operation nodes representing the network layers in the data processing model; it describes the network topology structure of the model, and since the network layers represented by the operation nodes implement the corresponding algorithmic operations, the operation nodes may also be called computation nodes. The computer device 101 may further extract the attribute information of the operation nodes and add it to the computation topology graph, so that the graph contains the attribute information of the operation nodes, which includes at least the pruning property.
It can be understood that a data processing model of any type can be uniformly converted into a computation topology graph through extraction of its network topology structure; this is a fixed, recognizable format, and during pruning a pruning flow is added on top of the serialization operation over this fixed-structure format, so that the pruned data processing model can be obtained. Model analysis therefore does not need to be performed manually; it can be performed automatically, improving analysis efficiency.
(3) The data processing model is serialized.
In a specific implementation, the computer device may serialize the data processing model according to the attribute information of the operation nodes obtained from the analysis, so as to obtain a serialization result indicating the network layers to be pruned in the data processing model. This step may also be referred to simply as model serialization. In one embodiment it may include model grouping and operation group wrapping. Model grouping means grouping the operation nodes representing the network layers to obtain a plurality of operation groups, each containing at least one operation node; grouping prevents pruning in different operation groups from affecting each other and thus safeguards pruning accuracy. Operation group wrapping means adding pruning control variables to a prunable operation group to simulate the pruning effect, so as to determine suitable pruning marking information that identifies the network layers to be pruned, as sketched below.
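As an illustration of operation group wrapping, the following minimal numpy sketch attaches a 0/1 pruning control variable (a mask) to a wrapped group so that zeroing mask entries simulates removing output units without touching the stored weights; the mask-based simulation and the class shape are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

class WrappedGroup:
    """Illustrative wrapper: one pruning control variable per output unit."""

    def __init__(self, weight: np.ndarray):
        self.weight = weight                      # (in_dim, out_dim) parameters
        self.mask = np.ones(weight.shape[1])      # pruning control variables

    def simulate_prune(self, units_to_drop):
        # Zeroed entries act as pruning marking information for these units.
        self.mask[list(units_to_drop)] = 0.0

    def forward(self, x: np.ndarray) -> np.ndarray:
        # Masked outputs behave as if the corresponding units were pruned,
        # so the effect of a pruning choice can be previewed cheaply.
        return (x @ self.weight) * self.mask

group = WrappedGroup(np.random.randn(8, 4))
group.simulate_prune([1, 3])                      # preview pruning units 1 and 3
print(group.forward(np.random.randn(2, 8)))       # columns 1 and 3 are zeroed
```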
(4) The data processing model is pruned.
According to the indication of the serialization result, the network layers to be pruned in the data processing model can be pruned. In a specific implementation, the network layers to be pruned indicated by the serialization result are network layers containing network parameters. The computer device 101 may determine these network layers from the data processing model as target network layers according to the serialization result, then determine the parameter layers to be pruned within the target network layers, and prune the parameters contained in those parameter layers, thereby reducing the number of model parameters and achieving the pruning effect.
In the model processing system provided by the present application, the computer device can automatically analyze and serialize the data processing model, thereby realizing pruning analysis of the model, and perform pruning based on the result of that analysis. This is a fully automatic process requiring no manual intervention, so pruning efficiency is improved. The present application can assist model pruning: for scenarios in which a data processing model is to be pruned, it solves several existing problems of model pruning, improves the efficiency with which pruning is used, and lowers its usage threshold. For example, if a deep learning model needs to go online, the network layers to be pruned can be found automatically with this scheme before deployment, and the network parameters of those layers can be pruned to optimize the model.
The present application also breaks the technical barrier between algorithm engineers and the engineers responsible for model optimization. From the perspective of a product line, a model pruning scenario such as the one shown in fig. 1b arises as follows: an algorithm engineer designs and finally trains a model, but because the focus is usually on the model's accuracy, its runtime performance is rarely evaluated, so problems appear when the model is actually deployed: ① The model is too large, so the deployed model consumes substantial resources. ② The model has a large number of parameters, because at design time a conservative approach uses more parameters to capture more features of the data; this makes model inference too slow and inference time too long when the AI is actually applied. Model pruning, as one direction of model optimization, addresses these problems by reducing model parameters to accelerate inference. Because of the technical barrier between algorithm engineers and model-optimization engineers, additional personnel would normally have to participate manually in model pruning; with this scheme, however, the network layers to be pruned can be found and pruned automatically before the model goes online, so no model-optimization engineer has to analyze a given model. Any model, after training or during training, can be pruned and optimized automatically, avoiding the learning cost of manual pruning and the time consumed by manual analysis, so pruning efficiency is high. Furthermore, pruning reduces the size of the model, which improves its inference speed in practical applications.
The model processing method provided by the embodiment of the application is described next.
Fig. 2 is a schematic flow chart of a model processing method according to an exemplary embodiment of the application. The model processing method may be performed by a computer device, such as the computer device 101 in fig. 1a, and may comprise the following steps S201-S204.
S201, acquiring a data processing model to be processed.
The data processing model comprises a plurality of network layers. A network layer (layer) is a network structure in the data processing model that implements an algorithmic operation (also called a computational operation or model operation). For example, the data processing model may include the following network layers: fully connected layers (Dense), convolutional layers (Conv), dimension conversion layers (e.g., Reshape, Flatten), activation layers (e.g., Relu, Elu), joining layers (e.g., Concatenate), and the like. Examples of algorithmic operations are the convolution computation implemented by a convolutional layer, the addition implemented by an addition layer, and the conversion of a multi-dimensional input into one dimension by a Flatten layer. An algorithmic operation has corresponding computation logic, so a network layer in the present application can also be understood as a computation unit, also called an operator (OP). An operator computes according to its computation logic; for example, a convolutional layer is an operator, and the weighted summation inside a fully connected layer can also be an operator.
At least one of the plurality of network layers has network parameters, i.e., the parameters the layer uses when implementing its algorithmic operation, including but not limited to weight parameters and bias parameters. For example, in a neural network model, multiple parameterized layers used for deep learning, such as fully connected layers and convolutional layers, all have weight parameters. According to whether they have network parameters, network layers can be divided into two categories: first computation operators and second computation operators. A first computation operator performs pure computation without network parameters, while a second computation operator has network parameters and computes according to its computation logic. For example, if the data processing model is a neural network containing an Add layer and a Dense layer, the Add layer only needs to add its input data, so it is a first computation operator, whereas the Dense layer computes over its input data together with its own weight and bias parameters, so it is a second computation operator. Note that a first computation operator may be marked as a non-pruning network, so that its parameters are not pruned during the actual pruning process; a sketch of this classification follows.
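A minimal sketch of this classification, assuming Keras-style layers whose get_weights() returns the layer's network parameters:

```python
def classify_operator(layer):
    """First computation operators (e.g. Add) carry no network parameters;
    second computation operators (e.g. Dense, Conv) do, and only the latter
    are candidates for parameter pruning."""
    if len(layer.get_weights()) == 0:
        return "first"   # pure computation, mark as non-pruning network
    return "second"      # has weights/biases, may be analyzed for pruning

class AddLayer:            # stand-in for a parameter-free operator
    def get_weights(self):
        return []

print(classify_operator(AddLayer()))   # -> "first"
```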
The plurality of network layers in the data processing model are connected in sequence; based on the connection order, they form the corresponding network topology structure used for data processing. The data processing model may contain both sequentially connected network layers and network layers in a parallel relationship. Illustratively, the data processing model shown in fig. 3a includes the following network layers: four fully connected layers (Dense_1, Dense_2_1, Dense_2_2, Dense_3), an addition layer (Add), a flattening layer (Flatten), and an input layer (Input). Their structure is: the Input layer, the Flatten layer, and the Dense_1 layer are connected in sequence; the Dense_2_1 layer and the Dense_2_2 layer are parallel, and each of them connects to the Add layer; the Add layer connects to the Dense_3 layer. Based on this structure, data is processed in the following order: the input is processed by the Flatten layer, whose output flows to the Dense_1 layer; the Dense_1 layer computes on the Flatten output, and its result flows into both the Dense_2_1 layer and the Dense_2_2 layer; the Add layer adds the outputs of Dense_2_1 and Dense_2_2, and the sum is fed into the Dense_3 layer to produce the final result (see the Keras sketch below). Sequentially connected layers have data dependencies, because the algorithmic operation of one layer must wait until the computation of one or more other layers completes, as with the Flatten layer and the Dense_1 layer above.
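For concreteness, the following minimal Keras sketch reproduces the topology described for fig. 3a; the input shape and layer widths are illustrative assumptions, since the description does not specify them:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(4, 4), name="input")
x = tf.keras.layers.Flatten(name="flatten")(inputs)
x = tf.keras.layers.Dense(16, name="dense_1")(x)
branch_1 = tf.keras.layers.Dense(8, name="dense_2_1")(x)  # parallel branch
branch_2 = tf.keras.layers.Dense(8, name="dense_2_2")(x)  # parallel branch
added = tf.keras.layers.Add(name="add")([branch_1, branch_2])
outputs = tf.keras.layers.Dense(1, name="dense_3")(added)
model = tf.keras.Model(inputs, outputs)
model.summary()
```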
S202, analyzing the network topology structure of the data processing model to obtain a computation topology graph.
The computation topology graph includes the operation nodes corresponding to the network layers and the attribute information of the operation nodes. In a specific implementation, the network layers in the data processing model connect in sequence to form the network topology structure, and the model stores the network layers and their network parameters; the layers that need pruning cannot be determined directly from this information. The present application therefore adds, through model topology analysis (i.e., analysis of the network topology structure of the data processing model), information for analyzing whether each network layer should be pruned, enabling automatic pruning analysis.
In one implementation, the network topology structure of the data processing model may be extracted in the form of a graph to obtain the computation topology graph, which can be understood as a computation graph model, i.e., a model that represents the computation process as a graph. It describes the network topology structure of the data processing model, saves system overhead, improves resource utilization, and allows more efficient operation. Specifically, the computation topology graph includes a plurality of operation nodes and at least one directed edge; each operation node represents one network layer of the data processing model, with network layers and operation nodes in one-to-one correspondence, and each directed edge connects two operation nodes, its direction being the flow direction of the data output by the operation node.
When the network topology structure is extracted, the topology information needed later can be computed, including but not limited to operation nodes representing operators and edges describing data dependencies. In addition, the attribute information of the operation nodes and of the edges can be extracted and added to the computation topology graph, so that the graph contains both. Optionally, a graph store (e.g., the memory-based graph library networkx) may be used to hold the computation topology graph and the analyzed attribute information, as sketched below.
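The following minimal networkx sketch shows one way the computation topology graph and the analyzed attribute information could be stored; the attribute keys are assumptions based on the attributes named in this description:

```python
import networkx as nx

graph = nx.DiGraph()

# Operation nodes carry the analyzed attribute information (node type,
# pruning property, pruning transitivity).
graph.add_node("flatten", node_type="Reshape", prunable=False, transitive=True)
graph.add_node("dense_1", node_type="Dense", prunable=True, transitive=False)

# Directed edges record data dependencies and the association between nodes.
graph.add_edge("flatten", "dense_1",
               input_node="flatten", output_dtype="float32",
               n_inputs=1, n_outputs=1)

print(graph.nodes["dense_1"]["prunable"])  # -> True
```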
The attribute information of an operation node includes at least its pruning information, i.e., the characteristics it carries in the pruning process, which may also be called pruning characteristics. Pruning information includes the pruning property and pruning transitivity: the pruning property indicates whether pruning is allowed, and pruning transitivity indicates whether the pruning property is allowed to propagate across layers. For example, if the pruning transitivity of an operation node indicates that its pruning property may propagate across layers, then other operation nodes separated from it by intervening layers may share the same pruning property. Operation nodes of different types have different pruning characteristics during pruning, and none of this information exists in the data processing model itself.
The attribute information of an edge records the association between operation nodes; through this association, the pruning properties of the operation nodes can be synchronized and updated in order to determine the network layers that need pruning in the data processing model. Optionally, the attribute information of an edge includes, but is not limited to: the input node identifier, the output data type, the number of input nodes, the number of output nodes, and so on. Each operation node has a node identifier, such as a node ID or node name.
FIG. 3b is a schematic diagram of a model topology analysis result provided by an exemplary embodiment of the present application. Through analysis, the data processing model shown in (1) of fig. 3b (i.e., the model of fig. 3a) can be converted into the computation topology graph shown in (2) of fig. 3b. The computation topology graph can be regarded as a mapping of the network topology structure of the data processing model and can additionally store the attribute information used for pruning analysis, so that the network layers needing pruning can be identified from it. In the computation topology graph of (2) in fig. 3b, the operation node marked 0 represents the input layer of the data processing model, the node marked 1 represents the Flatten layer, the node marked 2 represents the Dense_1 layer, and so on, with the node marked 6 representing the Dense_3 layer. The network topology structure of the data processing model is thus abstracted for more efficient processing. Topology extraction automatically identifies which network layers depend on each other, so the data processing model can be analyzed automatically by the computer device instead of by a person with model compression experience. The analysis of the network topology structure can be understood as a conversion: converting the data processing model into a computation topology graph that describes its network topology structure yields, on the one hand, a structure that improves processing efficiency and, on the other hand, the pruning information of the operation nodes needed for the subsequent pruning analysis.
S203, serializing the data processing model based on the attribute information of the operation nodes in the computation topology graph to obtain a serialization result.
In the present application, the serialization result indicates the network layers that need pruning in the data processing model. Pruning here means pruning of model parameters; a network layer that needs pruning may be called a pruning network, i.e., a layer of the data processing model whose parameters are to be pruned, and the parameter layers to be pruned within it may be called pruning layers. A non-pruning network is a network layer in which parameter pruning is prohibited, and a non-pruning layer is a parameter layer that must not be pruned; these are the basic guarantee of model accuracy. Accordingly, during serialization the two parameter layers closest to the model input and the model output can automatically be set as non-pruning layers, which limits the drop in model accuracy. The reason is as follows: parameters close to the model's input layer include important parameters that the model needs during processing, and pruning them would affect overall accuracy; near the model output, some outputs have fixed dimension requirements, and pruning that affected the output result would compromise its accuracy.
During serialization, a pruning algorithm can compute, for each network layer in the data processing model, the impact on model accuracy (e.g., the size of the accuracy drop) of reducing the network parameters it contains; a layer whose parameter reduction has little or no impact on accuracy can serve as a layer that needs pruning. Therefore, a network layer indicated by the serialization result is one whose impact value on the accuracy of the data processing model is smaller than a preset impact threshold, and every layer so indicated has network parameters; a selection sketch follows.
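The selection rule above can be sketched as follows; impact_of() stands in for whatever influence estimate the pruning algorithm produces, and the threshold and example values are illustrative assumptions:

```python
def select_pruning_layers(param_layers, impact_of, impact_threshold=0.01):
    # The parameter layers nearest the model input and output are forced
    # to be non-pruning layers, as described above.
    protected = {param_layers[0], param_layers[-1]}
    # Remaining parameter layers qualify only if the estimated accuracy
    # impact of shrinking them stays below the preset threshold.
    return [layer for layer in param_layers
            if layer not in protected and impact_of(layer) < impact_threshold]

layers = ["dense_1", "dense_2_1", "dense_2_2", "dense_3"]
impacts = {"dense_1": 0.02, "dense_2_1": 0.001, "dense_2_2": 0.002, "dense_3": 0.0}
print(select_pruning_layers(layers, impacts.get))   # -> ['dense_2_1', 'dense_2_2']
```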
In one implementation, the attribute information of the operation nodes includes the pruning property. During serialization, the computer device may first combine associated operation nodes into operation groups according to the dependencies between them, and then, taking each operation group as a unit, analyze from the nodes' pruning properties whether the corresponding network layers need pruning, thereby determining the network layers to be pruned in the data processing model. This constitutes the serialization of the data processing model; it can also be understood as a network structure search over the model, whose purpose is to find automatically the network layers that need pruning. Automatic network search based on serialization discovers which network structures in the data processing model should be pruned, so that pruning can be carried out more precisely; a grouping sketch follows.
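A minimal sketch of one plausible grouping rule, assuming (as the later dot-product example in S403 suggests) that layers whose outputs are combined element-wise must be pruned consistently and therefore belong to one operation group; the Add-based coupling rule is an assumption for illustration, not the patent's exact criterion:

```python
import networkx as nx

def group_operation_nodes(graph: nx.DiGraph):
    coupling = nx.Graph()
    coupling.add_nodes_from(graph.nodes)
    for node, data in graph.nodes(data=True):
        if data.get("node_type") == "Add":
            # All producers feeding an element-wise Add must share one group.
            preds = list(graph.predecessors(node))
            for a, b in zip(preds, preds[1:]):
                coupling.add_edge(a, b)
            if preds:
                coupling.add_edge(node, preds[0])
    return list(nx.connected_components(coupling))   # the K operation groups

g = nx.DiGraph()
g.add_node("dense_2_1", node_type="Dense")
g.add_node("dense_2_2", node_type="Dense")
g.add_node("add", node_type="Add")
g.add_edge("dense_2_1", "add"); g.add_edge("dense_2_2", "add")
print(group_operation_nodes(g))   # dense_2_1, dense_2_2 and add share one group
```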
S204, pruning the data processing model according to the serialization result.
In a specific implementation, the network parameters to be retained in each network layer to be pruned can be determined according to the indication of the serialization result; the network layer indicated by the serialization result is then reconstructed to obtain a reconstructed network layer, and the retained network parameters are copied into it. The reconstructed layer can replace the original layer in the data processing model, completing the pruning of the model, as sketched below.
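A minimal numpy sketch of this reconstruction for a fully connected layer: the retained output units are selected, the layer is rebuilt at the smaller width, and the retained parameters are copied over. The Keras-style (kernel, bias) layout is an assumption:

```python
import numpy as np

def rebuild_dense(kernel: np.ndarray, bias: np.ndarray, keep_units):
    keep = np.asarray(sorted(keep_units))
    new_kernel = kernel[:, keep]       # retain selected output columns
    new_bias = bias[keep]              # retain the matching bias entries
    return new_kernel, new_bias        # parameters of the replacement layer

kernel, bias = np.random.randn(8, 4), np.random.randn(4)
new_kernel, new_bias = rebuild_dense(kernel, bias, keep_units=[0, 2])
print(new_kernel.shape)                # -> (8, 2): the reconstructed layer
```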
With the model processing method provided by this embodiment, a data processing model comprising a plurality of network layers is acquired, and its network topology structure is analyzed to obtain a computation topology graph containing the operation nodes corresponding to the network layers and their attribute information, including the pruning property. Attribute information not originally present in the data processing model is thus obtained through automatic topology analysis, providing a reliable basis for model serialization. Based on the attribute information of the operation nodes, the data processing model is then serialized to obtain a serialization result indicating the network layers to be pruned. Through model serialization, which network layers need pruning is determined automatically, without manual intervention; since topology analysis and serialization are automated, the pruning analysis is efficient, and the pruning property in the attribute information ensures its accuracy. Finally, the data processing model is pruned according to the serialization result, enabling precise pruning based on its indication. The whole flow from model acquisition to model pruning is fully automatic, whether in analyzing which network layers need pruning or in actually pruning the data processing model, which on the one hand lowers the usage threshold and labor cost of model pruning and on the other hand improves pruning efficiency and thus the speed of model optimization.
Referring to fig. 4, a flowchart of another model processing method according to an exemplary embodiment of the present application is shown. The model processing method may be performed by a computer device, such as the computer device 101 in fig. 1a, and may comprise the following steps S401-S406.
S401, acquiring a data processing model to be processed.
S402, analyzing the network topology structure of the data processing model to obtain a computation topology graph.
In a specific implementation, S402 may include (1)-(2) below.
(1) The network topology structure of the data processing model is extracted to obtain the computation topology graph.
Each network layer in the data processing model can be mapped to obtain a plurality of operation nodes, each representing one network layer; at least one directed edge is generated according to the connection relations between the network layers, each directed edge connecting two associated operation nodes; and the computation topology graph is obtained from the operation nodes and directed edges, so that it includes a plurality of operation nodes and at least one edge. During extraction of the network topology structure, besides computing the topology information needed later, information useless for the analysis can be filtered out; for example, parameters in TensorFlow unrelated to the computation operators can be discarded, while operator-related parameters such as weight parameters are retained in the computation topology graph. A construction sketch follows.
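A minimal sketch of this mapping, with the parsed model represented by a plain connection table (an illustrative stand-in for the real model file, using the fig. 3a topology):

```python
import networkx as nx

connections = {                      # layer -> layers it feeds into
    "input": ["flatten"], "flatten": ["dense_1"],
    "dense_1": ["dense_2_1", "dense_2_2"],
    "dense_2_1": ["add"], "dense_2_2": ["add"], "add": ["dense_3"],
}

graph = nx.DiGraph()
for layer, successors in connections.items():
    for succ in successors:
        graph.add_edge(layer, succ)   # edge direction follows the data flow

print(list(nx.topological_sort(graph)))   # a valid data processing order
```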
(2) Attribute extraction is performed on each operation node in the computation topology graph to obtain the attribute information of each operation node.
In one implementation, the attribute extraction process for an operation node may include identifying the node type and determining the pruning information, so that the extracted attribute information of the operation node comprises its node type and pruning information. On this basis, a specific implementation of attribute extraction may include the following steps 1.1 to 1.3.
1.1. Perform type identification on each operation node according to the network layer it represents in the computation topology graph, obtaining the node type of each operation node.
For any operation node, the computer device may first identify the operation type of the algorithmic operation implemented by the network layer the node represents, and then take that operation type as the node type of the operation node. In general, the node type can be identified from the function of the represented network layer, including but not limited to: Dense nodes (fully connected, corresponding to fully connected layers), Conv nodes (convolution, corresponding to convolutional layers), Add nodes (addition, corresponding to addition layers), Concatenate nodes (joining, corresponding to joining layers), Reshape nodes (dimension conversion, corresponding to dimension conversion layers), and so on.
1.2. Determine the pruning information of each operation node from its node type, according to the correspondence between node types and pruning information.
Operation nodes of different node types have different pruning characteristics in the pruning process: an operation node of one type (e.g., a computation node) allows pruning, while an operation node of another type (e.g., an input node) prohibits it. Node types correspond to pruning information, and this correspondence indicates the pruning information of each node type; accordingly, the pruning information of an operation node, comprising the pruning property and pruning transitivity, can be determined from its node type. The pruning property states whether pruning is allowed; pruning transitivity states whether the pruning property may propagate across layers. For example, for an input node both the pruning property and pruning transitivity may be set to "disabled", indicating that the represented network layer must not be pruned and that the pruning property must not propagate across layers; set to "enabled", they indicate that the layer may be pruned and that the pruning property may propagate. In a data processing model, the input and output dimensions of some network layers do not change; for example, several dimension-preserving layers (such as Reshape or Relu layers) may lie between two layers without themselves being true pruning layers. Where such layers can pass pruning through, the layer that truly needs pruning can be found via pruning transitivity.
Besides the two characteristics of pruning property and pruning transitivity, the pruning information may include other characteristics, such as pruning dimension requirements, for example that the input and output dimensions must remain consistent when pruning is performed, or that the output dimensions are fixed.
And 1.3, adding the node type of each operation node and pruning information of each operation node into attribute information of the corresponding operation node.
In a specific implementation, the node type and pruning information (including pruning property and pruning transitivity) of an operation node are added to the attribute information of that operation node, so that the attribute information includes the node type and the pruning information. Optionally, in addition to the node type and pruning information, the computer device may extract other attributes of the operation node and add them to the attribute information; these other attributes may include the node name, the IDs of output edges, and the like, so the attribute information may further include the node name, output-edge identifiers and other attributes of the operation node.
Through the attribute extraction of steps 1.1-1.3, the node type of each operation node can be analyzed, and the pruning information of the operation node can be determined from its node type through the preset correspondence between node types and pruning information. Adding the node type and pruning information into the attribute information supplies pruning knowledge that the data processing model does not originally carry, which facilitates accurate serialization of the data processing model and improves pruning accuracy.
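To make steps 1.1-1.3 concrete, the following Python sketch shows one possible realization; the node record, helper names and the type-to-pruning-information table are illustrative assumptions, not the patent's actual data structures.

```python
from dataclasses import dataclass, field

# Hypothetical correspondence between node type and pruning information
# (pruning property, pruning transitivity); the entries are assumptions.
PRUNING_INFO_BY_TYPE = {
    "Dense":       {"prunable": True,  "transitive": True},
    "Conv":        {"prunable": True,  "transitive": True},
    "Add":         {"prunable": False, "transitive": True},   # assumption
    "Concatenate": {"prunable": False, "transitive": True},   # assumption
    "Reshape":     {"prunable": False, "transitive": False},  # assumption
    "Input":       {"prunable": False, "transitive": False},
}

@dataclass
class OpNode:
    name: str
    node_type: str                      # step 1.1: identified from the network layer
    out_edges: list = field(default_factory=list)
    attributes: dict = field(default_factory=dict)

def extract_attributes(node: OpNode) -> None:
    # Step 1.2: look up pruning information through the type correspondence.
    info = PRUNING_INFO_BY_TYPE.get(
        node.node_type, {"prunable": False, "transitive": False})
    # Step 1.3: add node type, pruning information and other attributes
    # (node name, output-edge IDs) into the attribute information.
    node.attributes = {"type": node.node_type, "name": node.name,
                       "out_edges": list(node.out_edges), **info}
```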
In one embodiment, in addition to performing attribute extraction processing on each operation node in the computation topology, attribute extraction processing may be performed on each edge in the computation topology, so as to obtain attribute information of each edge. Based on the attribute information of each edge, pruning property between the operation nodes can be synchronized and updated to ensure accuracy of the pruning property of the operation nodes. Further, attribute information of the operation node may be added to the computation topology, and attribute information of an edge may be added to the computation topology, thereby obtaining the computation topology with the attribute information.
In one embodiment, the number of operation nodes includes a plurality, and the pruning property of each operation node is used to indicate whether the corresponding operation node is allowed to be pruned, and when the computer device performs serialization processing on the data processing model based on the attribute information of the operation node in the calculation topology map, the computer device may specifically perform serialization processing on the data processing model based on the pruning property of the operation node, including the following steps S403-S405.
S403, grouping the operation nodes according to the dependency relationship among the operation nodes in the calculation topological graph to obtain K operation groups, wherein K is a positive integer.
In the pruning process, the network parameters of the multiple network layers included in the data processing model may affect each other, and such effects can cause errors in the pruning process. For example, when performing a dot-product calculation on two parameter matrices A and B output by different network layers, the corresponding calculation formula is:

C = A ⊙ B

where the two parameter matrices are, for example, both of size 2×3:

A = [[a11, a12, a13], [a21, a22, a23]], B = [[b11, b12, b13], [b21, b22, b23]]

Assuming that A is pruned while B is not, the matrix size of parameter matrix A becomes 2×2 after the pruning processing, while the matrix size of parameter matrix B is still 2×3. At this point the dot-product calculation formula no longer holds, which causes an error in the pruning process.
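The failure mode can be illustrated with a small NumPy snippet (the shapes are arbitrary illustrative choices): pruning only A leaves the element-wise product with B undefined.

```python
import numpy as np

A = np.ones((2, 3))      # output parameter matrix of one network layer
B = np.ones((2, 3))      # output parameter matrix of another network layer
C = A * B                # the dot-product (element-wise) calculation succeeds

A_pruned = A[:, :2]      # prune one dimension of A only: shape becomes (2, 2)
try:
    _ = A_pruned * B     # (2, 2) against (2, 3): the formula no longer holds
except ValueError as err:
    print("pruning error:", err)
```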
Such effects arise because dependency relationships exist between operation nodes: a computation dependency affects the execution of the operators represented by the dependent operation nodes. The dependency relationship indicates an association between operation nodes, so that in the pruning process, for a computation operation (such as a dot-product operation) performed on the outputs of operation nodes with computation dependencies, all of the input parameter matrices need to be pruned together, that is, all operation nodes with computation dependencies need to be pruned together. Therefore, the application introduces the concept of an operation Group (Group): operation nodes with dependency relationships can be combined through the operation group, so that associated operation nodes are pruned together, ensuring the accuracy of the pruning process.
In one implementation, a dependency relationship is used to indicate a computation dependency, which refers to the existence of a dependency between the outputs of at least two operational nodes involved in a certain computation operation. The computer device, when executing S403, may: firstly, each operation node in the calculation topological graph is respectively initialized to obtain each initial operation group, and each initial operation group comprises one operation node. That is, each operation node can be divided into one operation group in advance by initialization. And then, according to the calculation dependence among the plurality of operation nodes, carrying out combination processing on related initial operation groups in each initial operation group to obtain K operation groups. Specifically, each initial operation group can be traversed, relevance searching is carried out according to attribute information of operation nodes included in the traversed initial operation groups, the initial operation group with calculation dependence on the operation node can be searched from a plurality of initial operation groups, and if the initial operation group is searched, the searched initial operation group and the traversed initial operation group are combined, so that the associated operation nodes are combined into one operation group; if not, the traversed initial operation group can be used as an operation group.
Illustratively, fig. 5a is a schematic diagram of operation-group division according to an exemplary embodiment of the present application. Since grouping operation nodes is equivalent to grouping the network layers of the data processing model, the description is based on the original data processing model for ease of understanding. The data processing model shown in (1) in fig. 5a, a neural network model, may be serialized into the operation groups shown in (3) in fig. 5a. Following the division rule described above, each node is first initialized as its own operation group (shown in (2) in fig. 5a), and the associated initial operation groups are then merged to obtain the final operation groups; in this way, a neural network calculation graph of multiple nodes is divided into multiple operation groups, and pruning of one operation group does not affect the others. The input layer may be set as a non-pruned layer when the operation nodes are generated, so that it is not counted into any operation group; parameter layers near the input layer and the output layer may likewise be automatically set as non-pruned layers, so that the operation nodes corresponding to these network layers skip the initialization process when the operation nodes are initialized.
In another implementation manner, after obtaining the calculation topological graph and the corresponding attribute information, traversing each operation node in the calculation topological graph, then searching the operation node associated with the current operation node according to the calculation dependence among the operation nodes for the traversed current operation node, and if so, merging the searched operation node and the current operation node to obtain an operation group, wherein the operation group comprises at least two operation nodes; if not, generating an operation group according to the current operation node, wherein the operation group comprises an operation node. Thus, for each operation node in the computation topology, the operation nodes can be divided into corresponding operation groups, thereby obtaining at least one operation group.
By grouping the operation nodes, at least one operation group can be obtained, and each operation group comprises at least one operation node. Specifically, the plurality of operation nodes may include independent operation nodes, where independent means that pruning the network layer represented by the operation nodes alone does not cause errors in computing operations performed by other operation nodes, so that the operation nodes may individually form an operation group, where an operation node is included in the operation group. The plurality of operation nodes may include two or more operation nodes having a relationship, and the operation nodes may have the same upstream node and downstream node, so that the operation nodes have a dependency relationship therebetween. These operation nodes having a dependency relationship may be combined into one operation group, so that two or more operation nodes are included in the operation group.
Therefore, by introducing the concept of the operation group, the application can decompose a large data processing model (such as a deep learning model) into a plurality of small operation groups, so that pruning among the operation groups is not mutually influenced, namely, network layers represented by operation nodes in different operation groups are not mutually influenced in the pruning process, and therefore, the search of a pruning network can be more accurate and efficient based on the operation groups.
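As a sketch of the first implementation of S403, the union-find merge below treats each node as its own initial operation group and merges groups that share a computation dependency; the dependency input format is an assumption made for illustration.

```python
def group_nodes(nodes: dict[str, list[str]]) -> list[set[str]]:
    """nodes maps a node name to the names it shares a computation dependency with."""
    parent = {n: n for n in nodes}          # initialization: one group per node

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n

    def union(a, b):
        parent[find(a)] = find(b)           # merge the two initial operation groups

    for node, deps in nodes.items():
        for dep in deps:
            union(node, dep)                # merge groups with a computation dependency

    groups: dict[str, set[str]] = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())

# Example: Dense_1 and Dense_2 feed the same calculation (a dependency), so they
# end up in one operation group; Dense_3 forms an operation group on its own.
print(group_nodes({"Dense_1": ["Dense_2"], "Dense_2": [], "Dense_3": []}))
# -> [{'Dense_1', 'Dense_2'}, {'Dense_3'}]
```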
S404, selecting a target operation group from the K operation groups according to the pruning property of the operation nodes included in each operation group.
In one implementation, the computer device, when executing S404, may proceed as follows: traverse the K operation groups, and when traversing to the k-th (k ∈ [1, K]) operation group, if the k-th operation group includes an operation node that allows pruning, determine the k-th operation group as a target operation group. As can be seen, a target operation group is an operation group in which an operation node that allows pruning exists; stated another way, the target operation group includes an operation node having the target pruning property, which indicates that the operation node is allowed to be pruned. After traversing the K operation groups, at least one target operation group can be obtained; that is, the number of target operation groups may be at least one, and the same operations may be performed for each target operation group. For ease of understanding, the following description takes one target operation group as an example.
In another implementation, the computer device, when executing S404, may perform steps 2.1-2.2 as follows. Step 2.1: according to the pruning properties of the operation nodes included in each operation group, perform pruning property analysis on the corresponding operation group to obtain the pruning property of each operation group. Step 2.2: according to the pruning property of each operation group, select the operation groups that are allowed to be pruned from the K operation groups as the target operation groups.
In one specific implementation, the pruning property analysis for each operation group may include the following: traverse the K operation groups, and take the traversed operation group as the i-th operation group, where i ∈ [1, K]; if the pruning property of at least one operation node in the i-th operation group indicates that the corresponding operation node is allowed to be pruned, generate a first pruning property and determine it as the pruning property of the i-th operation group, the first pruning property indicating that the i-th operation group is allowed to be pruned; if the pruning property of every operation node included in the i-th operation group indicates that the corresponding operation node is prohibited from being pruned, generate a second pruning property and determine it as the pruning property of the i-th operation group, the second pruning property indicating that the i-th operation group is prohibited from being pruned. After traversing the K operation groups, the pruning property of each operation group is obtained.
Based on the logic of the pruning property analysis, for any operation group, if at least one operation node allowing pruning exists in the group, its pruning property can be determined as the first pruning property; if the pruning properties of all operation nodes included in the group indicate that pruning is prohibited, its pruning property can be determined as the second pruning property. Optionally, the operation groups may be traversed serially or in parallel; in the parallel mode, the pruning properties of at least two operation groups are analyzed concurrently, which improves the speed of determining the pruning properties of all operation groups. After the pruning properties of the K operation groups are determined, subsequent analysis processing can be performed using the pruning properties of the operation groups.
The selection logic of the target operation group described in steps 2.1-2.2 first analyzes the pruning property of each operation group based on the pruning properties of its operation nodes, and then makes decisions in units of operation groups, without re-traversing the operation nodes in a group each time its pruning property is needed, so that the target operation group can be selected more quickly. Further, since every target operation group is an operation group that allows pruning, the network layers needing pruning in the data processing model can be searched for based on the target operation groups.
The pruning property of an operation group indicates whether the operation group is allowed to be pruned; optionally, it is either the first pruning property or the second pruning property. For any operation group, if its pruning property is the first pruning property, the operation group is allowed to be pruned and is determined as a target operation group; if its pruning property is the second pruning property, the operation group is prohibited from being pruned and is filtered out rather than used as a target operation group. Thus, based on the pruning properties of the operation groups, the operation groups having the first pruning property, i.e., the operation groups allowed to be pruned, can be selected from the K operation groups as target operation groups.
Illustratively, the 4 operation groups shown in fig. 5a are respectively denoted as Group1, Group2, Group3 and Group4. The pruning property of the dense_3 node in Group1 indicates that pruning is allowed, so the pruning property of Group1 may be set to true to indicate that Group1 is allowed to be pruned; similarly, the pruning properties of Group2 and Group3 may be set to true. The pruning property of the flatten node in Group4 indicates that pruning is prohibited, so the pruning property of Group4 may be set to false to indicate that Group4 is prohibited from being pruned. Based on the pruning property analysis described above, the operation groups whose pruning property is true may be taken as target operation groups, so that the target operation groups include Group1, Group2 and Group3.
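A minimal sketch of this selection, assuming each operation node carries the boolean pruning property extracted earlier: a group receives the first pruning property when any of its nodes allows pruning.

```python
def group_prunable(group):
    # Step 2.1: first pruning property (True) iff at least one node allows pruning.
    return any(node["prunable"] for node in group)

def select_target_groups(groups):
    # Step 2.2: keep only the operation groups that are allowed to be pruned.
    return [g for g in groups if group_prunable(g)]

# Mirroring fig. 5a: Group4 contains only non-prunable nodes, so it is filtered out.
groups = [[{"prunable": True}], [{"prunable": True}, {"prunable": False}],
          [{"prunable": True}], [{"prunable": False}]]
print(len(select_target_groups(groups)))  # -> 3 (Group1, Group2, Group3)
```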
S405, according to the target operation group, carrying out recognition processing on the network layer in the data processing model to obtain a pruning recognition result, and determining a serialization result according to the pruning recognition result.
In one embodiment, the computer device may specifically perform the following steps (1) - (3) when performing the identification processing on the network layer in the data processing model according to the target operation group to obtain the pruning identification result.
And (1) packaging the target operation group according to the pruning control variable to obtain a packaging operation group.
Specifically, the pruning control variable may be a vector of parameters, denoted as mask = (m1, m2, ..., mn), and the specific value of each parameter in the mask can be 1, so that the pruning control variable is a unit vector. During the packaging process, an operation node may be selected and wrapped with the pruning control variable, so that the operation node carrying the pruning control variable serves as a packaging node representing the packaging operation group (denoted GroupWrapper).
In one implementation, the logic of the packaging process may include steps 3.1-3.2 as follows.
And 3.1, selecting an output operation node from at least one operation node included in the target operation group.
Specifically, if the target operation group includes one operation node, that operation node may be determined as the output operation node; if the target operation group includes at least two operation nodes, the most-downstream operation node may be determined from among them as the output operation node. Since the operation nodes are connected by directed edges, the most-downstream operation node refers to the operation node arranged last in the pointing order within the operation group. Continuing with the 4 operation groups shown in fig. 5a, the target operation groups include Group1, Group2 and Group3. Group1 and Group3 each comprise one operation node, so for Group1 and Group3 the corresponding node can be determined as the output operation node; Group2 includes three operation nodes, and the Add node to which the directed edges jointly point can be determined as its output operation node. The downstream node of an output operation node is an operation node in a different operation group from the output operation node. Thus, for each operation group that is allowed to be pruned, one operation node can be found to serve as the output operation node of that target operation group.
And 3.2, acquiring a pruning control variable, and decorating the output operation node according to the pruning control variable to obtain a packaging operation group.
When performing the decoration, the computer device may multiply the output matrix of the output operation node by the pruning control variable mask to obtain the packaging operation group; the pruning control variable can also be regarded as a Wrapper decoration added to the output operation node. In this way, by finding an operation node in the operation group to serve as its output operation node and adding the Wrapper decoration to it, the pruning simulation effect can be initialized. Optionally, the serialization process may use the Keras model-programming interface of TensorFlow, and the operation nodes may be converted, through wrapping, into custom packaging nodes (i.e., output operation nodes carrying mask information).
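Since the text points to the Keras interface of TensorFlow, a packaging node might be sketched as the custom layer below; this is a simplified sketch assuming a 1-D output dimension, not the patent's actual implementation.

```python
import tensorflow as tf

class GroupWrapper(tf.keras.layers.Layer):
    """Wraps the output operation node of a group and multiplies its output
    by a pruning control variable (mask) initialized as a unit vector."""

    def __init__(self, inner_layer, **kwargs):
        super().__init__(**kwargs)
        self.inner = inner_layer

    def build(self, input_shape):
        out_dim = self.inner.compute_output_shape(input_shape)[-1]
        # Non-trainable all-ones mask; the pruning decision later rewrites it.
        self.mask = self.add_weight(name="mask", shape=(out_dim,),
                                    initializer="ones", trainable=False)

    def call(self, x):
        return self.inner(x) * self.mask  # simulated pruning: Z = Y ⊙ mask

# Usage: wrapped = GroupWrapper(tf.keras.layers.Dense(4))
```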
Illustratively, fig. 5b is a schematic packaging diagram of an operation group according to an exemplary embodiment of the present application. The data processing model is a neural network model comprising several fully connected layers (Dense), an addition calculation layer (Add), a flattening layer (Flatten) and an input layer (Input). A calculation topological graph can be extracted by analyzing the data processing model; based on the calculation topological graph, the neural network model is divided into multiple operation groups (Groups) that do not affect each other in the pruning process, and the pruning property of each operation group is analyzed from the pruning properties of the operation nodes it includes. As shown in (1) in fig. 5b, there are 4 operation groups; after pruning property analysis, each operation group obtains a pruning property of true or false, where true indicates that the operation group is allowed to be pruned and false indicates that it is prohibited. For an operation group whose pruning property is true, an output operation node can be selected, the corresponding output matrix position found, and mask information added to the output operation node, i.e., a decorator is constructed for the output operation node, completing the packaging of the operation group. As shown in (2) in fig. 5b, mask information is added to the dense_3 node in Group1 to obtain packaging operation group 1 (GroupWrapper1), to the Add node in Group2 to obtain packaging operation group 2 (GroupWrapper2), and to the dense_1 node in Group3 to obtain packaging operation group 3 (GroupWrapper3). Assuming the network layer corresponding to an output operation node is a DNN layer, the calculation process of the DNN layer is:

Y = X · W + b

The calculation formula for the matrix multiplication of the output result Y of the DNN layer with the mask is:

Z = Y ⊙ mask = (X · W + b) ⊙ mask

For the pruning process of the DNN layer, described in matrix form: X is the input, W is the weight parameter to be pruned, mask is the pruning control variable (also called the pruning description), and b is the offset.
The input matrix X, the weight parameter matrix W and the offset b are assumed to be, for example:

X = (x1, x2, x3), W = [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]], b = (b1, b2, b3)

Substituting into the calculation process of the DNN layer, the result Y is obtained as:

Y = X · W + b = (y1, y2, y3), where yj = x1·w1j + x2·w2j + x3·w3j + bj for j = 1, 2, 3.

The introduced mask variable may first be initialized to a unit vector, namely:

mask = (m1, m2, m3) = (1, 1, 1)

Substituting into the formula for multiplying Y by the mask, the result Z is obtained as:

Z = Y ⊙ mask = (y1·m1, y2·m2, y3·m3) = (y1, y2, y3)

The Z calculated here can be used as the representation of the packaging operation group. After this series of processing, the data processing model is converted into a structure of groups, and pruning control variables are added to the corresponding groups to simulate the accuracy of pruning. The packaging operation group, constructed by multiplying the output matrix of the target operation group by the pruning control variable, represents the pruning simulation effect of the target operation group; by subsequently modifying the added pruning control variable, the pruning effect on different parameter layers can be simulated, so that pruning marking information is determined.
And (2) carrying out pruning decision processing according to the packaging operation group to obtain first pruning marking information of the target operation group.
In the pruning decision processing, in one implementation, a pruning algorithm may be invoked to determine the importance degree of each network parameter, and then modify the pruning control variable based on the importance degree of each network parameter to obtain the first pruning marking information. In another implementation, at least one candidate pruning marking information may be determined according to the pruning control variable, the candidate pruning marking information being obtained by modifying the pruning control variable, the at least one candidate pruning marking information having different values based on the pruning control variable; and then, pruning calculation is carried out on at least one piece of candidate pruning marking information and the output result of the network layer corresponding to the output operation node respectively to obtain at least one simulation calculation result, the precision of a data processing model under each simulation calculation result is determined, and the candidate pruning marking information used by the simulation calculation result corresponding to the maximum precision is determined to be the first pruning marking information. It can be understood that the number of the target operation groups includes at least one, and for each target operation group, corresponding first pruning marking information can be obtained in the above manner; the first pruning marking information for different target operation groups may be the same or different.
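The second decision path described above can be sketched as a search over candidate masks that keeps the candidate with the highest model precision; evaluate() is a hypothetical accuracy-evaluation callback, and the exhaustive enumeration is only for illustration, since a real pruning algorithm would score parameter importance instead.

```python
import itertools

def decide_first_marking_info(dim, evaluate):
    """Try candidate pruning marking vectors (modifications of the unit-vector
    mask) and return the one whose simulated model precision is highest."""
    best_mask = [1] * dim                  # the initial unit-vector mask
    best_acc = evaluate(best_mask)
    for bits in itertools.product([0, 1], repeat=dim):  # exhaustive, for illustration
        acc = evaluate(list(bits))
        if acc > best_acc:
            best_mask, best_acc = list(bits), acc
    return best_mask
```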
The first pruning marking information is used for marking the parameter dimensions requiring pruning. Optionally, the first pruning marking information includes at least one of the following marking values: a first marking value (e.g., the value "0") for marking a parameter dimension to be pruned, and a second marking value (e.g., the value "1") for marking a parameter dimension to be preserved. A parameter dimension specifically refers to the parameters of a certain computing channel; for example, the parameters associated with the first output dimension in the expression Z of the packaging operation group form one parameter dimension. By marking the parameter dimensions requiring pruning through the first pruning marking information, it can be determined which dimensions' parameters need pruning; in reality, however, the parameter amount of the matrix does not change at this stage, since this is a pruning simulation process, and the marked parameter dimensions can later be deleted based on the first pruning marking information to complete the actual pruning.
For example, when pruning is required for the parameters of a certain dimension, the corresponding parameter in the representation of the packaging operation group is set to 0, so that the corresponding pruning simulation result can be obtained. For instance, when pruning is required for the first dimension, only m1 is set to 0, and the output result corresponding to Z is:

Z = Y ⊙ (0, 1, 1) = (0, y2, y3)
After the mask is set, the first dimension of the output of the DNN layer is 0. It can be seen that for each network layer whose parameters require pruning, the parameter amount of the matrix does not actually change during the pruning simulation. When the model is persisted later, the matrix parameters of the first dimension are selected for deletion, completing the final pruning. It will be appreciated that the specific parameter values of the mask applied to the DNN layer above are merely examples; the pruning decision may be performed according to a pruning algorithm to determine the finally applied mask as the first pruning marking information.
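A NumPy check of this simulation, with arbitrary example values in place of the patent's (unreproduced) matrices: setting m1 = 0 zeroes the first output dimension while no matrix actually shrinks.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0]])        # input
W = np.arange(1.0, 10.0).reshape(3, 3) # weight parameters to be pruned
b = np.array([0.1, 0.2, 0.3])          # offset

Y = X @ W + b                          # DNN layer: Y = X · W + b
mask = np.array([0.0, 1.0, 1.0])       # m1 set to 0: prune the first dimension
Z = Y * mask                           # Z = (X · W + b) ⊙ mask
print(Z)                               # first output dimension is 0; W untouched
```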
The following description refers to the result schematic diagram of the first pruning marking information shown in fig. 5c; the first pruning marking information may be determined for the corresponding operation group through pruning simulation, and a relatively complex neural network is taken as the example data processing model. As shown in (1) in fig. 5c, the original neural network is divided into 4 groups, and after pruning feasibility analysis, mask information is added to the outputs of the prunable groups, giving the result shown in (2) in fig. 5c. It can be seen that after a series of serialization processes, the data processing model is serialized into multiple operation groups; for the target operation groups, the pruning simulation effect is realized by multiplying by the mask, so that the accuracy during the pruning process can be simulated, although the actual parameter amount of the model has not yet been reduced. The mask is initialized as an all-ones vector; after pruning calculation, the output dimensions whose mask value is 0 are the dimensions that need to be pruned. Within a group, the layers to be pruned have not yet actually obtained their corresponding mask information, and some target operation groups carry no pruning marks; for example, the dense_2 node is not marked with mask information, indicating that the parameter values in that target operation group's mask are all 1 and there is no parameter layer to prune, that is, the dense_2 layer does not need its weight parameters pruned.
And (3) carrying out recognition processing on the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning recognition result.
In one implementation, the first pruning marking information includes one or more of the first marking value and the second marking value. If the first pruning marking information mask of a target operation group includes only the second marking value (e.g., the value "1"), the weight parameters of the operation nodes in the target operation group are not pruned, even though the pruning property of the target operation group indicates that pruning is allowed. If an operation node in the target operation group receives transparently transmitted pruning marking information that includes the first marking value (e.g., the value "0"), pruning may be performed on the input parameters of that operation node.
In the identification process, the first pruning marking information can be propagated throughout the whole network, so that each operation node in the target operation group obtains second pruning marking information for simulating pruning, and whether the network layer represented by a target operation node is a pruned network layer can be determined based on the second pruning marking information. Further, the network parameters to be pruned in the network layer represented by the target operation node can be determined based on the second pruning marking information. For example, if the second pruning marking information of any target operation node includes the first marking value (e.g., the value "0"), it may be determined that the network layer represented by that target operation node needs pruning; and because the parameter dimension where the first marking value is located is the parameter dimension to be pruned, the network parameters that need pruning can also be determined.
By packaging the target operation group and performing pruning simulation based on the packaged result, the pruning marking information of the target operation group can be determined, and based on the pruning marking information it can be accurately identified whether a network layer needs to be pruned, yielding the pruning identification result.
In one implementation, the computer device may specifically perform the following steps ① - ② when performing step (3) above.
And ①, performing transparent transmission processing according to the first pruning marking information of the target operation group so as to enable at least one target operation node to obtain the second pruning marking information.
The target operation node refers to an operation node for representing a network layer having network parameters. Since the network layer represented by the target operation node has network parameters, the target operation node may also be referred to as a weight node, i.e. an operation node with parameters. Specifically, the target operation node may be an operation node with a parameter in the target operation group, or may be an operation node with a parameter in other operation groups. The target operation node can obtain the second pruning marking information by transmitting the first pruning marking information of the target operation group. The second pruning marking information and the first pruning marking information may be the same or different.
In the present application, the node types of the operation nodes are converged into six pruning types, as shown in Table 1 below:

TABLE 1

Pruning type
Pruning node (node with parameters, pruned directly)
Element node
Passive pruning node (node with parameters, pruned passively)
Connection node
Filtering node
Complex node
As can be seen from Table 1, the present application converges the wide variety of node types into the above pruning types, including but not limited to: pruning nodes, element nodes, passive pruning nodes, connection nodes, filtering nodes and complex nodes. For a newly added calculation operation in a data processing model, the operation only needs to be classified according to the corresponding rules, so the scheme has high extensibility and flexibility. The passive pruning nodes and the pruning nodes are both nodes with parameters; pruning marking information is not set directly for the passive pruning nodes, which instead obtain it through transmission, and the process of determining the network parameters to prune is thereby realized.
Based on the contents shown in Table 1, a target operation node is an operation node of a target pruning type, the target pruning types including: pruning nodes and passive pruning nodes. In the actual pruning process, processing by pruning type rather than by individual node type reduces the computation amount of the model; in the specific processing, pruning nodes and passive pruning nodes serve as the target operation nodes, thereby realizing pruning of model parameters.
After the pruning marking information of the operation groups is determined, the model needs to be persisted. At this point, the corresponding parameter amounts must be calculated for each parameterized operation node (including pruning nodes and passive pruning nodes), so that when the model is persisted, the non-zero input parameter amount and the non-zero output parameter amount can be computed, facilitating the subsequent pruning processing. These parameters are typically weight parameters, and a node's weight-parameter amount is related not only to its own output mask but also to the mask of the previous layer. Therefore, the application designs a transparent-transmission calculation method based on pruning type, and the input and output of each operation node can be determined based on the transparently transmitted pruning marking information. The mask information of all operation nodes can be updated through transparent transmission, so that the mask information of each operation node is comprehensive and accurate, ensuring the accuracy of calculation operation execution.
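One way to picture this convergence is a lookup from node type to pruning type; the concrete assignments below are purely illustrative assumptions, since only the type names of Table 1 are fixed by the text.

```python
# Illustrative assumptions only -- the patent's full Table 1 mapping is not shown.
PRUNING_TYPE_OF_NODE = {
    "Dense": "pruning",           # node with parameters, pruned directly (assumed)
    "Conv": "pruning",            # assumed
    "Relu": "element",            # element-wise, dimension-preserving (assumed)
    "Add": "element",             # assumed
    "Concatenate": "connection",  # assumed
    "Reshape": "complex",         # assumed
}

def is_target_operation_node(pruning_type: str) -> bool:
    # Target operation nodes are of the pruning or passive pruning types.
    return pruning_type in ("pruning", "passive_pruning")
```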
In one implementation, the transparent transmission process may include one or more of the following: a. the pruning marking information is reversely and retrospectively updated in the target operation group; b. externally transmitting pruning marking information; c. the pruning marking information is searched forward. Accordingly, the computer device, when performing step ① above, may specifically perform one or more of the following a-c.
A. And if the target operation group comprises a plurality of operation nodes, performing reverse backtracking processing according to the first pruning marking information of the target operation group, so that other operation nodes except the output operation node in the target operation group obtain the output pruning marking information.
Specifically, during the backward backtracking processing, the output pruning marking information can be determined according to the first pruning marking information of the target operation group and the node type of the target operation node, and then the output pruning marking information is transmitted to the target operation node in the target operation group according to the topological relation between the operation nodes in the target operation node. The output pruning marking information is the same as or different from the first pruning marking information of the target operation group, depending on the node type of the target operation node and the dependency relationship with other operation nodes in the target operation group.
For example, the first pruning marking information mask of the target operation group is [1,0,1,0], and the target operation group includes the following nodes: Add, Dense, Flatten and Conv. The Add node's computation depends on the outputs of the Dense node and the Flatten node, while the Flatten node's input depends on the Conv node's output. Given that the mask of the output operation node Add is [1,0,1,0], it can be determined that the output pruning marking information masks of the Dense node and the Flatten node are both [1,0,1,0], while the output pruning marking information mask of the Conv node is [1,0]. The second pruning marking information of a target operation node includes its output pruning marking information.
It will be appreciated that if the target operation group includes a single operation node, the external transfer described in b or the forward lookup described in c may be performed to obtain the corresponding pruning marking information. In addition, if the target operation group includes multiple operation nodes, external transfer processing or forward lookup may also be performed.
B. If the downstream node of the output operation node in the target operation group is a node of a preset type, external transmission processing is performed according to the first pruning marking information of the target operation group, so that the downstream node obtains the output pruning marking information.
Specifically, the downstream node of an output operation node is the operation node that the output operation node's directed edge points to. If the downstream node of the output operation node is a node of a preset type, for example a Flatten node, the first pruning marking information of the target operation group can be transmitted outward, so that the downstream node obtains output pruning marking information. It will be appreciated that, during external transfer, the output pruning marking information may also be determined based on the first pruning marking information and the node type of the downstream node. For example, the output of a Conv node is used as the input of a Flatten node; when the mask information of the Conv node is [1,1,0], the mask information of the Flatten node is [1,1,0,1,1,0]. If the downstream node of the output operation node is a node of a special type, such as a reshape, input or output node, the first pruning marking information of the target operation group is not transferred outward.
C. if the operation group to which the upstream node of the target operation node included in the target operation group belongs is different from the target operation group, the pruning mark information is searched from the upstream node, and the searched pruning mark information is determined as the input pruning mark information of the target operation node.
Specifically, the upstream node of a target operation node included in a target operation group is the start node whose directed edge points to the target operation node; that is, the directed edge of the upstream node points to the target operation node. Assuming the target operation node is an output operation node, its upstream nodes are the other operation nodes whose directed edges point to it. If the operation group to which an upstream node of the target operation node belongs is another operation group different from the target operation group, pruning marking information may be looked up from the upstream node. Because the upstream node may already have obtained output pruning marking information through internal backtracking, outward transfer and the like, or may carry first pruning marking information as the output operation node of another group, the found pruning marking information can be used as the input pruning marking information of the target operation node; the found information may be the upstream node's output pruning marking information or its first pruning marking information, either of which can serve as the input pruning marking information of the current operation node. For example, assume an operation node in target operation group Group2 is a Dense node whose upstream node is the Add node in Group1, and the Add node carries mask information [1,0,1,0]; the Add node's mask information can then be taken as the input pruning marking information input_mask of the Dense node, i.e., input_mask is [1,0,1,0].
If the operation group to which the upstream node of a target operation node belongs is the same as the target operation group, or if the upstream node has not been classified into any operation group, there is no need to look forward for input pruning marking information, and the second pruning marking information of the target operation node includes the output pruning marking information.
It may be appreciated that after the target operation group completes the transparent transmission process, the target operation node has second pruning marking information, and the second pruning marking information includes one or more of the following: output pruning marking information and input pruning marking information.
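The three transparent-transmission moves a-c can be sketched as follows, assuming each operation node is a dict carrying optional 'mask' (output) and 'input_mask' fields; real propagation additionally depends on each node's pruning type.

```python
def backtrack_within_group(group_mask, members, output_node):
    # a. Reverse backtracking: other nodes in the group derive their output
    #    pruning marking information from the group's first marking information.
    for node in members:
        if node is not output_node:
            node["mask"] = list(group_mask)   # may be reshaped per node type

def transfer_outward(group_mask, downstream):
    # b. External transfer to preset-type downstream nodes (e.g., Flatten);
    #    special types such as reshape/input/output are skipped.
    if downstream["type"] == "Flatten":
        downstream["mask"] = list(group_mask)

def lookup_forward(weight_node, upstream):
    # c. Forward lookup: a weight node takes the mask already present on its
    #    upstream node (from another group) as its input pruning marking info.
    if upstream.get("mask") is not None:
        weight_node["input_mask"] = list(upstream["mask"])
```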
For the transparent-transmission logic described in a-c above, a schematic diagram of transparently transmitting pruning marking information, shown in fig. 5d, can be given based on the mask information shown in fig. 5c. Since the layers to be pruned have not yet obtained mask information, transparent-transmission processing is required to update the mask information into the entire neural network. During this processing, the mask information of each operation group can be propagated to the whole neural network according to the pruning type of each operation node (special computing nodes such as reshape make the mask unpredictable, rendering those nodes unprunable) and the group's mask information; finally, each weight node (including pruning nodes and passive pruning nodes) looks up the corresponding output mask information from its preceding node, so that the input mask information (input_mask) of the weight node is updated. In fig. 5d, (1) is the transparent-transmission result obtained by backtracking inside the target operation groups, (2) is the result obtained by transmitting the mask information of the target operation groups outward, and (3) is the result obtained by each weight node looking up forward.
Step ②, if the second pruning marking information of any target operation node includes the first marking value, determining the network layer represented by any target operation node as the network layer to be pruned, and adding the node identifier of the target operation node to the pruning identification result.
If the second pruning marking information of any target operation node includes the first marking value (e.g., the value "0"), the target network layer represented by that node is a network layer needing pruning, so the node identifier of the target operation node can be added to the pruning identification result; optionally, the node identifier may be the network name or network ID of the target network layer. If the second pruning marking information of a target operation node contains only the second marking value (e.g., the value "1"), the target network layer it represents is a network layer that does not need pruning, so its node identifier need not be added to the pruning identification result. It may be appreciated that there is at least one target operation node, and in the above manner, whether each represented target network layer needs pruning can be determined from the second pruning marking information of each target operation node.
Further, because the parameter dimensions marked by the first marking value are the parameter dimensions to be pruned, pruning calculation based on the second pruning marking information and the output result of the target operation node can determine the network parameters to be pruned in the target network layer. Based on the above processing, the pruning identification result includes at least one node identifier, each indicating a network layer to be pruned, and the serialization result can be obtained from the network layers to be pruned indicated by the pruning identification result.
The pruning identification process shown in the steps ① - ② can realize the update of the pruning marking information in the whole network through the transparent transmission of the first pruning marking information of the target operation group, so that the network layer to be pruned is automatically identified based on the updated pruning marking information, and the automatic pruning process is performed subsequently.
If the pruning identification result is used for indicating whether the corresponding network layer needs pruning, screening the network layer needing pruning according to the pruning identification result, thereby obtaining a serialization result; if the pruning identification result indicates a network layer to be pruned, the pruning identification result may be directly determined as a serialization result.
It can be seen that an optimized model is obtained through the serialization process; the optimized model is a parameter-node model into which the mask information has been transparently transmitted, and it includes the packaging operation groups obtained through serialization (GroupWrapper, which can be regarded as the result of merging custom neurons). In practical application, the model needs to be persisted, so the optimized model must be deserialized, converting the custom neurons (corresponding to the operation groups) back into the original neuron structure, thereby obtaining the pruned data processing model.
S406, pruning is carried out on the data processing model according to the serialization result.
The serialization results may include at least one node identification, as described in the serialization process above. In one implementation, at least one network layer of the plurality of network layers has a network parameter, and the network layer to be pruned indicated by the serialization result includes a target network layer represented by a target operation node, where the target network layer has the network parameter; the target operation node has second pruning marking information. Based on this, the computer device may specifically execute the following steps 4.1 to 4.3 when pruning the data processing model according to the serialization result.
And 4.1, determining a target parameter matrix according to the second pruning marking information of the target operation node and the network parameters of the target network layer.
Specifically, the second pruning marking information includes one or more of the following: output pruning marking information (which may also be called weight pruning marking information) and input pruning marking information input_mask; the network parameters of the target network layer include at least weight parameters. In one embodiment, the input pruning marking information in the second pruning marking information of the target operation node contains the first marking value (e.g., the value "0") while the parameter values in the output pruning marking information are all the second marking value (e.g., the value "1"); this can be understood as the second pruning marking information effectively comprising the input pruning marking information input_mask, in which case pruning simulation calculation is performed with the input pruning marking information and the network parameters of the target network layer to obtain the target parameter matrix. If the second pruning marking information includes output pruning marking information that contains the first marking value, pruning simulation calculation is performed with the output pruning marking information and the network parameters of the target network layer to obtain the target parameter matrix. Optionally, the target parameter matrix includes non-zero matrix dimensions; the parameter layers corresponding to the non-zero matrix dimensions are non-pruned layers, and the parameter layers corresponding to the zero matrix dimensions are pruned layers.
In another embodiment, the second pruning marking information includes the input pruning marking information input_mask and the output pruning marking information, and the network parameters of the target network layer include weight parameters. In this case, when executing step 4.1, the computer device may perform pruning simulation calculation with the network parameters of the target network layer according to both the input and the output pruning marking information to obtain the target parameter matrix, specifically through the following steps: ① performing pruning simulation calculation on the input parameters of the target network layer according to the input pruning marking information of the target operation node to obtain pruned input parameters; ② performing pruning simulation calculation according to the pruned input parameters, the output pruning marking information of the target operation node and the weight parameters of the target network layer to obtain the target parameter matrix. Specifically, the input parameters of the target network layer are the parameters fed into it by other network layers, usually the output of its upstream network layer; for example, the parameters in the output parameter matrix of the upstream network layer may serve as the input parameters of the target network layer. The input pruning marking information input_mask may be multiplied by this output parameter matrix, and the resulting matrix contains the pruned input parameters. Further, the product of the pruned input parameters, the output pruning marking information of the target operation node and the weight parameters of the target network layer can be used as the target parameter matrix.
For example, the target network layer is a DNN layer. For the pruning simulation executed during the persistence process, the output pruning marking information of the DNN layer is assumed to be [1,0,1], and through the above processing the post-pruning calculation formula of the DNN layer is:

Z = (X · W + b) ⊙ (1, 0, 1)

where X = (x1, x2, x3) is the input and W is the weight parameter matrix, which comprises a plurality of weight parameters:

W = [[w11, w12, w13], [w21, w22, w23], [w31, w32, w33]]
Assuming that the input pruning marking information input_mask obtained by the DNN layer is [0, 1, 1], the input x1 in the above calculation formula becomes 0, and the formula may be converted into:

Z = ((0, x2, x3) · W + b) ⊙ (1, 0, 1)

where (0, x2, x3) are the pruned input parameters obtained by calculating the input pruning marking information with the input parameters in the input parameter matrix, and W is the weight parameter matrix. According to the calculation result, the following target parameter matrix W' can be obtained:

W' = [[0, 0, 0], [w21, 0, w23], [w31, 0, w33]]
In the target parameter matrix, the parameters marked by 0 are redundant parameters, namely the network parameters needing pruning; the other, non-zero parameters are the network parameters to be retained, and the non-zero parameters in the target parameter matrix can be used to reconstruct the DNN layer.
And 4.2, carrying out reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer.
In one implementation, the target parameter matrix includes the first marking value (e.g., the number "0"), and the parameter dimension where the first marking value is located is a marked parameter dimension. Therefore, during the reconstruction processing, the parameters of the parameter dimensions marked by the first marking value in the target parameter matrix can be determined as the target network parameters to be pruned in the target network layer; in the above example, these are the parameters lying in the zeroed dimensions of W', i.e., the first input dimension (w11, w12, w13) and the second output dimension (w22, w32). A pruned target network layer is then constructed according to the parameter dimensions marked by the second marking value in the target parameter matrix, and the network parameters other than the target network parameters in the target parameter matrix are copied into the pruned target network layer to obtain the reconstructed network layer. Specifically, the second marking value is, for example, the value "1", and the parameter dimensions it marks are the non-zero parameter dimensions. Still using the above target parameter matrix W' as an example, a pruned target network layer can be reconstructed from its non-zero parameter dimensions, and the reconstructed DNN layer has the following calculation formula:

Z' = X' · W'' + b' = (x2, x3) · [[w21, w23], [w31, w33]] + (b1, b3)

where X' = (x2, x3) is the reconstructed input of the DNN layer, and W'' = [[w21, w23], [w31, w33]] is the reconstructed weight parameter matrix of the DNN layer.

From the above formula, the weight parameter W of the original DNN layer is reduced from a 3x3 matrix before pruning to a 2x2 matrix, and the input is correspondingly reduced from three dimensions to two. The non-zero parameters in the target parameter matrix, namely w21, w23, w31 and w33, can be copied into the pruned target network layer to obtain the reconstructed network layer.
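The worked example can be checked numerically; the sketch below substitutes an arbitrary 3x3 weight matrix for the symbolic one and applies the masks just derived.

```python
import numpy as np

W = np.arange(1.0, 10.0).reshape(3, 3)     # 3x3 weight parameter matrix
input_mask = np.array([0.0, 1.0, 1.0])     # x1 is pruned away upstream
output_mask = np.array([1.0, 0.0, 1.0])    # second output dimension is pruned

# Step 4.1: target parameter matrix -- zero rows for pruned inputs,
# zero columns for pruned outputs.
W_target = input_mask[:, None] * W * output_mask[None, :]

# Step 4.2: reconstruct the pruned layer from the non-zero dimensions.
keep_in = input_mask.astype(bool)
keep_out = output_mask.astype(bool)
W_rebuilt = W_target[np.ix_(keep_in, keep_out)]  # copies w21, w23, w31, w33
print(W_rebuilt.shape)                           # (2, 2): 3x3 reduced to 2x2
```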
And 4.3, adopting a reconstruction network layer to replace the target network layer in the data processing model.
Specifically, the target network layer in the data processing model is replaced by the reconstructed network layer, which realizes the persistence of the pruned data processing model: the network parameters retained after pruning are persisted into the applied model, the deserialization process is completed, actual pruning of the data processing model is realized, and the parameter quantity of the pruned data processing model is genuinely reduced.
The pruning processing shown in steps 4.1-4.3 truly prunes the data processing model, so the size of the model parameters can be reduced. After pruning is completed, the mask variables and the like are deleted, and the packaging nodes defined by the packaging operation groups are converted back into the corresponding network layers through deserialization processing, thereby realizing model persistence.
Based on the above-described scheme, a flowchart of a model processing method shown in fig. 6 may be provided, where the flowchart includes the following functional modules: a conversion module (Converter module), a grouping module (Group module), a packaging module (Wrapper module), a pruning module (pruner module), and a storage module (Saver module). The functions of the respective modules are as follows: ① The conversion module is used for analyzing and processing the network topology structure of the data processing model to obtain a calculation topology graph and attribute information. ② The grouping module is used for grouping the plurality of operation nodes according to the dependency relationship among the operation nodes to obtain a plurality of operation groups, and analyzing the pruning property of the operation groups according to the pruning property included in the attribute information. ③ The packaging module is used for selecting a target operation group from at least one operation group according to pruning property of the operation groups, packaging output operation nodes of the target operation group to obtain a packaging operation group, and constructing a parameter node model with mask information. ④ The pruning module can perform pruning simulation according to the packaging operation group to determine pruning marking information of the target operation group. ⑤ The storage module can be used for transmitting pruning marking information of the target operation group in a transparent way, further determining which network layers need parameter pruning based on the transparent pruning marking information, and determining the parameter layers needing pruning. In a specific implementation, a non-zero matrix dimension can be calculated according to pruning marking information, a network layer after pruning is reconstructed according to the calculated non-zero matrix dimension, and then network parameters to be reserved are copied into the network layer after pruning, so that model persistence is realized, and pruning of a data processing model is completed.
The interaction flow between the modules includes the following: first, a computer device may obtain a data processing model, which is the original model. The computer device may invoke the conversion module to convert the data processing model, through analysis of the network topology, into a computation topology map describing the network topology of the original model. The computation topology map is a simplified Model Graph that includes operation nodes representing the operators in the data processing model. Then, the grouping module may be invoked to group the operation nodes in the computation topology map to obtain a plurality of operation groups (Multi Groups), and the pruning property of each operation group may be analyzed according to the pruning property of the operation nodes extracted by the Converter module. Next, the Wrapper module may select an operation group allowing pruning from the plurality of operation groups as a target operation group, select an output operation node from the target operation group, and add mask information (e.g., a unit vector) to the output operation node to obtain a wrapping operation group (Group Wrapper), as sketched below. The pruning module may then be invoked to perform pruning simulation processing according to the wrapping operation group and decide the pruning marking information, so as to determine the network layers to be pruned according to the pruning marking information. Finally, the storage module may be invoked to transparently transmit the pruning marking information; the network layers to be pruned and the network parameters to be pruned can be calculated based on the transparently transmitted pruning marking information, and the pruned data processing model is obtained by reconstructing those network layers, copying the network parameters to be retained, and persisting the parameters into the model.
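The mask mechanism itself can be pictured with a short numpy sketch, assuming (as a simplification) that the mask is element-wise multiplied onto the output of the group's output operation node; the shapes and values are illustrative:

```python
import numpy as np

output = np.array([[0.7, -1.2, 0.3, 2.1]])  # output of the group's output operation node
mask = np.ones(4)                           # mask information, initialized as a unit vector

mask[1] = 0.0                               # pruning simulation: mark channel 1 for pruning
masked = output * mask                      # downstream nodes consume the masked output
print(masked)                               # [[ 0.7 -0.   0.3  2.1]]
```

Because the mask starts as all ones, wrapping a group changes nothing until the pruning simulation zeroes entries, which is what lets pruning decisions be evaluated without touching the real parameters.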
The scheme provided by the application can be applied to the optimization of various AI models, and in particular to the field of model pruning: redundancy can be reduced through model pruning, thereby accelerating the reasoning performance of the model. When optimizing a model, the user does not need extensive model optimization experience, and pruning after manual pruning analysis of the model is avoided, which lowers the threshold for model optimization; the whole process from model input to pruned-model output is fully automatic, and model persistence is performed without manual intervention. In addition, the application abstracts over open-source deep learning operations, so that the various deep learning computations are uniformly classified into several pruning types; the scheme therefore remains extensible and can be made compatible with open-source deep learning frameworks (such as TensorFlow) effectively and quickly.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a model processing device according to an embodiment of the present application, where the model processing device may be disposed in a computer device according to an embodiment of the present application. The model processing means shown in fig. 7 may be a computer program (comprising program code) running in a computer device, which model processing means may be used to perform some or all of the steps of the method embodiments shown in fig. 2 or fig. 4. Referring to fig. 7, the model processing apparatus may include the following units:
an acquiring unit 701, configured to acquire a data processing model to be processed; the data processing model comprises a plurality of network layers;
The processing unit 702 is configured to analyze the network topology structure of the data processing model to obtain a computation topology map, where the computation topology map includes operation nodes corresponding to the network layers and attribute information of the operation nodes, and the attribute information includes pruning property; perform serialization processing on the data processing model based on the attribute information of the operation nodes in the computation topology map to obtain a serialization result, where the serialization result is used for indicating the network layers needing pruning in the data processing model; and perform pruning processing on the data processing model according to the serialization result.
In one embodiment, the processing unit 702 is specifically configured to, when performing analysis processing on a network topology of the data processing model to obtain a calculated topology map: extracting and processing a network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises: a plurality of operation nodes and at least one edge; each operation node is used for representing one network layer in the data processing model, and each edge is used for connecting two associated operation nodes; and carrying out attribute extraction processing on each operation node in the calculation topological graph to obtain attribute information of each operation node, and adding the attribute information of each operation node into the calculation topological graph.
In one embodiment, when performing attribute extraction processing on each operation node in the computation topology map to obtain attribute information of each operation node, the processing unit 702 is specifically configured to: according to the network layer represented by each operation node in the calculation topological graph, carrying out type identification processing on each operation node to obtain the node type of each operation node; determining pruning information of each operation node according to the corresponding relation between the node type and the pruning information and the node type of each operation node; pruning information includes pruning nature and pruning transitivity; and adding the node type of each operation node and pruning information of each operation node into the attribute information of the corresponding operation node.
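A toy version of the correspondence between node type and pruning information might look as follows; the concrete table contents are assumptions for illustration, since the embodiment only states that such a correspondence exists:

```python
PRUNING_INFO = {
    "Conv2D": {"prunable": True,  "transitive": False},
    "Dense":  {"prunable": True,  "transitive": False},
    "ReLU":   {"prunable": False, "transitive": True},   # passes pruning marks through
    "Concat": {"prunable": False, "transitive": True},
}

def attribute_info(node_type):
    # Unknown types default to "forbidden to prune, not transitive".
    info = PRUNING_INFO.get(node_type, {"prunable": False, "transitive": False})
    return {"node_type": node_type, **info}

print(attribute_info("ReLU"))
# {'node_type': 'ReLU', 'prunable': False, 'transitive': True}
```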
In one embodiment, the computational topology includes a plurality of operational nodes; the pruning property of each operation node is used for indicating whether the corresponding operation node is allowed to be pruned; the processing unit 702 is specifically configured to, when performing serialization processing on the data processing model based on attribute information of the operation node in the computation topology map to obtain a serialization result: according to the dependency relationship among a plurality of operation nodes in the calculation topological graph, grouping the plurality of operation nodes to obtain K operation groups; each operation group comprises at least one operation node; k is a positive integer; selecting a target operation group from the K operation groups according to pruning property of the operation nodes included in each operation group; wherein the target operation group refers to an operation group in which operation nodes allowing pruning exist; and according to the target operation group, carrying out identification processing on the network layer in the data processing model to obtain a pruning identification result, and determining a serialization result according to the pruning identification result.
In one embodiment, the dependency relationship is used to indicate a computation dependency, and the processing unit 702 is specifically configured to, when grouping the plurality of operation nodes according to the dependency relationship among the plurality of operation nodes in the computation topology map to obtain K operation groups: initialize each operation node in the computation topology map to obtain initial operation groups, where one initial operation group includes one operation node; and merge the associated initial operation groups according to the computation dependency among the operation nodes to obtain the K operation groups.
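This initialize-then-merge procedure is a classic union-find pattern; the following is a minimal sketch under assumed node names and dependencies:

```python
parent = {}

def find(x):
    # Find the representative of x's group, with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    # Merge the groups containing a and b.
    parent[find(a)] = find(b)

nodes = ["conv_1", "bn_1", "relu_1", "conv_2"]
deps = [("conv_1", "bn_1"), ("bn_1", "relu_1")]   # computation dependencies

for n in nodes:             # initialization: one operation node per initial group
    parent[n] = n
for a, b in deps:           # merge the associated initial operation groups
    union(a, b)

groups = {}
for n in nodes:
    groups.setdefault(find(n), []).append(n)
print(list(groups.values()))   # [['conv_1', 'bn_1', 'relu_1'], ['conv_2']]
```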
In one embodiment, the processing unit 702 is specifically configured to, when selecting a target operation group from the K operation groups according to the pruning property of the operation nodes included in each operation group: perform pruning property analysis on each operation group according to the pruning property of the operation nodes included in that operation group, to obtain the pruning property of each operation group, where the pruning property of an operation group is used for indicating whether the operation group is allowed to be pruned; and select an operation group allowed to be pruned from the K operation groups as the target operation group according to the pruning property of each operation group.
In one embodiment, the processing unit 702 is specifically configured to, when performing pruning property analysis on each operation group according to the pruning property of the operation nodes included in each operation group to obtain the pruning property of each operation group: traverse the K operation groups, taking the traversed operation group as the i-th operation group, where i ∈ [1, K]; if the pruning property of at least one operation node in the i-th operation group indicates that the corresponding operation node is allowed to be pruned, generate a first pruning property and determine the first pruning property as the pruning property of the i-th operation group, where the first pruning property indicates that the i-th operation group is allowed to be pruned; if the pruning property of every operation node included in the i-th operation group indicates that the corresponding operation node is forbidden to be pruned, generate a second pruning property and determine the second pruning property as the pruning property of the i-th operation group, where the second pruning property indicates that the i-th operation group is forbidden to be pruned; and after traversing the K operation groups, obtain the pruning property of each operation group. This traversal rule is sketched below.
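A minimal sketch of this rule, with illustrative node names and prunability flags:

```python
groups = [
    [("conv_1", True), ("bn_1", False)],   # group 1: contains a prunable node
    [("reshape_1", False)],                # group 2: every node forbids pruning
]

for i, g in enumerate(groups, start=1):    # i in [1, K]
    # First pruning property if any node allows pruning, second otherwise.
    prunable = any(flag for _, flag in g)
    print(f"group {i}: {'allowed' if prunable else 'forbidden'}")
# group 1: allowed
# group 2: forbidden
```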
In one embodiment, the processing unit 702 is specifically configured to, when performing recognition processing on the network layer in the data processing model according to the target operation group to obtain a pruning recognition result: packaging the target operation group according to the pruning control variable to obtain a packaging operation group; pruning decision processing is carried out according to the packaging operation group, so that first pruning marking information of the target operation group is obtained; the first pruning marking information is used for marking the parameter dimension of the pruning to be performed; and carrying out recognition processing on the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning recognition result.
In one embodiment, the processing unit 702 is specifically configured to, when performing identification processing on the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning identification result: performing transparent transmission processing on the first pruning marking information of the target operation group so as to enable at least one target operation node to obtain second pruning marking information; the target operation node refers to an operation node for representing a network layer having network parameters; if the second pruning marking information of any target operation node comprises the first marking value, determining the target network layer represented by any target operation node as the network layer to be pruned, and adding the node identification of the target operation node into the pruning identification result.
In one embodiment, the processing unit 702 is specifically configured to, when performing transparent transmission processing on the first pruning marking information of the target operation group so that at least one target operation node obtains the second pruning marking information: if the target operation group comprises a plurality of operation nodes, performing reverse backtracking processing according to the first pruning marking information of the target operation group, so that other operation nodes except the output operation node in the target operation group obtain the output pruning marking information; if the downstream node of the output operation node in the target operation group is a node of a preset type, performing external transmission processing according to the first pruning marking information of the target operation group, so that the downstream node obtains the output pruning marking information; if the operation group to which the upstream node of the target operation node included in the target operation group belongs is different from the target operation group, searching first pruning marking information from the upstream node, and determining the searched first pruning marking information as input pruning marking information of the target operation node; wherein the second pruning marking information includes one or more of: output pruning marking information and input pruning marking information.
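The three transparent-transmission cases above can be mirrored in a toy function; this is only a sketch of the rule structure under assumed node records, since the real traversal order and preset node types are not published here (case 3, input marks looked up from an upstream node in another group, is noted but not modeled):

```python
def passthrough(group_nodes, first_marks, downstream=None,
                preset_types=frozenset({"Add"})):
    """Propagate the first pruning marking information of a target group."""
    second_marks = {}
    # Case 1: reverse backtracking inside the group -- every node other
    # than the output operation node obtains the output marking info.
    for node in group_nodes[:-1]:
        second_marks[node] = list(first_marks)
    # Case 2: outward transmission -- a preset-type downstream node of the
    # output operation node also obtains the output marking info.
    if downstream is not None and downstream[1] in preset_types:
        second_marks[downstream[0]] = list(first_marks)
    return second_marks

group = ["conv_1", "bn_1", "relu_1"]       # relu_1 is the output operation node
marks = [0, 1, 0]                          # first pruning marking information
print(passthrough(group, marks, downstream=("add_1", "Add")))
# {'conv_1': [0, 1, 0], 'bn_1': [0, 1, 0], 'add_1': [0, 1, 0]}
```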
In one embodiment, at least one of the plurality of network layers has a network parameter; the network layer to be pruned indicated by the serialization result comprises a target network layer represented by a target operation node, wherein the target network layer is provided with network parameters, and the target operation node is provided with second pruning mark information; the processing unit 702 is specifically configured to, when pruning the data processing model according to the serialization result: determining a target parameter matrix according to the second pruning marking information of the target operation node and the network parameters of the target network layer; carrying out reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer; and adopting the reconstruction network layer to replace the target network layer in the data processing model.
In one embodiment, the second pruning marking information includes: inputting pruning marking information and outputting pruning marking information; the network parameters of the target network layer comprise weight parameters; the processing unit 702 is specifically configured to, when determining the target parameter matrix according to the second pruning marking information of the target operation node and the network parameter of the target network layer: performing pruning simulation calculation processing on the input parameters of the target network layer according to the input pruning marking information of the target operation node to obtain pruned input parameters; and performing pruning simulation calculation processing according to the pruned input parameters, the output pruning marking information of the target operation node and the weight parameters of the target network layer to obtain a target parameter matrix.
In one embodiment, the target parameter matrix includes a first marker value; the processing unit 702 is specifically configured to, when performing reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer: determining a parameter corresponding to a parameter dimension marked by a first marking value in a target parameter matrix as a target network parameter to be pruned in a target network layer; constructing a pruned target network layer according to the parameter dimension marked by the second marking value in the target parameter matrix; copying parameters except the target network parameters in the target parameter matrix into the pruned target network layer to obtain a reconstructed network layer.
It may be understood that the specific functions of each unit of the model processing device described in the embodiments of the present application may be specifically implemented according to the method in the foregoing method embodiments, and the specific implementation process may refer to the relevant description of the foregoing method embodiments, which is not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
The following description is provided with respect to a computer device according to an embodiment of the present application.
The embodiment of the application also provides a schematic structural diagram of a computer device, which can be seen in fig. 8. The computer device may include: a processor 801, an input device 802, an output device 803, and a memory 804, which are connected by a bus. The memory 804 stores a computer-readable storage medium including a computer program, and the processor 801 is used for executing the computer program stored in the memory 804.
In one embodiment, the processor 801 performs the following operations by running a computer program in the memory 804: acquiring a data processing model to be processed; the data processing model comprises a plurality of network layers; analyzing and processing a network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises operation nodes corresponding to a network layer and attribute information of the operation nodes, and the attribute information comprises pruning property; based on the attribute information of the operation nodes in the calculation topological graph, carrying out serialization processing on the data processing model to obtain a serialization result, wherein the serialization result is used for indicating a network layer needing pruning in the data processing model; and pruning the data processing model according to the serialization result.
In one embodiment, the processor 801 is specifically configured to, when performing analysis processing on a network topology of the data processing model to obtain a calculated topology map: extracting and processing a network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises: a plurality of operation nodes and at least one edge; each operation node is used for representing one network layer in the data processing model, and each edge is used for connecting two associated operation nodes; and carrying out attribute extraction processing on each operation node in the calculation topological graph to obtain attribute information of each operation node, and adding the attribute information of each operation node into the calculation topological graph.
In one embodiment, when performing attribute extraction processing on each operation node in the computation topology map to obtain attribute information of each operation node, the processor 801 is specifically configured to: according to the network layer represented by each operation node in the calculation topological graph, carrying out type identification processing on each operation node to obtain the node type of each operation node; determining pruning information of each operation node according to the corresponding relation between the node type and the pruning information and the node type of each operation node; pruning information includes pruning nature and pruning transitivity; and adding the node type of each operation node and pruning information of each operation node into the attribute information of the corresponding operation node.
In one embodiment, the computational topology includes a plurality of operational nodes; the pruning property of each operation node is used for indicating whether the corresponding operation node is allowed to be pruned; the processor 801 is specifically configured to, when performing serialization processing on the data processing model based on attribute information of the operation node in the computation topology map to obtain a serialization result: according to the dependency relationship among a plurality of operation nodes in the calculation topological graph, grouping the plurality of operation nodes to obtain K operation groups; each operation group comprises at least one operation node; k is a positive integer; selecting a target operation group from the K operation groups according to pruning property of the operation nodes included in each operation group; wherein the target operation group refers to an operation group in which operation nodes allowing pruning exist; and according to the target operation group, carrying out identification processing on the network layer in the data processing model to obtain a pruning identification result, and determining a serialization result according to the pruning identification result.
In one embodiment, the dependency relationship is used to indicate a computation dependency, and the processor 801 is specifically configured to, when grouping the plurality of operation nodes according to the dependency relationship among the plurality of operation nodes in the computation topology map to obtain K operation groups: initialize each operation node in the computation topology map to obtain initial operation groups, where one initial operation group includes one operation node; and merge the associated initial operation groups according to the computation dependency among the operation nodes to obtain the K operation groups.
In one embodiment, the processor 801 is specifically configured to, when selecting a target operation group from the K operation groups according to the pruning property of the operation nodes included in each operation group: perform pruning property analysis on each operation group according to the pruning property of the operation nodes included in that operation group, to obtain the pruning property of each operation group, where the pruning property of an operation group is used for indicating whether the operation group is allowed to be pruned; and select an operation group allowed to be pruned from the K operation groups as the target operation group according to the pruning property of each operation group.
In one embodiment, the processor 801 is specifically configured to, when performing pruning property analysis on each operation group according to the pruning property of the operation nodes included in each operation group to obtain the pruning property of each operation group: traverse the K operation groups, taking the traversed operation group as the i-th operation group, where i ∈ [1, K]; if the pruning property of at least one operation node in the i-th operation group indicates that the corresponding operation node is allowed to be pruned, generate a first pruning property and determine the first pruning property as the pruning property of the i-th operation group, where the first pruning property indicates that the i-th operation group is allowed to be pruned; if the pruning property of every operation node included in the i-th operation group indicates that the corresponding operation node is forbidden to be pruned, generate a second pruning property and determine the second pruning property as the pruning property of the i-th operation group, where the second pruning property indicates that the i-th operation group is forbidden to be pruned; and after traversing the K operation groups, obtain the pruning property of each operation group.
In one embodiment, when the processor 801 performs recognition processing on the network layer in the data processing model according to the target operation group to obtain a pruning recognition result, the processor 801 is specifically configured to: packaging the target operation group according to the pruning control variable to obtain a packaging operation group; pruning decision processing is carried out according to the packaging operation group, so that first pruning marking information of the target operation group is obtained; the first pruning marking information is used for marking the parameter dimension of the pruning to be performed; and carrying out recognition processing on the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning recognition result.
In one embodiment, when the processor 801 performs recognition processing on the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning recognition result, the processor 801 is specifically configured to: performing transparent transmission processing on the first pruning marking information of the target operation group so as to enable at least one target operation node to obtain second pruning marking information; the target operation node refers to an operation node for representing a network layer having network parameters; if the second pruning marking information of any target operation node comprises the first marking value, determining the target network layer represented by any target operation node as the network layer to be pruned, and adding the node identification of the target operation node into the pruning identification result.
In one embodiment, when the processor 801 performs transparent transmission processing on the first pruning mark information of the target operation group so that at least one target operation node obtains the second pruning mark information, the processor is specifically configured to: if the target operation group comprises a plurality of operation nodes, performing reverse backtracking processing according to the first pruning marking information of the target operation group, so that other operation nodes except the output operation node in the target operation group obtain the output pruning marking information; if the downstream node of the output operation node in the target operation group is a node of a preset type, performing external transmission processing according to the first pruning marking information of the target operation group, so that the downstream node obtains the output pruning marking information; if the operation group to which the upstream node of the target operation node included in the target operation group belongs is different from the target operation group, searching first pruning marking information from the upstream node, and determining the searched first pruning marking information as input pruning marking information of the target operation node; wherein the second pruning marking information includes one or more of: output pruning marking information and input pruning marking information.
In one embodiment, at least one of the plurality of network layers has a network parameter; the network layer to be pruned indicated by the serialization result comprises a target network layer represented by a target operation node, wherein the target network layer is provided with network parameters, and the target operation node is provided with second pruning mark information; the processor 801 is specifically configured to, when pruning the data processing model according to the serialization result: determining a target parameter matrix according to the second pruning marking information of the target operation node and the network parameters of the target network layer; carrying out reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer; and adopting the reconstruction network layer to replace the target network layer in the data processing model.
In one embodiment, the second pruning marking information includes: inputting pruning marking information and outputting pruning marking information; the network parameters of the target network layer comprise weight parameters; the processor 801 is specifically configured to, when determining the target parameter matrix according to the second pruning marking information of the target operation node and the network parameter of the target network layer: performing pruning simulation calculation processing on the input parameters of the target network layer according to the input pruning marking information of the target operation node to obtain pruned input parameters; and performing pruning simulation calculation processing according to the pruned input parameters, the output pruning marking information of the target operation node and the weight parameters of the target network layer to obtain a target parameter matrix.
In one embodiment, the target parameter matrix includes a first marker value; the processor 801 is specifically configured to, when performing reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer: determining a parameter corresponding to a parameter dimension marked by a first marking value in a target parameter matrix as a target network parameter to be pruned in a target network layer; constructing a pruned target network layer according to the parameter dimension marked by the second marking value in the target parameter matrix; copying parameters except the target network parameters in the target parameter matrix into the pruned target network layer to obtain a reconstructed network layer.
It should be understood that the computer device described in the embodiments of the present application may perform the description of the model processing method in the foregoing corresponding embodiment, or may perform the description of the model processing device in the foregoing corresponding embodiment, which is not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
Furthermore, it should be noted here that: the embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program including program instructions that, when executed by a processor, perform the methods in the embodiments corresponding to fig. 2 and fig. 4; therefore, a detailed description will not be given here.
According to one aspect of the present application, there is provided a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer readable storage medium, and the processor executes the computer program, so that the computer device can perform the method in the corresponding embodiment of fig. 2 and fig. 4, and thus, a detailed description will not be given here.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a computer-readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The above disclosure is only a preferred embodiment of the present application, which certainly cannot be used to limit the scope of the application; one skilled in the art will understand that equivalent variations of all or part of the procedures for implementing the above embodiments, made according to the appended claims, still fall within the scope of the application.

Claims (16)

1. A method of model processing, the method comprising:
acquiring a data processing model to be processed; the data processing model comprises a plurality of network layers;
analyzing and processing a network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises operation nodes corresponding to the network layer and attribute information of the operation nodes, and the attribute information comprises pruning property; the computing topological graph comprises a plurality of operation nodes; the pruning property of each operation node is used for indicating whether the corresponding operation node is allowed to be pruned;
according to the dependency relationship among the operation nodes in the calculation topological graph, grouping the operation nodes to obtain K operation groups; each operation group comprises at least one operation node; k is a positive integer;
Selecting a target operation group from the K operation groups according to pruning property of operation nodes included in each operation group; wherein the target operation group refers to an operation group in which operation nodes allowing pruning exist;
According to the target operation group, carrying out identification processing on the network layer in the data processing model to obtain a pruning identification result, and determining a serialization result according to the pruning identification result, wherein the serialization result is used for indicating the network layer to be pruned in the data processing model;
and pruning the data processing model according to the serialization result.
2. The method of claim 1, wherein analyzing the network topology of the data processing model to obtain a calculated topology map comprises:
Extracting the network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises: a plurality of operation nodes and at least one edge; each of the operation nodes is used for representing one network layer in the data processing model, and each of the edges is used for connecting two associated operation nodes;
And carrying out attribute extraction processing on each operation node in the calculation topological graph to obtain attribute information of each operation node, and adding the attribute information of each operation node into the calculation topological graph.
3. The method of claim 2, wherein performing attribute extraction processing on each operation node in the computation topology to obtain attribute information of each operation node comprises:
According to the network layer represented by each operation node in the calculation topological graph, carrying out type identification processing on each operation node to obtain the node type of each operation node;
determining pruning information of each operation node according to the corresponding relation between the node type and the pruning information and the node type of each operation node; the pruning information comprises pruning property and pruning transitivity;
And adding the node type of each operation node and pruning information of each operation node into attribute information of the corresponding operation node.
4. The method of claim 1, wherein the dependency relationship is used to indicate a computation dependency, and the grouping the operation nodes according to the dependency relationship of the operation nodes in the computation topology map to obtain K operation groups includes:
Initializing each operation node in the calculation topological graph to obtain each initial operation group, wherein one initial operation group comprises one operation node;
And combining the related initial operation groups in the initial operation groups according to the calculation dependence among the operation nodes to obtain K operation groups.
5. The method of claim 1, wherein the selecting a target operation group from the K operation groups according to the pruning property of the operation nodes included in each of the operation groups comprises:
according to the pruning property of the operation nodes included in each operation group, performing pruning property analysis on the corresponding operation group to obtain the pruning property of each operation group; the pruning property of the operation group is used for indicating whether the operation group is allowed to be pruned;
and selecting an operation group which is allowed to be pruned from the K operation groups as a target operation group according to pruning property of each operation group.
6. The method of claim 5, wherein said performing pruning property analysis on each of said operation groups based on the pruning property of the operation nodes included in each of said operation groups to obtain the pruning property of each of said operation groups comprises:
traversing the K operation groups, and taking the traversed operation group as the i-th operation group, wherein i ∈ [1, K];

if the pruning property of at least one operation node in the i-th operation group is used for indicating that the corresponding operation node is allowed to be pruned, generating a first pruning property, and determining the first pruning property as the pruning property of the i-th operation group, wherein the first pruning property is used for indicating that the i-th operation group is allowed to be pruned;

if the pruning property of each operation node included in the i-th operation group is used for indicating that the corresponding operation node is forbidden to be pruned, generating a second pruning property, and determining the second pruning property as the pruning property of the i-th operation group, wherein the second pruning property is used for indicating that the i-th operation group is forbidden to be pruned;

and after traversing the K operation groups, obtaining the pruning property of each operation group.
7. The method of claim 1, wherein the identifying the network layer in the data processing model according to the target operation group to obtain the pruning identification result comprises:
packaging the target operation group according to the pruning control variable to obtain a packaging operation group;
Performing pruning decision processing according to the packaging operation group to obtain first pruning marking information of the target operation group; the first pruning marking information is used for marking parameter dimensions of pruning required;
and according to the first pruning marking information of the target operation group, carrying out identification processing on the network layer in the data processing model to obtain a pruning identification result.
8. The method of claim 7, wherein the identifying the network layer in the data processing model according to the first pruning marking information of the target operation group to obtain a pruning identification result comprises:
Performing transparent transmission processing on the first pruning marking information of the target operation group so as to enable at least one target operation node to obtain second pruning marking information; the target operation node is an operation node for representing a network layer with network parameters;
If the second pruning marking information of any target operation node comprises the first marking value, determining the target network layer represented by the any target operation node as the network layer to be pruned, and adding the node identification of the target operation node into a pruning identification result.
9. The method of claim 8, wherein the performing a transparent transmission process on the first pruning marking information of the target operation group to enable at least one target operation node to obtain second pruning marking information comprises:
if the target operation group comprises a plurality of operation nodes, performing reverse backtracking processing according to the first pruning marking information of the target operation group, so that other operation nodes except the output operation node in the target operation group obtain the output pruning marking information;
If the downstream node of the output operation node in the target operation group is a node of a preset type, performing external transmission processing according to the first pruning marking information of the target operation group, so that the downstream node obtains the output pruning marking information;
if the operation group to which the upstream node of the target operation node included in the target operation group belongs is different from the target operation group, searching first pruning marking information from the upstream node, and determining the searched first pruning marking information as input pruning marking information of the target operation node;
Wherein the second pruning marking information includes one or more of the following: output pruning marking information and input pruning marking information.
10. The method of claim 1, wherein at least one of the plurality of network layers has network parameters; the network layer to be pruned indicated by the serialization result comprises a target network layer represented by a target operation node, wherein the target network layer is provided with network parameters, and the target operation node is provided with second pruning mark information;
And pruning the data processing model according to the serialization result, wherein the pruning processing comprises the following steps:
Determining a target parameter matrix according to the second pruning marking information of the target operation node and the network parameters of the target network layer;
carrying out reconstruction processing on the target network layer according to the target parameter matrix to obtain a reconstructed network layer;
and adopting the reconstruction network layer to replace the target network layer in the data processing model.
11. The method of claim 10, wherein the second pruning marking information comprises: inputting pruning marking information and outputting pruning marking information; the network parameters of the target network layer comprise weight parameters; the determining a target parameter matrix according to the second pruning marking information of the target operation node and the network parameters of the target network layer includes:
Performing pruning simulation calculation processing on the input parameters of the target network layer according to the input pruning marking information of the target operation node to obtain pruned input parameters;
And performing pruning simulation calculation processing according to the pruned input parameters, the output pruning marking information of the target operation node and the weight parameters of the target network layer to obtain a target parameter matrix.
12. The method of claim 10, wherein the target parameter matrix comprises a first marker value; the reconstructing the target network layer according to the target parameter matrix to obtain a reconstructed network layer, including:
Determining a parameter corresponding to a parameter dimension marked by a first marking value in the target parameter matrix as a target network parameter to be pruned in the target network layer;
Constructing a pruned target network layer according to the parameter dimension marked by the second marking value in the target parameter matrix;
Copying parameters except the target network parameters in the target parameter matrix into the pruned target network layer to obtain a reconstructed network layer.
13. A model processing apparatus, comprising:
The acquisition unit is used for acquiring the data processing model to be processed; the data processing model comprises a plurality of network layers;
The processing unit is used for analyzing and processing the network topology structure of the data processing model to obtain a calculation topology graph, wherein the calculation topology graph comprises operation nodes corresponding to the network layer and attribute information of the operation nodes, and the attribute information comprises pruning property; the computing topological graph comprises a plurality of operation nodes; the pruning property of each operation node is used for indicating whether the corresponding operation node is allowed to be pruned;
The processing unit is further configured to perform grouping processing on the plurality of operation nodes according to the dependency relationship among the plurality of operation nodes in the computation topology map, so as to obtain K operation groups; each operation group comprises at least one operation node; k is a positive integer; selecting a target operation group from the K operation groups according to pruning property of operation nodes included in each operation group; wherein the target operation group refers to an operation group in which operation nodes allowing pruning exist; according to the target operation group, carrying out identification processing on the network layer in the data processing model to obtain a pruning identification result, and determining a serialization result according to the pruning identification result, wherein the serialization result is used for indicating the network layer to be pruned in the data processing model;
and the processing unit is also used for pruning the data processing model according to the serialization result.
14. A computer device, comprising:
A processor adapted to execute a computer program;
A computer readable storage medium having stored therein a computer program which, when executed by the processor, performs the model processing method according to any one of claims 1-12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, performs the model processing method according to any one of claims 1-12.
16. A computer program product, characterized in that the computer program product comprises a computer program or computer instructions which are executed by a processor to implement the model processing method according to any of claims 1-12.