CN117077739A - Model data processing method, device, equipment and medium

Model data processing method, device, equipment and medium

Info

Publication number
CN117077739A
CN117077739A
Authority
CN
China
Prior art keywords
network
network layer
compression
layer
model
Prior art date
Legal status
Pending
Application number
CN202211743583.8A
Other languages
Chinese (zh)
Inventor
程雅慧
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211743583.8A priority Critical patent/CN117077739A/en
Publication of CN117077739A publication Critical patent/CN117077739A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application discloses a model data processing method, apparatus, device, and medium. When a model is compressed, model data of the model to be processed is first obtained; compression parameters of a first network layer among at least two network layers of the model to be processed are then obtained, and the compression parameters of every network layer other than the first network layer are determined from the compression parameters of the first network layer, yielding compression parameters for each network layer of the model; finally, the model data is compressed based on the compression parameters of each network layer. Because the compression parameters of the other network layers are determined directly from those of the first network layer, there is no need to obtain the operation result of the first network layer and derive compression parameters from it, which greatly reduces the memory consumed by the model compression process. A large number of intermediate operations are likewise avoided, which saves computation time and improves the overall efficiency of model compression.

Description

Model data processing method, device, equipment and medium
Technical Field
The present disclosure relates generally to the field of computers, and more particularly, to a method, apparatus, device, and medium for model data processing.
Background
With the development of artificial intelligence, neural network models are widely used in various fields. In the related art, a model is typically constructed according to task requirements, trained until it reaches the expected accuracy, and then deployed on low-power devices to realize industrial deployment. In practical applications, to allow a trained model to be deployed more easily on low-power devices, the model is usually pruned and compressed to reduce its size and its inference latency.
Existing compression tools can reduce the size of a model, but when they compute compression parameters for a network layer that has none, they usually save the intermediate value produced by each network layer, for example the convolution result of a convolution layer, and then compute the compression parameters from those intermediate values. The whole compression process therefore occupies a large amount of memory, which affects overall compression efficiency.
Disclosure of Invention
In view of the foregoing drawbacks or shortcomings of the prior art, it is desirable to provide a method, apparatus, device, and medium for model data processing that can greatly improve the efficiency of model compression.
In a first aspect, an embodiment of the present application provides a method for processing model data, including: obtaining model data of a model to be processed, wherein the model data are used for representing at least two network layers included in the model to be processed and execution logic between the at least two network layers, and the network layers are used for processing data input into the model to be processed;
Acquiring compression parameters of a first network layer in the at least two network layers, and determining compression parameters of each network layer except the first network layer according to the compression parameters of the first network layer; the compression parameter is used for indicating a network module to be deleted in the network layer;
and carrying out compression processing on the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
In a second aspect, an embodiment of the present application provides a model data processing apparatus, including:
the first acquisition module is used for acquiring model data of a to-be-processed model, wherein the model data are used for representing at least two network layers included in the to-be-processed model and execution logic between the at least two network layers, and the network layers are used for processing data input into the to-be-processed model;
the second acquisition module is used for acquiring the compression parameters of a first network layer in the at least two network layers, and determining the compression parameters of each network layer except the first network layer according to the compression parameters of the first network layer; the compression parameter is used for indicating a network module to be deleted in the network layer;
And the compression module is used for carrying out compression processing on the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements a method as described in the embodiments of the present application when executing the program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described in embodiments of the present application.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described in embodiments of the present application.
Therefore, in the model data processing method provided by the embodiments of the application, when a model is compressed, model data of the model to be processed is first obtained; compression parameters of a first network layer among at least two network layers of the model are then obtained, and the compression parameters of every network layer other than the first network layer are determined from the compression parameters of the first network layer, yielding compression parameters for each network layer of the model; finally, the model data is compressed based on the compression parameters of each network layer. When the compression parameters of the network layers other than the first network layer are determined, they can be derived directly from the compression parameters of the first network layer; there is no need to obtain the operation result of the first network layer and determine compression parameters from it, which greatly reduces the memory consumption of the model compression process. A large number of intermediate operations are likewise unnecessary, which effectively saves computation time and improves the overall efficiency of model compression.
In addition, because the compression parameters of every network layer other than the first network layer are determined from the compression parameters of the first network layer, only the compression parameters of a specific model structure need to be provided for the compression parameters of the other network layers to be determined. The method is therefore generally applicable to various network models, such as classification models and detection models, which effectively improves the universality of model compression.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 shows an implementation environment architecture diagram of a model data processing method provided by an embodiment of the present application;
FIG. 2 is a flow chart illustrating a method for model data processing according to an embodiment of the present application;
FIG. 3 is a schematic diagram of determining compression parameters of a normalization layer according to an embodiment of the present application;
FIG. 4 is a schematic diagram of determining compression parameters of an activation function layer according to an embodiment of the present application;
FIG. 5 is a flow chart of a model data processing method according to another embodiment of the present application;
FIG. 6 is a schematic diagram illustrating a first network layer according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating the principle of resolving conflicts under split logic according to an embodiment of the present application;
FIG. 8 is a flow chart of a model data processing method according to another embodiment of the present application;
FIG. 9 is a schematic diagram of determining compression parameters based on split logic according to an embodiment of the present application;
FIG. 10 is a schematic diagram of determining compression parameters based on summing logic according to one embodiment of the present application;
FIG. 11 is a schematic diagram of determining compression parameters of other network layers according to an embodiment of the present application;
FIG. 12 is a schematic diagram of determining compression parameters of other network layers according to another embodiment of the present application;
FIG. 13 is a schematic diagram of determining compression parameters of other network layers according to another embodiment of the present application;
FIG. 14 is a schematic diagram of determining compression parameters of other network layers according to another embodiment of the present application;
FIG. 15 is a schematic diagram of determining an input tensor according to an embodiment of the present application;
FIG. 16 is a block diagram showing a structure of a model data processing apparatus provided by an embodiment of the present application;
FIG. 17 shows a schematic diagram of a computer system suitable for implementing an embodiment of the present application.
Detailed Description
The application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
The specific implementation environment of the model data processing method provided by the application is shown in fig. 1. Fig. 1 shows an implementation environment architecture diagram of a model data processing method according to an embodiment of the present application.
As shown in fig. 1, the implementation environment architecture includes: a terminal device 101 and a server 102.
The terminal device 101 is configured to provide an interactive interface, so that a user can upload a model file of a model to be processed, and select a compression policy.
The server 102 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
The terminal device 101 and the server 102 are connected directly or indirectly by wired or wireless communication. Optionally, the wireless or wired network described above uses standard communication techniques and/or protocols. The network is typically the Internet, but may be any network, including but not limited to a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile, wired, or wireless network, a private network, or any combination of virtual private networks.
The model data processing method provided by the application can be implemented by a model data processing device, and the model data processing device can be installed on a terminal device or a server.
To further explain the technical solution provided by the embodiments of the present application, details are described below with reference to the accompanying drawings and specific embodiments. Although the embodiments of the present application provide the method operation steps shown in the following embodiments or figures, the method may include more or fewer steps based on routine or non-inventive effort. For steps with no logically necessary causal relationship, the execution order is not limited to the order provided by the embodiments of the present application. In an actual processing procedure, or when executed by an apparatus, the method may be performed sequentially or in parallel according to the order shown in the embodiments or the drawings.
Referring to fig. 2, fig. 2 is a flow chart illustrating a model data processing method according to an embodiment of the application. As shown in fig. 2, the method includes:
201, obtaining model data of a model to be processed, wherein the model data represents at least two network layers included in the model to be processed and the execution logic between the at least two network layers, and the network layers are used for processing data input into the model to be processed.
The model data of the model to be processed can be obtained from configuration items of the compression tool uploaded by a user. The model data may be a data file of the model to be processed in which the at least two network layers and the execution logic between them are recorded. A network layer of a model may be regarded as an algorithm module of the model that operates on the data input to the model to produce the model's processing result. That is, a model may include a plurality of different network layers with execution logic between them; after data is input into the model, it is operated on by the algorithms of the different network layers to produce the processing result. For example, an image processing model may include a feature extraction network and a classification prediction network. The feature extraction network extracts features from the image data to obtain feature data corresponding to the original image, and the classification prediction network scores and classifies the feature data. The execution logic between the two networks may be to group the feature data, i.e., the extracted feature data is input into the classification prediction network as a group.
In the embodiment of the application, the model data records the network layers of the model and the execution logic among them. For example, the model data of the model to be processed includes the at least two network layers of the model and the execution logic between them.
The execution logic includes, but is not limited to, operation logic between two adjacent network layers, such as superposition, splitting, and propagation, and operation logic between modules of the two network layers, such as fusion and splicing. It should also be appreciated that the network layers of the model to be processed process the data input to the model, for example convolving image data or converting text data to vectors, as well as performing other operations on input data or feature data.
In one possible embodiment, a model file of the model to be processed may be obtained using the torch.fx toolkit published by PyTorch. torch.fx is a system for capturing and transforming PyTorch programs. It mainly comprises three components: symbolic tracing (a symbolic tracer), an intermediate representation, and Python code generation.
After the model file of the model to be processed uploaded by the user is obtained, the data information in the model file is captured by the symbolic tracer to obtain the graph data (graph) of the model to be processed. A series of data processing operations is then performed on the graph data through the intermediate representation; for example, the model to be processed is compressed through the model data processing method provided by the application to obtain the model data of the target model. Finally, the model data of the target model is encoded by the Python code generation component to generate a Python code file, i.e., the code file of the target model.
By packaging the data file of the target model with the torch.fx tool, the embodiment of the application further reduces code intrusion: a call can be completed by adding only two extra lines of code. That is, when the model compression method provided by the application is invoked, developers need only two lines of code and do not have to memorize a large amount of code information, which effectively reduces their learning burden.
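As a concrete illustration of this capture step (a minimal sketch, not the patent's reference implementation; the SimpleNet module and its layers are hypothetical), the following code traces a small PyTorch model with torch.fx so that its network layers and the execution logic between them become available as graph data:

import torch
import torch.nn as nn
from torch.fx import symbolic_trace

class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))

# Symbolic tracing: a single extra call captures the whole program.
traced = symbolic_trace(SimpleNet())

# The graph records every network layer and the execution logic between them.
for node in traced.graph.nodes:
    print(node.op, node.target)   # e.g. call_module conv -> bn -> relu

The graph obtained this way is what the intermediate representation stage operates on, and it can later be re-emitted as Python code by the code generation component.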
202, obtaining compression parameters of a first network layer in at least two network layers, and determining compression parameters of each network layer except the first network layer according to the compression parameters of the first network layer, wherein the compression parameters are used for indicating network modules to be deleted in the network layers.
It should be noted that, in the embodiment of the present application, the at least two network layers of the model to be processed are divided into two types: the first network layer and the network layers other than the first network layer. The first network layer is a network layer whose initial compression parameters can be obtained directly through the compression policy; the network layers other than the first network layer must obtain their compression parameters indirectly through the compression parameters of the first network layer, i.e., they are network layers whose initial compression parameters cannot be obtained directly through the compression policy.
It should also be noted that the compression parameter is used to indicate the network module to be deleted in the network layer.
That is, the compression parameter may identify the network modules whose mask is 0 in unstructured pruning, or the network modules deleted in structured pruning. The embodiment of the application can be used both for unstructured pruning and for structured pruning of the model to be processed. A network module to be deleted may be a network element included in a network layer, for example a convolution kernel in a convolution layer or a neuron in a neural network layer; the type of the network module to be deleted is not particularly limited in the present application. In an embodiment of the present application, the compression parameters may be represented by a mask.
In one possible embodiment, the initial compression parameters of the first network layer are calculated based on a compression policy. The method includes the steps of obtaining a compression policy, and performing data processing on data information of a first network layer in a model file based on the compression policy to obtain initial compression parameters of the first network layer.
It should be noted that the compression policy may be set by a user through configuration items provided by the compression tool, and may include a compression target and an operation rule. The compression target is the target result of the final pruning, for example a compression rate of 50%, i.e., 50% of the network modules are pruned; in other words, half of the network modules of the model to be processed are retained, so as to reduce the size of the model. The operation rule specifies by what rule the network modules to be pruned are selected, for example by weight value: the deleted network modules are selected according to their weight values, which are usually sorted in some order (ascending or descending), and the pruned network modules are then determined from the weight value sequence according to the compression-rate percentage.
For example, when the compression target is a compression rate of 50% and the operation rule is the weight value, the compression policy sorts the weight values of the network modules in each network layer, determines the number of network modules in each layer, and selects half of that number: the network modules corresponding to the second half of the values in the sorted weight sequence are taken as the modules to be pruned. This selection constitutes the initial compression parameters of the first network layer obtained from the compression policy.
In one possible embodiment, the compression policy may be obtained through an interface. The model compression tool may provide a compute_mask interface to obtain the compression policy set by the user and calculate the initial compression parameters of each first network layer in the model to be processed based on that policy. For example, the user passes the compression target and the operation rule as parameters of the compute_mask interface, and the compression tool calculates the initial compression parameters of each first network layer based on the compression target and the operation rule obtained through the interface.
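As an illustration of how such a policy could yield initial compression parameters (a hedged sketch assuming an L1 weight-magnitude rule and the mask convention used above, where 0 marks a module to be deleted; the function name initial_mask is hypothetical):

import torch

def initial_mask(conv_weight: torch.Tensor, rate: float = 0.5) -> torch.Tensor:
    # One L1 score per output channel, i.e. per prunable network module.
    scores = conv_weight.abs().sum(dim=(1, 2, 3))
    n_prune = int(scores.numel() * rate)
    # The channels with the smallest weight values are selected for deletion.
    idx = scores.argsort()[:n_prune]
    mask = torch.ones_like(scores)
    mask[idx] = 0
    return mask

w = torch.randn(6, 3, 3, 3)      # weights of a 6-channel convolution layer
print(initial_mask(w))           # e.g. tensor([0., 1., 0., 1., 1., 0.])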
In a possible embodiment, the compression parameter of the first network layer may be an initial compression parameter of the first network layer, or may be a compression parameter that is finally determined by modification or the like based on the initial compression parameter of the first network layer and may be used for final clipping.
It should be understood that, because execution logic exists between the at least two network layers of the model to be processed and part of that execution logic requires specific constraints to be satisfied between two adjacent network layers, when the initial compression parameters of the first network layer cannot satisfy the constraints of the execution logic, they need to be corrected so that the compression parameters of the first network layer satisfy both the pruning requirement and the constraints of the execution logic.
In one possible embodiment, determining the compression parameters of each network layer except the first network layer based on the compression parameters of the first network layer includes: taking the compression parameter of the first network layer as the compression parameter of the next network layer of the first network layer.
That is, after the compression parameter of the first network layer is obtained, the compression parameter of the first network layer can be directly used as the compression parameter of the next network layer of the first network layer, without performing intermediate operation according to the compression parameter of the first network layer.
It should be understood that, because the network layers other than the first network layer are not all of the same type, the manner of taking the compression parameters of the first network layer as the compression parameters of the next network layer also differs.
Illustratively, when the next network layer is a normalization layer, taking the compression parameters of the first network layer as the compression parameters of the next network layer includes: forward-propagating the compression parameters of the first network layer to obtain compression parameters matching the structure of the normalization layer.
It should be noted that a normalization layer generally scales the values of a feature column in array data to between 0 and 1, so its input is a 1×n array. Based on this characteristic, the compression parameters of the normalization layer should likewise be converted into the n-dimensional form of its input; that is, the compression parameters of the first network layer are treated as compression parameters over the multiple dimensions of the normalization layer.
Further, in deep learning, forward propagation takes the output of the upper layer as the input of the lower layer. When compression parameters are computed, the output of the upper layer is the compression parameter of the first network layer and the input is the compression parameter of the lower network layer, so forward-propagating the compression parameters of the first network layer means taking them as the compression parameters of the normalization layer.
As shown in fig. 3, the first network layer includes a plurality of network modules for processing multidimensional data, where a solid box is a network module not marked by the compression parameters, i.e., a module remaining after compression, and a hollow box is a network module to be deleted marked by the compression parameters. The normalization layer (bn) has only one channel, but its data may be multidimensional. The first network layer corresponding to the normalization layer is the convolution layer (conv2d) above it, whose compression parameters are [1,0,1,0], so the compressed network structure obtained when the convolution layer processes the input data is [1,0,1,0]. The compressed network structure output by the convolution layer is taken as the input of the normalization layer, i.e., forward-propagated into the normalization layer; the compression parameters of the normalization layer are then the compressed structure indicated by the dotted line in the normalization layer, which is identical to the compression parameters of the first network layer. In other words, corresponding compression parameters are configured for the corresponding multidimensional positions within the single channel of the normalization layer.
The channel-by-channel convolution layer (depthwise convolution) has the same structural characteristic as the normalization layer, so its compression parameters can likewise be obtained by forward-propagating the compression parameters of the first network layer.
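A minimal sketch of this forward propagation, assuming per-channel masks with 0 marking a deleted module: the mask of the first network layer (a convolution) is reused unchanged for the normalization layer and a depthwise convolution, and no intermediate activations are computed or stored at any point.

import torch

conv_mask = torch.tensor([1., 0., 1., 0.])   # compression parameters of conv2d

bn_mask = conv_mask.clone()   # bn scales each channel independently
dw_mask = conv_mask.clone()   # depthwise conv has one kernel per channel
# Only the mask propagates; the layers themselves are never executed.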
In addition, when the next network layer is an activation layer, taking the compression parameters of the first network layer as the compression parameters of the next network layer includes: inserting a correction node before the activation layer, which corrects the data input to the activation layer so that it corresponds to the compression parameters of the first network layer.
It should be noted that an activation function is a nonlinear function, typically placed at the connection between two neurons to enhance the expressive power of the neural network. Illustratively, as shown in fig. 4, an activation layer using the ReLU function is placed between two convolution layers. The ReLU function outputs values proportional to inputs greater than 0 and sets channels whose input is less than 0 to 0. As shown in fig. 4 (a), the number of channels surviving the activation layer (ReLU) is therefore reduced by two compared with the normalization layer (bn), simply because the input values changed. To avoid the output data structure of the activation layer changing with the input values, the application inserts a correction node after the normalization layer, as shown in fig. 4 (b), to ensure that the input data of the channels retained by the compression parameters of the first network layer is greater than or equal to 0. This guarantees that the compression parameters of the activation layer are consistent with those of the first network layer; that is, the compression parameters of the first network layer are taken as the compression parameters of the activation layer.
Optionally, a correction node may also be arranged at each layer after the first network layer corresponding to the activation layer, for example a corresponding correction node is also arranged immediately after the first network layer, so that successive corrections ensure the compression parameters of the activation layer remain consistent with those of the first network layer.
The ReLU function here also covers its variants, such as the ReLU6 function.
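The following sketch illustrates the principle of such a correction node, assuming the simple strategy of forcing retained channels non-negative; the patent does not fix a specific correction operation, and correction_node is a hypothetical name:

import torch

def correction_node(x: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # x: (N, C, H, W) data entering the activation layer;
    # mask: per-channel compression parameters (0 = channel to delete).
    keep = mask.view(1, -1, 1, 1).bool()
    # Retained channels are forced >= 0, so ReLU will not silence them
    # and the activation layer's mask stays equal to the first layer's.
    return torch.where(keep, x.abs(), x)

relu = torch.nn.ReLU()
x = torch.randn(1, 4, 2, 2)
mask = torch.tensor([1., 0., 1., 0.])
y = relu(correction_node(x, mask))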
203, compressing the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
In the embodiment of the application, compressing a network layer deletes the network modules identified by its compression parameters.
That is, after the model file of the model to be processed is obtained, the at least two network layers of the model are divided, based on the model file, into two types: the first network layer and the network layers other than the first network layer. The compression parameters of the first network layer are then obtained: when the initial compression parameters of the first network layer require no correction, they are used directly as the compression parameters of the first network layer; when they do require correction, the corrected parameters are used. After the compression parameters of the first network layer are obtained, the compression parameters of every network layer other than the first network layer are determined from them, yielding the compression parameters of each network layer of the model to be processed. For each network layer, the corresponding network modules are then deleted according to its compression parameters, realizing the compression processing and producing the target model into which the model to be processed is compressed.
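As a hedged sketch of this final step, the code below structurally prunes a Conv2d/BatchNorm2d pair according to a per-channel mask (0 = delete); prune_conv_bn is an illustrative helper, not the patent's implementation:

import torch
import torch.nn as nn

def prune_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, mask: torch.Tensor):
    keep = mask.bool()
    kept = int(keep.sum())
    # Rebuild the convolution with only the retained output channels.
    new_conv = nn.Conv2d(conv.in_channels, kept, conv.kernel_size,
                         stride=conv.stride, padding=conv.padding,
                         bias=conv.bias is not None)
    new_conv.weight.data = conv.weight.data[keep]
    if conv.bias is not None:
        new_conv.bias.data = conv.bias.data[keep]
    # The following normalization layer shrinks by the same mask.
    new_bn = nn.BatchNorm2d(kept)
    new_bn.weight.data = bn.weight.data[keep]
    new_bn.bias.data = bn.bias.data[keep]
    new_bn.running_mean.data = bn.running_mean.data[keep]
    new_bn.running_var.data = bn.running_var.data[keep]
    return new_conv, new_bn

conv, bn = nn.Conv2d(3, 4, 3), nn.BatchNorm2d(4)
conv2, bn2 = prune_conv_bn(conv, bn, torch.tensor([1., 0., 1., 0.]))
print(conv2.out_channels)   # 2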
Therefore, in the model data processing method provided by the embodiments of the application, when a model is compressed, the model file of the model to be processed is first obtained; compression parameters of a first network layer among at least two network layers of the model are then obtained, and the compression parameters of every network layer other than the first network layer are determined from the compression parameters of the first network layer, yielding compression parameters for each network layer of the model; finally, the network layers are compressed based on the compression parameters of each network layer. When the compression parameters of the network layers other than the first network layer are determined, they can be derived directly from the compression parameters of the first network layer; there is no need to obtain the operation result of the first network layer and determine compression parameters from it, which greatly reduces the memory consumption of the model compression process. A large number of intermediate operations are likewise unnecessary, which effectively saves computation time and improves the overall efficiency of model compression.
In addition, because the compression parameters of every network layer other than the first network layer are determined from the compression parameters of the first network layer, only the compression parameters of a specific model structure need to be provided for the compression parameters of the other network layers to be determined. The method is therefore generally applicable to various network models, such as classification models and detection models, which effectively improves the universality of model compression.
In another possible embodiment, a policy to modify the initial compression parameters of the first network layer is also provided.
Illustratively, as shown in fig. 5, obtaining the compression parameters of a first network layer of the at least two network layers includes the following steps:
501, obtaining the initial compression parameters of the first network layer, and determining the compressed network structure of the first network layer based on the initial compression parameters.
It should be noted that, because the compression parameters indicate the network modules to be deleted in a network layer, once the initial compression parameters of the first network layer are obtained, the corresponding network modules to be deleted can be determined from them. The network structure formed by the network modules of the first network layer that are not marked for deletion by the initial compression parameters is the compressed network structure obtained by the first network layer based on the initial compression parameters.
As shown in fig. 6, the structure corresponding to the ellipse is the first network layer. The solid boxes inside the ellipse are network modules not marked by the compression parameters, and the network structure they form is the compressed network structure obtained from the initial compression parameters of the first network layer; the hollow boxes are the network modules to be deleted marked by the compression parameters.
502, determining whether a conflict exists between the compressed network structure and other network layers of the model to be processed, the conflict including a logical conflict between the compressed network structure and the other network layers, or a conflict between the compressed network structure and preset logic.
It should be noted that the execution logic between the at least two network layers of a model is generally designed for the uncompressed structure. In the compressed network structure determined by the initial compression parameters, part of the network modules are missing, which can make the execution logic between the at least two network layers impossible to continue executing, i.e., a logic conflict.
503, if there is a conflict between the compressed network structure and other network layers of the to-be-processed model, correcting the initial compression parameters of the first network layer until there is no conflict between the compressed network structure and other network layers of the to-be-processed model, and obtaining the compression parameters of the first network layer.
That is, when a conflict exists between the compressed network structure corresponding to the first network layer and other network layers of the model to be processed, compressing according to the initial compression parameters of the first network layer would affect the overall operation of the model and prevent it from achieving the expected analysis effect. The initial compression parameters of the first network layer therefore need to be corrected so that the compressed network structure can perform valid logic operations with the other network layers.
In a possible embodiment, after correcting the initial compression parameters of the first network layer to obtain the compression parameters of the first network layer, the method further includes: determining the compression parameters of a target network layer according to the compression parameters of the first network layer, wherein the target network layer comprises the compressed network structure obtained by the first network layer based on the compression parameters, or the network structure obtained after the compressed network structure executes preset logic.
That is, after the compression parameters of the first network layer are obtained, the compression parameters of the target network layer are further determined from them. It should be understood that the target network layer is the compressed network structure of the first network layer, or the network structure obtained after the compressed network structure executes preset logic; in other words, the target network layer is produced by the first network layer through an operation and may be regarded as a sub-network layer of the first network layer. In short, the target network layer is not one of the "network layers other than the first network layer" referred to above.
It should be understood that, in the embodiment of the present application, the compression parameters of the target network layer are determined to ensure that the compression parameters of each network layer of the to-be-processed model are obtained, so as to ensure that the to-be-processed model can be compressed to obtain the target model.
In one possible embodiment, the preset logic includes split logic when the conflict may be a conflict between the compressed network structure and the preset logic.
It should be noted that the execution logic between the at least two network layers of the model to be processed includes logic by which two network layers operate on each other, and logic that a preceding network layer executes for further operations. In the embodiment of the present application, the preset logic executed by the compressed network structure is execution logic of the first network layer on its own network structure, i.e., splitting logic such as cutting (split) and blocking (chunk). Both cut the network layer apart: the parameter of chunk is usually the number of parts the layer is divided into, and the constant of split specifies the size of each part.
That is, the model file of the model to be processed may include splitting logic that divides the network structure of the first network layer into equal parts, expecting the plurality of network modules of the first network layer to be divided evenly into a plurality of network structures. However, after the first network layer determines its compressed network structure based on the initial compression parameters, some network modules are missing; the compressed network structure may then contain an odd number of modules, and when the splitting logic is executed the remaining modules can no longer be divided equally, causing a conflict.
For example, as shown in fig. 7, the execution logic corresponding to the first network layer in the model file is splitting logic that divides the layer into 2 equal blocks, i.e., chunk=2. When the first network layer is uncompressed, executing the splitting logic should split the network modules at positions 0, 1, and 2 into a first network structure and the modules at positions 3, 4, and 5 into a second network structure. However, the initial compression parameters of the first network layer are [0,1,0,1,0,1]; that is, the modules at positions 0, 2, and 4 are to be deleted. The modules at positions 0 and 2 of the first network structure under the original splitting logic are thus to be deleted, leaving only 1 module in the first network structure, while the module at position 1 of the second network structure is to be deleted, leaving 2 modules in the second network structure. It can be seen that, under the influence of the initial compression parameters, the first network layer cannot perform the even division required by the splitting logic.
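The conflict can be reproduced numerically. In the illustrative sketch below (not part of the patent), applying chunk=2 to the mask of fig. 7 leaves the two branches with unequal numbers of surviving modules:

import torch

mask = torch.tensor([0., 1., 0., 1., 0., 1.])   # 0 = module to delete
first_half, second_half = mask.chunk(2)         # masks of the two branches
print(first_half, int(first_half.sum()))        # tensor([0., 1., 0.]) -> 1 kept
print(second_half, int(second_half.sum()))      # tensor([1., 0., 1.]) -> 2 kept
# 1 != 2: the compressed structure cannot be split evenly, a logic conflict.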
In one possible embodiment, as shown in fig. 8, the correcting the initial compression parameters of the first network layer includes:
801, determining, based on the splitting logic and the compressed network structure, the splitting result obtained after the compressed network structure executes the splitting logic, wherein the splitting result represents the at least two network structures into which the compressed network structure is split and the initial compression parameters of each network structure.
That is, to correct the initial compression parameters of the first network layer, it is necessary to know the splitting result produced when the compressed network structure determined from the initial compression parameters executes the splitting logic. The initial compression parameters of each network structure in the splitting result are the network modules to be deleted that the structure contains under the initial compression parameters of the first network layer, i.e., the mapping of the first network layer's initial compression parameters onto each network structure of the splitting result.
For example, as shown in fig. 7, the initial compression parameters corresponding to the first network layer are [0,1,0,1,0,1]; the first network layer executes the splitting logic chunk=2 based on the compressed network structure determined by the initial compression parameters, and the obtained splitting result is a first network structure and a second network structure, where the initial compression parameters corresponding to the first network structure are [0,1,0] and those corresponding to the second network structure are [1,0,1].
802, determining the minimum value among the initial compression parameters corresponding to the at least two network structures, wherein the minimum value represents the minimum number of network modules to be deleted across the at least two network structures.
It should be noted that, in order to enable the compressed network structures corresponding to the first network layer to be equally divided, the number of network modules to be deleted in each network structure in the splitting result needs to be equal, that is, in the case that the number of network modules to be deleted in each network structure is equal, the number of remaining network modules in each network structure is also equal, so that the purpose of equally dividing the remaining network modules (remaining network modules in the compressed network structure) can be satisfied.
It should further be noted that, since the initial compression parameters of the first network layer are determined according to the compression policy set by the user, they already include all the network modules to be deleted. In other words, increasing the number of deletions would affect model accuracy and would also require recomputing the initial compression parameters, i.e., evaluating the deletability of modules that were not originally deleted, which increases the difficulty and amount of calculation. The application therefore proposes determining the minimum value among the initial compression parameters of the at least two network structures: by pruning in a way that reduces the number of modules to be deleted, each network structure in the splitting result is guaranteed to be pruned equally and the accuracy of the compressed model is preserved, without adding extra computation to the compression-parameter calculation.
803, correcting the initial compression parameters of the first network layer based on the minimum value to obtain the compression parameters of the first network layer.
In one possible embodiment, the initial compression parameters corresponding to the at least two network structures are corrected based on the minimum value to obtain compression parameters of the at least two network structures, and the compression parameters of the at least two network structures are spliced to obtain the compression parameters of the first network layer.
That is, after the minimum number of modules to be deleted among the at least two network structures is determined, the initial compression parameters of the at least two network structures can be corrected: for the modules to be deleted in each network structure, the minimum number of deletions and the corresponding modules are determined according to a preset rule, yielding the compression parameters of the at least two network structures. The preset rule for determining the modules to be deleted may be: select the modules in order of position from among the modules marked for deletion in the initial compression parameters; or select them randomly from those modules; or calculate, according to the compression policy, which modules of the structure should make up the minimum deletions.
Further, since the at least two network structures are obtained by equally dividing the first network layer, the first network layer can be recovered by splicing the at least two network structures together; correspondingly, the compression parameters of the first network layer can be obtained by splicing the compression parameters of the at least two network structures.
For example, as shown in fig. 9, after the initial compression parameters of the first network structure are determined to be [0,1,0] and those of the second network structure to be [1,0,1], the number of modules to be deleted is 2 in the first network structure and 1 in the second, so the minimum value among the initial compression parameters of the at least two network structures is 1; that is, each network structure deletes at most 1 module. The modules kept for deletion may be chosen in position order: of the 2 modules to be deleted in the first network structure, the earlier one is retained as the module to be deleted, giving the compression parameters [0,1,1] for the first network structure. The second network structure has only one module to be deleted, so its initial compression parameters are already its compression parameters, [1,0,1]. Splicing the compression parameters of the first and second network structures yields the compression parameters of the first network layer, [0,1,1,1,0,1].
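A sketch of this correction under the same assumptions as above (mask value 0 marks a deletion; correct_for_chunk is an illustrative name): each branch keeps only the minimum number of deletions, surplus deletions are undone in position order, and the corrected branch masks are concatenated back into the layer's mask.

import torch

def correct_for_chunk(mask: torch.Tensor, chunks: int) -> torch.Tensor:
    parts = [p.clone() for p in mask.chunk(chunks)]
    n_del = [int((p == 0).sum()) for p in parts]
    m = min(n_del)                               # minimum deletions per branch
    for p in parts:
        surplus = (p == 0).nonzero().flatten()[m:]
        p[surplus] = 1                           # restore surplus deletions
    return torch.cat(parts)

mask = torch.tensor([0., 1., 0., 1., 0., 1.])
print(correct_for_chunk(mask, 2))    # tensor([0., 1., 1., 1., 0., 1.])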
In one possible embodiment, when the conflict is a logical conflict between the compressed network structure and another network layer, and addition logic exists between the compressed network structure and the other network layer, correcting the initial compression parameters of the first network layer includes: obtaining the initial compression parameters of the other network layer, taking the intersection of the initial compression parameters of the first network layer and those of the other network layer, and using the intersection as the compression parameters of both the first network layer and the other network layer.
It should be noted that, since the compressed network structure is determined by the first network layer based on the initial compression parameters, it depends on the compression parameters of the first network layer. When an addition-logic conflict exists between the compressed network structure and another network layer, there is in fact a conflict between the initial compression parameters of the first network layer and those of the other network layer, so both sets of initial compression parameters need to be corrected.
The addition logic performs addition-related operations on the network modules at corresponding positions of two network layers. Therefore, when addition logic exists between the compressed network structure and another network layer, the module positions of the compressed network structure of the first network layer must match those of the compressed network structure of the other network layer.
Illustratively, as shown in fig. 10, the initial compression parameters of the first network layer are [0,1,0,1], and those of the other network layer with which the addition logic is performed are [0,1,1,0]. The modules retained by the first network layer are therefore at positions 1 and 3, while those retained by the other network layer are at positions 1 and 2. When the two layers perform the addition on their respective compressed network structures, position 3 of the first network layer and position 2 of the other network layer cannot be added because the positions differ.
At this point, the initial compression parameters of the first network layer and those of the other network layer are obtained and intersected. It should be appreciated that, since the compression parameters indicate the network modules to be deleted, the intersection of the initial compression parameters of the first network layer with those of the other network layer consists of the modules that both layers need to delete. Taking fig. 10 as an example, the intersection is [0,1,1,1]; that is, only the module at position 0 needs to be deleted by both layers. To ensure that the compressed network structures of the two layers can perform the addition logic, the modules not deleted by the other layer are retained in the first network layer, and the modules not deleted by the first network layer are retained in the other layer, yielding the compression parameters [0,1,1,1].
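Numerically, under the 0-deletes convention this intersection of deletion sets is just the element-wise maximum of the two masks (a module stays deleted only if both masks delete it), as the following illustrative sketch shows:

import torch

first = torch.tensor([0., 1., 0., 1.])    # first network layer: keeps 1 and 3
other = torch.tensor([0., 1., 1., 0.])    # other network layer: keeps 1 and 2
shared = torch.maximum(first, other)      # kept if kept in either layer
print(shared)                             # tensor([0., 1., 1., 1.])
# Both layers now use [0,1,1,1], so the addition logic stays well-defined.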
In a possible embodiment, after determining the compression parameters of the first network layer and the compression parameters of the target network layer, the method further includes: correcting, according to the compression parameters of the target network layer, the compression parameters of other network layers that have execution logic with the target network layer.
Taking fig. 10 as an example, the compression parameters of the first network layer (i.e., the compression parameters of the target network layer) and the compression parameters of the other network layer with which the addition logic is performed are determined simultaneously. At this time, the compression parameters of the other network layer are also corrected, with the compression parameters of the first network layer taken as the compression parameters of the other network layer.
In a possible embodiment, another network layer may perform a logic operation with a target network layer obtained after the compressed network structure executes preset logic. Because the compression parameters of the target network layer are tied to the corresponding compression parameters of the first network layer and to the execution logic of the first network layer, correcting both together based on the execution logic between the target network layer and the other network layer would affect the compression parameters of the first network layer again. Moreover, if the target network layer is one splitting result of the first network layer, the compression-parameter calculation logic of the other splitting results would also be affected; the scope of influence and the resulting amount of computation would both be larger. The present application therefore corrects the compression parameters of the other network layer directly according to the compression parameters of the target network layer.
Illustratively, as shown in fig. 11, the target network layer is a splitting result of the first network layer, where the compression parameters of the first network layer are [0,1,0,1,0,1,1,0] and the compression parameters of the target network layer are [0,1,0,1]. The initial compression parameters of the other network layer performing the addition logic with the target network layer are [0,1,1,0]. Clearly, the target network layer's [0,1,0,1] conflicts with the other network layer's [0,1,1,0], so the addition logic cannot be performed. However, the target network layer and the first network layer have already completed their own compression-parameter correction; correcting again based on the compression parameters of the target network layer and the initial compression parameters of the other network layer would affect the compression parameters of the first network layer and of the other splitting results. Therefore, the initial compression parameters of the other network layer are corrected directly to the compression parameters of the target network layer, i.e., to [0,1,0,1]; the other network layer can then perform the logic operation with the target network layer normally, without additional compression-parameter computation.
In one possible embodiment, when determining the compression parameters of each network layer of the model to be processed, the first network layers may be handled in order from complex to simple: among multiple first network layers, the initial compression parameters of the first network layer whose own execution logic is more complex are corrected first, and the other first network layers connected to it, including but not limited to the first network layer preceding it, are then corrected according to the compression parameters already determined.
As shown in fig. 12, the model to be processed includes 4 network layers, where the first and last layers are first network layers and the second and third layers are network layers other than the first network layer. Both first network layers are grouped-convolution (group) layers: the first of them convolves with 1 group (group=1) and the second with 2 groups (group=2). Clearly, a convolution with 1 group is no different from an ungrouped operation, while the grouping of the second first network layer is more complex, so the initial compression parameters of the second first network layer are corrected first, after which the initial compression parameters of the first first network layer are corrected according to the compression parameters of the second first network layer.
In one possible embodiment, as shown in fig. 13, the initial compression parameters of the first first network layer are [0,1,0,1,0,1], and the compression parameters of the second first network layer after correction are [0,1,1,1,0,1]. Here, so that the output of the first first network layer ultimately allows the second first network layer to realize its current compression parameters, the corrected compression parameters of the second first network layer may be used directly as the compression parameters of the first first network layer, i.e., [0,1,1,1,0,1].
In one possible embodiment, as shown in fig. 14, the initial compression parameters of the first first network layer are [0,1,0,1,0,1], and the compression parameters of the second first network layer after correction are [0,0,0,1,1,1]. Here the initial compression parameters of the first first network layer are kept as its compression parameters, because the corrected compression parameters of the second first network layer are in essence obtained by merging the 2 groups of its grouped convolution into 1 group; keeping the initial compression parameters of the first first network layer unchanged does not affect the grouping effect of the second first network layer. Meanwhile, to avoid errors caused by one group having no network modules left after compression, the grouping parameter of the second first network layer may be modified to 1, i.e., group=2 is changed to group=1.
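As a brief sketch of the grouping correction just described (the helper `corrected_group_count` is a hypothetical illustration, again assuming masks whose 1-entries mark retained modules), the surviving group count can be recomputed from the corrected mask, and group=2 collapses to group=1 when one group is emptied:

```python
def corrected_group_count(keep_mask, groups):
    # Split the mask into `groups` equal segments; a group survives only
    # if it still contains at least one retained module after compression.
    size = len(keep_mask) // groups
    surviving = [
        any(keep_mask[g * size:(g + 1) * size]) for g in range(groups)
    ]
    return sum(surviving)

# Mask [0,0,0,1,1,1] empties the first of 2 groups, so the layer is
# rebuilt with a single group (group=2 -> group=1).
print(corrected_group_count([0, 0, 0, 1, 1, 1], 2))  # -> 1
```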
In another possible embodiment, for a model it is necessary to determine not only the network structure but also the input tensor, i.e., the structure (shape) information of the input multidimensional array. Because the symbol capturing module of torch.fx can capture the structural characteristics of the model to be processed without an input tensor, the user does not need to configure the characteristic information of the input tensor when the model file of the model to be processed is obtained. Therefore, after the model to be processed is compressed into the target model, the input tensor of the target model must still be confirmed in order to obtain the complete attribute information of the target model.
Illustratively, each network layer is compressed based on its compression parameters to obtain a plurality of compressed network layers; inverse gradient transfer processing is performed on the plurality of compressed network layers to obtain the input tensor of the model to be processed; and the target model is determined according to the input tensor of the model to be processed and the plurality of compressed network layers.
It should be noted that inverse gradient transfer is a process of inversely determining an input based on the output of a model. In the embodiment of the present application, inverse gradient transfer is performed on the plurality of compressed network layers to determine the input tensor of the compressed target model. Specifically, inverse gradient transfer recursively computes the gradients of an expression using the chain rule.
The PyTorch used in the embodiment of the present application has an automatic differentiation mechanism, so inverse gradient transfer does not need to be implemented with additional code or require intermediate values to be stored separately. In other words, the embodiment of the present application does not require the user to set the input tensor of the model to be processed (or of the target model) in advance; the input tensor can be obtained directly by computation without increasing memory consumption.
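The following minimal PyTorch sketch (the shapes are chosen arbitrarily for illustration) shows the automatic differentiation mechanism at work: gradients are propagated back through a layer by the chain rule without any user-written backward code, and the gradient arriving at the input mirrors the input tensor's structure.

```python
import torch

x = torch.randn(1, 2, 8, 8, requires_grad=True)  # dummy input
conv = torch.nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3)
conv(x).sum().backward()  # chain-rule gradients, no extra code needed
print(x.grad.shape)       # torch.Size([1, 2, 8, 8]) -- same structure as x
```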
That is, once the compression parameters of the first network layer are obtained and the compression parameters of every other network layer are derived from them, the compression parameters of each network layer of the model to be processed are determined. Then, for each network layer, the network modules indicated by the compression parameters are deleted to obtain the corresponding compressed network layer, i.e., the network layer obtained after compression. Finally, based on the execution logic among the plurality of compressed network layers, inverse gradient transfer processing is performed using the tool's automatic differentiation mechanism to obtain the input tensor of the model to be processed, which is taken as the input tensor of the target model, yielding the target model.
As illustrated in fig. 15, the model to be processed includes two convolution layers and one normalization layer; the two convolution layers are first network layers and the normalization layer is a network layer other than a first network layer. The left arrow marks the forward-propagation process by which the normalization layer obtains its compression parameters from the first convolution layer (a first network layer). When the compressed result of the normalization layer is fed to the second convolution layer as single-channel input, the compression parameters of the second convolution layer correspond to whole convolution modules rather than to the multidimensional data inside the modules; the multidimensional form of the data output to the second, as-yet-uncompressed convolution layer must therefore satisfy the compression parameters of the normalization layer. That is, the normalization layer further transfers the compression parameters of the multidimensional data to the second convolution layer through forward propagation. The network structure pointed to by the dotted arrow is the final form of the compressed model to be processed.
The right arrow of the model to be processed marks the process of inverse gradient transfer: starting from the output of the second convolution layer, inverse gradient transfer is performed to obtain the multidimensional structure of the first convolution layer. It should be appreciated that, for the input tensor to be fed to the model effectively, its multidimensional structure must be consistent with the first convolution layer; that is, the multidimensional structure of the first convolution layer is taken as the input tensor of the model to be processed.
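As a simplified stand-in for the inverse transfer just described (the function below is an illustrative assumption; the process described above uses gradients rather than layer metadata), one can walk the compressed layers from output back to input and read off the channel count the input tensor must have:

```python
import torch.nn as nn

def infer_input_channels(compressed_layers):
    # Walk backward from the output; the earliest convolution reached
    # dictates the channel count of the model's input tensor.
    channels = None
    for layer in reversed(compressed_layers):
        if isinstance(layer, nn.Conv2d):
            channels = layer.in_channels
    return channels

layers = [nn.Conv2d(2, 4, 3), nn.BatchNorm2d(4), nn.Conv2d(4, 8, 3)]
print(infer_input_channels(layers))  # -> 2
```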
In a possible embodiment, part of the model to be processed may further include operation layers that increase the complexity of the network, for example a layer that shuffles the order of the network modules within a network layer (channel shuffle). Such an operation can be deleted before the compression parameters of the network to be compressed are calculated and re-inserted at the corresponding position after the target model is obtained; the logical requirements of the target model are thus met without affecting the calculation of the compression parameters.
It should be noted that although the operations of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order or that all of the illustrated operations be performed in order to achieve desirable results.
In another embodiment, the model to be processed is an image processing model, such as an image recognition model, an image classification model, or an image matting model, and an application program can load such a model to process images. For example, in a video application that offers background replacement, if the actual background of a participant is captured and uploaded to a server, and the server crops the image and replaces the background, the environment the user does not wish to reveal is easily exposed on the network. It is therefore necessary to deploy an image processing model for person recognition and extraction in the application (or client) itself. In some situations, however, the mobile terminal running the application has limited computing capability and cannot allow the model to occupy too much memory or computing power. To reduce the terminal's computational load while still performing image processing with the model, data processing can be applied to the model data (also referred to as the model file) of the image processing model so that, on the premise of preserving the model's image processing accuracy, the computational pressure its operation places on the terminal is reduced.
Illustratively, the model data can be compressed to reduce the model's occupation of memory or computing power, thereby reducing the computational pressure on the terminal. In one possible implementation, the acquired model data of the image processing model characterizes at least two network layers included in the image processing model and the execution logic between them; the network layers perform feature extraction and other processing on images input to the image processing model. Then, the compression parameters of a first network layer among the at least two network layers are acquired, and the compression parameters of every other network layer are determined from the compression parameters of the first network layer. The compression parameters are used to compress the model data of a network layer, where compression can be understood as reducing the memory footprint of the model data or the computing power consumed when the model runs. For example, some operations in a network layer may be deleted; optionally, the compression parameters may indicate the network modules to be deleted in the network layer, such as convolution channels.
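For instance, deleting the convolution channels indicated by a compression parameter can be sketched as follows (a minimal, assumed implementation: `prune_conv_out_channels` is a hypothetical helper, and mask entries equal to 1 mark retained channels, following the figures above):

```python
import torch
import torch.nn as nn

def prune_conv_out_channels(conv: nn.Conv2d, keep_mask):
    # Build a new convolution that keeps only the output channels whose
    # mask entry is 1; the weights of deleted channels are dropped.
    idx = torch.tensor([i for i, k in enumerate(keep_mask) if k])
    pruned = nn.Conv2d(conv.in_channels, len(idx), conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[idx])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[idx])
    return pruned

conv = nn.Conv2d(3, 4, kernel_size=3)
print(prune_conv_out_channels(conv, [0, 1, 0, 1]).out_channels)  # -> 2
```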
Furthermore, the model data can be compressed based on the compression parameters of each network layer to obtain the compressed result of the image processing model, i.e., the compressed target model. The target model has the same function as the image processing model and similar image processing accuracy, but its model data is far smaller, so the client can deploy and invoke the target model on the terminal device for image processing and obtain the target image. In other words, the target model produced by processing the model data of the image processing model can be deployed on the terminal device; when the user's background needs to be replaced, the application invokes the target model to recognize and extract the person from the image captured by the terminal device. This satisfies the requirement that local image data not be uploaded, while providing image processing results with accuracy close to that of the original image processing model, without occupying excessive memory or computing power or causing problems such as stuttering on the mobile terminal.
In one possible embodiment, the image processing model generally includes at least one convolution layer, a normalization layer, and an activation layer, where the convolution layers may be consecutive or separated by normalization and activation layers. The convolution layers are first network layers whose initial compression parameters can be calculated based on a compression strategy, while the normalization and activation layers are network layers whose initial compression parameters cannot be calculated based on the compression strategy. This is because the normalization layer is typically a network layer that scales feature values along a certain direction of the convolution layer's output, and the activation layer is a nonlinear function placed after the convolution layer to enhance the expressive power of the neural network. Since both the normalization layer and the activation layer follow a convolution layer, the embodiment of the present application takes the compression parameters of the first network layer as the compression parameters of the network layer that follows it, i.e., the compression parameters of the convolution layer serve as the compression parameters of the normalization layer and the activation layer. During compression, the model data of the convolution, normalization, and activation layers in the image processing model can then all be processed: part of the channel data in the convolution and normalization layers is deleted, and the form of the activation layer is kept consistent with the compressed convolution layer. This reduces the computational pressure the image processing model places on the terminal while ensuring the reliability of the compressed image processing model.
In one possible embodiment, when the next network layer is the normalization layer, taking the compression parameters of the first network layer as the compression parameters of the next network layer includes: performing forward propagation processing on the compression parameters of the first network layer to obtain compression parameters corresponding to the structure of the normalization layer.
As shown in fig. 3, the first network layer includes a plurality of network modules for processing multidimensional data, where a solid box is a network module not covered by the compression parameters, i.e., a module retained after compression, and a hollow box is a module to be deleted according to the compression parameters. Because the normalization layer (bn) has only one channel while the corresponding data may be multidimensional, the first network layer corresponding to the normalization layer is the convolution layer (conv2d) above it, whose compression parameters are [1,0,1,0]. The compressed network structure that the convolution layer produces from its input (input) is thus [1,0,1,0]; this output serves as the input of the normalization layer, i.e., it is forward-propagated to the normalization layer. The compression parameters corresponding to the normalization layer are then the compressed structure indicated by the dotted line inside it, identical in form to the compression parameters of the first network layer; in other words, corresponding compression parameters are configured for the corresponding multidimensional positions within the single channel of the normalization layer.
A channel-by-channel (depthwise) convolution layer has the same structural characteristics as the normalization layer, so its compression parameters can likewise be obtained by forward-propagating the compression parameters of the first network layer.
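A minimal sketch of this forward propagation of the mask into a normalization layer (assuming the same 1 = keep convention; the helper name `propagate_mask_to_bn` is illustrative):

```python
import torch
import torch.nn as nn

def propagate_mask_to_bn(bn: nn.BatchNorm2d, keep_mask):
    # Slice the per-channel affine parameters and running statistics so
    # the normalization layer matches the pruned convolution output.
    idx = torch.tensor([i for i, k in enumerate(keep_mask) if k])
    pruned = nn.BatchNorm2d(len(idx))
    with torch.no_grad():
        pruned.weight.copy_(bn.weight[idx])
        pruned.bias.copy_(bn.bias[idx])
        pruned.running_mean.copy_(bn.running_mean[idx])
        pruned.running_var.copy_(bn.running_var[idx])
    return pruned

bn = nn.BatchNorm2d(4)
print(propagate_mask_to_bn(bn, [1, 0, 1, 0]).num_features)  # -> 2
```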
In one possible embodiment, when the next network layer is an activation layer, taking the compression parameters of the first network layer as the compression parameters of the next network layer includes: inserting a correction node before the activation layer and correcting the data input to the activation layer to correspond to the compression parameters of the first network layer.
Illustratively, as shown in fig. 4, an activation layer is disposed between two convolution layers, and the activation function used is the ReLU function. The ReLU function outputs the input unchanged when it is greater than 0 and outputs 0 when the input is less than 0. As a result, the data structure output by the activation layer can change with the input values: as shown in fig. 4(a), the number of channels after the activation layer (ReLU) is reduced by two compared with the normalization layer (bn). To avoid this phenomenon, the present application inserts a correction node after the normalization layer, as shown in fig. 4(b), ensuring that the input data of the channels corresponding to the compression parameters of the first network layer is greater than or equal to 0. The compression parameters corresponding to the activation layer are thereby kept consistent with those of the first network layer; that is, the compression parameters of the first network layer are taken as the compression parameters of the activation layer.
Optionally, a correction node may also be disposed at each layer between the first network layer and the activation layer corresponding to it, for example directly after the first network layer, so that successive corrections keep the compression parameters of the activation layer consistent with those of the first network layer.
The activation function here may also be a variant of the ReLU function, such as the ReLU6 function.
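One way to realize such a correction node is sketched below (an assumed implementation, not the patent's own code): it clamps the channels that the first network layer's compression parameters retain to be non-negative, so the ReLU that follows cannot zero them out.

```python
import torch
import torch.nn as nn

class CorrectionNode(nn.Module):
    # Inserted before the activation layer; keeps the retained channels
    # >= 0 so ReLU preserves the structure the mask dictates.
    def __init__(self, keep_mask):
        super().__init__()
        self.register_buffer(
            "mask", torch.tensor(keep_mask, dtype=torch.bool))

    def forward(self, x):            # x: (N, C, H, W)
        out = x.clone()
        out[:, self.mask] = x[:, self.mask].clamp(min=0.0)
        return out

node = CorrectionNode([1, 0, 1, 0])
y = node(torch.randn(1, 4, 2, 2))  # channels 0 and 2 are now non-negative
```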
In one possible embodiment, each of multiple convolution layers in the image processing model may be able to determine initial compression parameters based on the compression strategy. In this case, rather than simply determining the compression parameters of one first network layer first, the initial compression parameters of the first network layer with the more complex execution logic may be corrected first among the multiple first network layers, and the other first network layers connected to it, including but not limited to the first network layer preceding it, are then corrected according to the compression parameters already determined.
As shown in fig. 12, the model to be processed includes 4 network layers, where the first and last layers are first network layers and the second and third layers are network layers other than the first network layer. Both first network layers are grouped-convolution (group) layers: the first of them convolves with 1 group (group=1) and the second with 2 groups (group=2). Clearly, a convolution with 1 group is no different from an ungrouped operation, while the grouping of the second first network layer is more complex, so the initial compression parameters of the second first network layer are corrected first, after which the initial compression parameters of the first first network layer are corrected according to the compression parameters of the second first network layer.
In one possible embodiment, as shown in fig. 13, the initial compression parameters of the first first network layer are [0,1,0,1,0,1], and the compression parameters of the second first network layer after correction are [0,1,1,1,0,1]. Here, so that the output of the first first network layer ultimately allows the second first network layer to realize its current compression parameters, the corrected compression parameters of the second first network layer may be used directly as the compression parameters of the first first network layer, i.e., [0,1,1,1,0,1].
In one possible embodiment, as shown in fig. 14, the initial compression parameters of the first first network layer are [0,1,0,1,0,1], and the compression parameters of the second first network layer after correction are [0,0,0,1,1,1]. Here the initial compression parameters of the first first network layer are kept as its compression parameters, because the corrected compression parameters of the second first network layer are in essence obtained by merging the 2 groups of its grouped convolution into 1 group; keeping the initial compression parameters of the first first network layer unchanged does not affect the grouping effect of the second first network layer. Meanwhile, to avoid errors caused by one group having no network modules left after compression, the grouping parameter of the second first network layer may be modified to 1, i.e., group=2 is changed to group=1.
It should be appreciated that the model to be processed may be of a different model type in different application scenarios, for example, in a social application, the model to be processed may also be a speech recognition model, etc.
Fig. 16 is a block diagram showing a structure of a model data processing apparatus according to an embodiment of the present application. As shown in fig. 16, the model data processing apparatus 10 includes a first acquisition module 11, a second acquisition module 12, and a data processing module 13.
A first obtaining module 11, configured to obtain model data of a model to be processed, where the model data is used to characterize at least two network layers included in the model to be processed and execution logic between the at least two network layers, and the network layers are used to process data input to the model to be processed;
a second obtaining module 12, configured to obtain a compression parameter of a first network layer of the at least two network layers, and determine a compression parameter of each network layer except the first network layer according to the compression parameter of the first network layer; the compression parameter is used for indicating a network module to be deleted in the network layer;
and the data processing module 13 is used for carrying out compression processing on the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
In some embodiments, the second acquisition module 12 is specifically configured to:
acquiring initial compression parameters of the first network layer, and determining a compression network structure obtained by the first network layer based on the initial compression parameters;
determining whether a conflict exists between the compressed network structure and other network layers of the to-be-processed model, wherein the conflict comprises a logic conflict between the compressed network structure and the other network layers or a conflict between the compressed network structure and preset logic;
if a conflict exists between the compression network structure and other network layers of the to-be-processed model, correcting the initial compression parameters of the first network layer until no conflict exists between the compression network structure and other network layers of the to-be-processed model, obtaining the compression parameters of the first network layer.
In some embodiments, the second acquisition module 12 is further configured to:
determining the compression parameters of the target network layer according to the compression parameters of the first network layer; the target network layer comprises a target compressed network structure obtained by the first network layer based on the compression parameters, or a network structure obtained after the compressed network structure executes preset logic.
In some embodiments, the second acquisition module 12 is further configured to:
and correcting the compression parameters of other network layers with the execution logic of the target network layer according to the compression parameters of the target network layer.
In some embodiments, the second acquisition module 12 is specifically configured to:
and acquiring a compression strategy, calculating the model file of the first network layer based on the compression strategy, and determining the initial compression parameters of the first network layer.
In some embodiments, when the conflict is a logical conflict between a network structure obtained after the compressed network structure executes a preset logic and other network layers of the model to be processed, the preset logic includes a split logic,
the second acquisition module 12 is specifically configured to:
determining a splitting result obtained after the compression network structure executes the splitting logic based on the splitting logic and the compression network structure, wherein the splitting result is used for representing at least two network structures after the compression network structure is split and initial compression parameters of each network structure;
determining the minimum value in initial compression parameters corresponding to at least two network structures, wherein the minimum value is used for representing the minimum number of network modules to be deleted in the at least two network structures;
And correcting the initial compression parameters of the first network layer based on the minimum value to obtain the compression parameters of the first network layer.
In some embodiments, the second acquisition module 12 is specifically configured to:
correcting initial compression parameters corresponding to at least two network structures based on the minimum value to obtain compression parameters of at least two network structures;
and splicing the compression parameters of at least two network structures to obtain the compression parameters of the first network layer.
In some embodiments, when the conflict is a logical conflict between the compressed network structure and the other network layer, there is addition logic between the compressed network structure and the other network layer, and the second obtaining module 12 is specifically configured to:
acquiring initial compression parameters of the other network layers;
taking intersection of the initial compression parameters of the first network layer and the initial compression parameters of the other network layers;
and taking the intersection as the compression parameter of the first network layer and the compression parameters of the other network layers.
In some embodiments, the second acquisition module 12 is specifically configured to:
correcting the compression parameter of the previous network layer of the first network layer according to the compression parameter of the first network layer; or
And taking the compression parameter of the first network layer as the compression parameter of the next network layer of the first network layer.
In some embodiments, the next network layer includes an activation layer, and the second acquiring module 12 is specifically configured to:
and inserting a correction node before the activation layer, and correcting the data input to the activation layer to correspond to the compression parameters of the first network layer.
In some embodiments, the next network layer includes a normalization layer, and the second obtaining module 12 is specifically configured to:
and carrying out forward propagation processing on the compression parameters of the first network layer to obtain the compression parameters of the normalization layer.
In some embodiments, the second acquisition module 12 is specifically configured to:
compressing the network layers based on the compression parameters of each network layer to obtain a plurality of compressed network layers;
performing inverse gradient transfer processing on the plurality of compression network layers to obtain input tensors of the model to be processed;
and determining the target model according to the input tensor of the model to be processed and the plurality of compressed network layers.
Therefore, in the model data processing method provided by the embodiment of the present application, when a model is compressed, the model data of the model to be processed is first obtained; the compression parameters of a first network layer among the at least two network layers of the model to be processed are then acquired, and the compression parameters of every other network layer are determined from the compression parameters of the first network layer, yielding the compression parameters of each network layer of the model to be processed; finally, the network layers are compressed based on those compression parameters. Because the compression parameters of every network layer other than the first network layer are determined directly from the compression parameters of the first network layer, there is no need to acquire the operation results of the first network layer and derive compression parameters from them, which greatly reduces the memory consumption of the model compression process. At the same time, a large number of intermediate operations are avoided, effectively saving the operation time of model compression and improving its overall efficiency.
In addition, since the compression parameters of every network layer other than the first network layer are determined based on the compression parameters of the first network layer, the compression parameters of the other network layers can be determined as long as compression parameters are provided for a specific model structure. The method is therefore generally applicable to various network models, such as classification models and detection models, effectively improving the universality of model compression.
It should be understood that the units or modules described in the model data processing apparatus 10 correspond to the individual steps in the method described with reference to fig. 2. Thus, the operations and features described above with respect to the method are equally applicable to the model data processing apparatus 10 and the units contained therein, and are not described in detail herein. The model data processing apparatus 10 may be implemented in advance in a browser of an electronic device or other security application, or may be loaded into the browser of the electronic device or its security application by downloading or the like. The corresponding elements in the model data processing apparatus 10 may cooperate with elements in the electronic device to implement aspects of embodiments of the present application.
The division of the modules or units mentioned in the above detailed description is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Referring now to fig. 17, fig. 17 shows a schematic diagram of a computer system suitable for implementing the electronic device or server of an embodiment of the present application.
As shown in fig. 17, the computer system includes a Central Processing Unit (CPU) 1601, which can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 1602 or a program loaded from a storage section 1608 into a Random Access Memory (RAM) 1603. The RAM 1603 also stores various programs and data required for the operation of the system. The CPU 1601, the ROM 1602, and the RAM 1603 are connected to one another by a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
The following components are connected to the I/O interface 1605: an input portion 1606 including a keyboard, a mouse, and the like; an output portion 1607 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1608 including a hard disk or the like; and a communication section 1609 including a network interface card such as a LAN card or a modem. The communication section 1609 performs communication processing via a network such as the Internet. A drive 1610 is also connected to the I/O interface 1605 as needed. A removable medium 1611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1610 as needed, so that a computer program read therefrom is installed into the storage section 1608 as needed.
In particular, the process described above with reference to the flowchart of fig. 2 may be implemented as a computer software program according to an embodiment of the present application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 1609, and/or installed from the removable medium 1611. When the computer program is executed by the Central Processing Unit (CPU) 1601, the above-described functions defined in the system of the present application are performed.
The computer readable medium shown in the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The units or modules involved in the embodiments of the present application may be implemented in software or in hardware. The described units or modules may also be provided in a processor, for example described as: a processor includes a first acquisition module, a second acquisition module, and a compression module. The names of these units or modules do not in any way constitute a limitation of the units or modules themselves; for example, the first acquisition module may also be described as "a module for acquiring a model file of a model to be processed, the model file being used to characterize at least two network layers included in the model to be processed and execution logic between the at least two network layers, the network layers being used to process data input to the model to be processed".
As another aspect, the present application also provides a computer-readable storage medium that may be included in the electronic device described in the above embodiment or may exist alone without being incorporated in the electronic device. The computer-readable storage medium stores one or more programs that when executed by one or more processors perform the model data processing methods described herein.
The above description is only illustrative of the preferred embodiments of the present application and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in the present application is not limited to the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure, for example, solutions in which the above features are replaced with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (16)

1. A model data processing method, characterized by comprising:
obtaining model data of a model to be processed, wherein the model data are used for representing at least two network layers included in the model to be processed and execution logic between the at least two network layers, and the network layers are used for processing data input into the model to be processed;
Acquiring compression parameters of a first network layer in the at least two network layers, and determining compression parameters of each network layer except the first network layer according to the compression parameters of the first network layer; the compression parameter is used for indicating a network module to be deleted in the network layer;
and carrying out compression processing on the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
2. The method of claim 1, wherein the obtaining the compression parameter of a first network layer of the at least two network layers comprises:
acquiring initial compression parameters of the first network layer, and determining a compression network structure obtained by the first network layer based on the initial compression parameters;
determining whether a conflict exists between the compressed network structure and other network layers of the to-be-processed model, wherein the conflict comprises a logic conflict between the compressed network structure and the other network layers or a conflict between the compressed network structure and preset logic;
if a conflict exists between the compression network structure and other network layers of the to-be-processed model, correcting the initial compression parameters of the first network layer until no conflict exists between the compression network structure and other network layers of the to-be-processed model, obtaining the compression parameters of the first network layer.
3. The method according to claim 2, wherein the method further comprises:
determining the compression parameters of the target network layer according to the compression parameters of the first network layer; the target network layer comprises a target compressed network structure obtained by the first network layer based on the compression parameters, or a network structure obtained after the compressed network structure executes preset logic.
4. A method according to claim 3, characterized in that the method further comprises:
and correcting the compression parameters of other network layers with the execution logic of the target network layer according to the compression parameters of the target network layer.
5. The method according to any of claims 2-4, wherein the obtaining initial compression parameters of the first network layer comprises:
and acquiring a compression strategy, calculating the model file of the first network layer based on the compression strategy, and determining the initial compression parameters of the first network layer.
6. The method of claim 2, wherein, when the conflict is a logical conflict between a network structure obtained after the compressed network structure executes a preset logic and other network layers of the model to be processed, the preset logic includes a split logic,
The correcting the initial compression parameter of the first network layer includes:
determining a splitting result obtained after the compression network structure executes the splitting logic based on the splitting logic and the compression network structure, wherein the splitting result is used for representing at least two network structures after the compression network structure is split and initial compression parameters of each network structure;
determining the minimum value in initial compression parameters corresponding to at least two network structures, wherein the minimum value is used for representing the minimum number of network modules to be deleted in the at least two network structures;
and correcting the initial compression parameters of the first network layer based on the minimum value to obtain the compression parameters of the first network layer.
7. The method of claim 6, wherein the modifying the initial compression parameters of the first network layer based on the minimum value results in compression parameters of the first network layer, comprising:
correcting initial compression parameters corresponding to at least two network structures based on the minimum value to obtain compression parameters of at least two network structures;
and splicing the compression parameters of at least two network structures to obtain the compression parameters of the first network layer.
8. The method of claim 2, wherein when the conflict is a logical conflict between the compressed network structure and the other network layer, addition logic exists between the compressed network structure and the other network layer, and the correcting the initial compression parameters of the first network layer comprises:
acquiring initial compression parameters of the other network layers;
taking intersection of the initial compression parameters of the first network layer and the initial compression parameters of the other network layers;
and taking the intersection as the compression parameter of the first network layer and the compression parameters of the other network layers.
9. The method of claim 2, wherein said determining compression parameters for each of said network layers except for said first network layer based on compression parameters for said first network layer comprises:
correcting the compression parameter of the previous network layer of the first network layer according to the compression parameter of the first network layer; or
And taking the compression parameter of the first network layer as the compression parameter of the next network layer of the first network layer.
10. The method of claim 9, wherein the next network layer comprises an active layer, wherein the taking the compression parameter of the first network layer as the compression parameter of the next network layer of the first network layer comprises:
And inserting a correction node before the activation layer, and correcting the data input to the activation layer to correspond to the compression parameters of the first network layer.
11. The method according to claim 9 or 10, wherein the next network layer comprises a normalization layer, and wherein the taking the compression parameter of the first network layer as the compression parameter of the next network layer of the first network layer comprises:
and carrying out forward propagation processing on the compression parameters of the first network layer to obtain the compression parameters of the normalization layer.
12. The method according to any one of claims 1-11, wherein said compressing the model data based on the compression parameters of each of the network layers to obtain the compression result of the model to be processed comprises:
compressing the network layers based on the compression parameters of each network layer to obtain a plurality of compressed network layers;
performing inverse gradient transfer processing on the plurality of compression network layers to obtain input tensors of the model to be processed;
and determining the compression result according to the input tensor of the to-be-processed model and the plurality of compression network layers.
13. A model data processing apparatus, characterized by comprising:
The first acquisition module is used for acquiring model data of a to-be-processed model, wherein the model data are used for representing at least two network layers included in the to-be-processed model and execution logic between the at least two network layers, and the network layers are used for processing data input into the to-be-processed model;
the second acquisition module is used for acquiring the compression parameters of a first network layer in the at least two network layers, and determining the compression parameters of each network layer except the first network layer according to the compression parameters of the first network layer; the compression parameter is used for indicating a network module to be deleted in the network layer;
and the data processing module is used for carrying out compression processing on the model data based on the compression parameters of each network layer to obtain a compression result of the model to be processed.
14. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the model data processing method according to any of claims 1-12 when executing the program.
15. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the model data processing method according to any one of claims 1-12.
16. A computer program product comprising a computer program which, when executed by a processor, implements the model data processing method according to any one of claims 1-12.