CN115049058B - Compression method and device of topology recognition model, electronic equipment and medium


Info

Publication number
CN115049058B
CN115049058B
Authority
CN
China
Prior art keywords
model
pruning
training
training data
neural network
Prior art date
Legal status
Active
Application number
CN202210983746.3A
Other languages
Chinese (zh)
Other versions
CN115049058A (en)
Inventor
苑佳楠
霍超
白晖峰
张港红
高建
郑利斌
于华东
尹志斌
罗安琴
谢凡
申一帆
杨双双
丁啸
Current Assignee
Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee
Beijing Smartchip Microelectronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Smartchip Microelectronics Technology Co Ltd filed Critical Beijing Smartchip Microelectronics Technology Co Ltd
Priority to CN202210983746.3A priority Critical patent/CN115049058B/en
Publication of CN115049058A publication Critical patent/CN115049058A/en
Application granted granted Critical
Publication of CN115049058B publication Critical patent/CN115049058B/en
Priority to PCT/CN2023/111880 priority patent/WO2024037393A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/084: Backpropagation, e.g. using gradient descent
    • G06N 3/088: Non-supervised learning, e.g. competitive learning
    • G06Q 50/06: Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of computer technology, and in particular to a method and an apparatus for compressing a topology identification model, an electronic device, and a medium. The compression method includes: pruning a model to be compressed to obtain a pruning model, and training the pruning model to obtain a trained pruning model; quantizing the bit number of the weight parameters of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model, and training the high-quantization pruning model to obtain a trained high-quantization pruning model; and taking the trained high-quantization pruning model as the model to be compressed, continuing to prune and quantize it until a compressed topology identification model is obtained, and deploying the compressed topology identification model to a power Internet of Things terminal. In this way, the size of the topology identification model is reduced while its accuracy still meets the requirement.

Description

Compression method and device of topology recognition model, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for compressing a topology identification model, an electronic device, and a medium.
Background
Accurately knowing the topology of the distribution network is a prerequisite for grid planning, operation and control. However, with the connection of large amounts of new energy, the topology of the power distribution network changes frequently, so its structure is often unknown. To identify the topology accurately, a topology identification model can be used. Such a model is obtained by training a deep neural network model and achieves high identification accuracy, but because of its complex structure it can only run on the server side. With the development of power distribution network services, the topology identification model needs to run on the power Internet of Things terminal, so that the topology structure measured in a power supply station area can be identified conveniently. However, the hardware resources of the power Internet of Things terminal are limited in storage, computing capacity, power consumption, bandwidth and the like, so it is difficult to deploy a deep learning model on such terminal equipment.
Compressing the topology identification model and deploying the compressed model on the power Internet of Things terminal equipment is therefore a preferable solution. However, compressing the topology identification model inevitably reduces its measurement accuracy.
Therefore, how to effectively reduce the size of the topology identification model while ensuring that its accuracy still meets the requirement, so that it can run on the power Internet of Things terminal, is a technical problem to be solved urgently.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure provide a compression method and apparatus for a topology identification model, an electronic device, and a medium.
In a first aspect, an embodiment of the present disclosure provides a compression method for a topology identification model, including: pruning the model to be compressed to obtain a pruning model, and training the pruning model to obtain a trained pruning model; the model to be compressed is a topology identification model, and the topology identification model is a machine learning model which runs at a server and is used for identifying a power grid topological structure; quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model, and training the high-quantization pruning model to obtain a trained high-quantization pruning model; wherein the second number of bits is less than the first number of bits; and taking the trained high-quantization pruning model as a model to be compressed, continuing to prune and quantize the model to be compressed until a compressed topology identification model is obtained, and deploying the compressed topology identification model to a power Internet of Things terminal.
In some embodiments, the pruning the model to be compressed to obtain a pruning model includes: and determining a redundant channel in the model to be compressed through sparse regularization training, and removing the redundant channel from the model to be compressed to obtain the pruning model.
In some embodiments, the method further comprises: taking the full-connection layer as a linear classifier, and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier; and if the difference value between the identification precision of any two adjacent neural network layers in the model to be compressed is smaller than a preset difference value threshold value, removing the neural network layer with the lowest identification precision from the any two adjacent neural network layers.
In some embodiments, the taking the fully-connected layer as a linear classifier, and obtaining the recognition accuracy of each neural network layer in the model to be compressed by using the linear classifier includes: selecting any two adjacent neural network layers from the model to be compressed, and removing one of the two adjacent neural network layers to obtain a neural network layer precision prediction model; and testing the neural network layer precision prediction model by using the prediction data set with the label to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer which is not removed in any two adjacent neural network layers.
In some embodiments, the training the pruning model to obtain a trained pruning model includes: marking training data in the label-free training data set by using the model to be compressed to obtain a training data set; and training the pruning model by using the training data set to obtain the trained pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the model to be compressed to obtain a training data set includes: inputting the training data into the model to be compressed, and taking an output result of the model to be compressed as a label of the training data; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the pruning model using the training data set to obtain a trained pruning model includes: obtaining a loss value according to the output of the pruning model and the label of the training data in the training data set by using a loss function; judging whether the loss function is converged or not according to the loss value; and if so, stopping training, otherwise, adjusting the parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the method further comprises: and if the training times reach the preset training time threshold value and the loss function is not converged, increasing a channel or a neural network layer of the pruning model by referring to the model to be compressed.
In some embodiments, quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model includes: quantizing the weight parameters of each neural network layer in the pruning model from a first bit number to a second bit number, and training the pruning model; quantizing the bit number of the weight parameter of the fully-connected layer of the pruning model from a first bit number to a second bit number, and training the pruning model; and quantizing the activation function of the pruning model to obtain the high-quantization pruning model.
In some embodiments, the training the high-quantization pruning model to obtain a trained high-quantization pruning model includes: selecting at least one neural network layer from the high quantization pruning model to construct a loss function; marking training data in the label-free training data set by using the pruning model to obtain a training data set; and training the high-quantization pruning model according to the loss function and the training data set to obtain the trained high-quantization pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the pruning model to obtain a training data set includes: inputting the training data into the pruning model, and using the output of a reference network layer in the pruning model as a label of the training data; wherein the reference network layer corresponds to the at least one neural network layer selected in the high quantization pruning model; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the high-quantization pruning model according to the loss function and the training data set to obtain a trained high-quantization pruning model includes: obtaining a loss value according to the output of the at least one neural network layer in the high-quantization pruning model and the label of the training data in the training data set by using the loss function; judging whether the loss function converges according to the loss value; and if so, stopping training, otherwise, adjusting the parameters of the high-quantization pruning model and performing the next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the method further comprises: and if the training times reach the preset training time threshold value and the loss function is not converged, increasing the bit number of at least one weight parameter in the high quantization pruning model from the second bit number to the first bit number by referring to the weight parameter of the neural network layer in the pruning model.
In some embodiments, the quantizing the weight parameters of the respective neural network layers in the pruning model from a first number of bits to a second number of bits includes: and successively selecting at least one neural network layer from the pruning model by using a preset algorithm, and quantizing the weight parameters of the at least one neural network layer from a first bit number to a second bit number.
In a second aspect, an embodiment of the present disclosure provides a compression apparatus for a topology identification model, including: the pruning module is used for pruning the model to be compressed to obtain a pruning model, and training the pruning model to obtain a trained pruning model; the model to be compressed is a topology identification model, and the topology identification model is a machine learning model which runs at a server and is used for identifying a power grid topological structure; the quantization module is used for quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model, and training the high-quantization pruning model to obtain a trained high-quantization pruning model; wherein the second number of bits is less than the first number of bits; and the deployment module is used for taking the trained high-quantization pruning model as a model to be compressed, continuing to prune and quantize the model to be compressed until a compressed topology identification model is obtained, and deploying the compressed topology identification model to a power Internet of Things terminal.
In some embodiments, the pruning the model to be compressed to obtain a pruning model includes: and determining redundant channels in the model to be compressed through sparse regularization training, and removing the redundant channels from the model to be compressed to obtain the pruning model.
In some embodiments, the apparatus further comprises: the network layer identification precision acquisition module is used for taking the full connection layer as a linear classifier and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier; and the removing module is used for removing the neural network layer with the lowest identification precision in any two adjacent neural network layers in the model to be compressed if the difference value between the identification precisions of any two adjacent neural network layers is smaller than a preset difference value threshold.
In some embodiments, the taking the fully-connected layer as a linear classifier, and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier includes: selecting any two adjacent neural network layers from the model to be compressed, and removing one of the two adjacent neural network layers to obtain a neural network layer precision prediction model; and testing the neural network layer precision prediction model by using the prediction data set with the label to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer which is not removed in any two adjacent neural network layers.
In some embodiments, the training the pruning model to obtain a trained pruning model includes: marking the training data in the label-free training data set by using the model to be compressed to obtain a training data set; and training the pruning model by using the training data set to obtain the trained pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the model to be compressed to obtain a training data set includes: inputting the training data into the model to be compressed, and taking an output result of the model to be compressed as a label of the training data; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the pruning model by using the training data set to obtain a trained pruning model includes: obtaining a loss value according to the output of the pruning model and the label of the training data in the training data set by using a loss function; judging whether the loss function is converged or not according to the loss value; and if so, stopping training, otherwise, adjusting the parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the apparatus further comprises: and the channel or network layer increasing module is used for increasing the channel or the neural network layer of the pruning model according to the model to be compressed if the training times reach the preset training time threshold value and the loss function is not converged.
In some embodiments, quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model includes: quantizing the weight parameters of each neural network layer in the pruning model from a first bit number to a second bit number, and training the pruning model; quantizing the bit number of the weight parameter of the fully-connected layer of the pruning model from a first bit number to a second bit number, and training the pruning model; and quantizing the activation function of the pruning model to obtain the high-quantization pruning model.
In some embodiments, the training the high-quantization pruning model to obtain a trained high-quantization pruning model includes: selecting at least one neural network layer from the high quantization pruning model to construct a loss function; marking training data in the label-free training data set by using the pruning model to obtain a training data set; and training the high-quantization pruning model according to the loss function and the training data set to obtain the trained high-quantization pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the pruning model to obtain a training data set includes: inputting the training data into the pruning model, and taking the output of a reference network layer in the pruning model as a label of the training data; wherein the reference network layer corresponds to the at least one neural network layer selected in the high quantization pruning model; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the high-quantization pruning model according to the loss function and the training data set to obtain a trained high-quantization pruning model includes: obtaining a loss value according to the output of the at least one neural network layer in the high quantization pruning model and the label of the training data in the training data set by using the loss function; judging whether the loss function is converged or not according to the loss value; if so, stopping training, otherwise, adjusting the parameters of the high-quantization pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the apparatus further comprises: and the inverse quantization module is configured to, if the training times reach the preset training time threshold and the loss function is not converged, increase the number of bits of at least one weight parameter in the high quantization pruning model from the second number of bits to the first number of bits with reference to the weight parameter of the neural network layer in the pruning model.
In some embodiments, the quantizing the weight parameters of the respective neural network layers in the pruning model from a first number of bits to a second number of bits includes: and successively selecting at least one neural network layer from the pruning model by using a preset algorithm, and quantizing the weight parameters of the at least one neural network layer from a first bit number to a second bit number.
In a third aspect, the embodiments of the present disclosure provide a chip including the apparatus according to any one of the embodiments of the second aspect.
In a fourth aspect, the disclosed embodiments provide an electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method as described above.
In a fifth aspect, embodiments of the present disclosure provide a computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method as described above.
According to the technical scheme provided by the embodiment of the disclosure, a model to be compressed is pruned to obtain a pruning model, and the pruning model is trained to obtain a trained pruning model; quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from the first bit number to the second bit number to obtain a high-quantization pruning model, and training the high-quantization pruning model to obtain the trained high-quantization pruning model. By alternately carrying out pruning operation and quantification operation, the size of the deep learning model can be effectively reduced under the condition of ensuring that the precision of the deep learning model meets the requirement.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
Fig. 1 illustrates an application scenario of a compression method of a topology identification model according to an embodiment of the present disclosure.
Fig. 2 illustrates an exemplary flow diagram of a compression method of a topology identification model according to an embodiment of the present disclosure.
Fig. 3 illustrates an exemplary schematic diagram of a model to be compressed according to an embodiment of the disclosure.
Fig. 4 illustrates an exemplary schematic diagram of training a high quantization pruning model with reference to a pruning model according to an embodiment of the present disclosure.
FIG. 5 illustrates an exemplary schematic diagram of a compression apparatus for a topology identification model according to an embodiment of the disclosure.
Fig. 6 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
FIG. 7 shows a schematic block diagram of a computer system suitable for use in implementing methods according to embodiments of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the present disclosure, if an operation of acquiring user information or user data or an operation of presenting user information or user data to others is involved, the operations are all operations authorized, confirmed, or actively selected by a user.
Fig. 1 illustrates an application scenario of a compression method of a topology identification model according to an embodiment of the present disclosure.
As shown in fig. 1, a service end 110, a terminal 120 and a network 130 may be included in an application scenario.
In some embodiments, data or information may be exchanged between the server 110 and the terminal 120 through the network 130. For example, the server 110 may obtain information and/or data in the terminal 120 through the network 130, or may transmit information and/or data to the terminal 120 through the network 130.
The terminal 120 is a power Internet of Things terminal device on which a topology identification model needs to be deployed. In some embodiments, the terminal 120 may receive the compressed topology identification model from the server 110 and perform the prediction task locally at the terminal 120 using the compressed topology identification model. The terminal 120 may be one or any combination of a mobile device, a tablet computer, and the like having input and/or output capabilities.
In some embodiments, the server 110 may prune the to-be-compressed model to obtain a pruning model, and train the pruning model to obtain a trained pruning model. In some embodiments, the server 110 may quantize the bit number of the weight parameter of each neural network layer in the trained pruning model from the first bit number to the second bit number to obtain a high quantization pruning model, and train the high quantization pruning model to obtain the trained high quantization pruning model. The server 110 may be a single server or a group of servers. The set of servers may be centralized or distributed (e.g., the server 110 may be a distributed system), may be dedicated, or may be served simultaneously by other devices or systems. In some embodiments, the server 110 may be regional or remote. In some embodiments, the server 110 may be implemented on a cloud platform or provided in a virtual manner. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an internal cloud, a multi-tiered cloud, and the like, or any combination thereof.
In some embodiments, the network 130 may be any one or more of a wired network or a wireless network. For example, the network 130 may include a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), etc., or any combination thereof.
Fig. 2 illustrates an exemplary flowchart of a compression method of a topology identification model according to an embodiment of the present disclosure. As shown in fig. 2, the compression method of the topology identification model includes the following steps S201 to S203:
in step S201, pruning is performed on the model to be compressed to obtain a pruning model, and the pruning model is trained to obtain a trained pruning model.
The model to be compressed is a deep neural network model that needs to be compressed. In some embodiments, the model to be compressed may be a pre-trained topology identification model, which is a machine learning model running on the server for identifying the topology of the power grid; the topology identification model may be constructed based on a convolutional neural network model. The input of the topology identification model may be voltage and current data from the low-voltage side of the station area, and the output may be the position of each electric meter in the power supply station area and the association relations among the meters. In a specific implementation, the prediction accuracy of the topology identification model can be set and the model is trained with a training data set, where training means determining the parameters of the topology identification model. The trained topology identification model is taken as the model to be compressed, it is ensured that the accuracy loss of the model to be compressed stays within a preset range, and the model to be compressed is then compressed. The training data of the training data set may be historical voltage and current data from the low-voltage side of the power supply transformer station area, and the labels of the training data are the positions of the electric meters in the power supply station area corresponding to the historical voltage and current data and the association relations among the meters.
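For illustration only, the sketch below shows what such a convolutional topology identification model might look like. The disclosure does not specify the network architecture, so the layer sizes, the number of meters n_meters, the time-window length win and the output encoding (a pairwise association matrix) are all assumptions:

```python
import torch.nn as nn

class TopologyNet(nn.Module):
    """Illustrative CNN: per-meter voltage/current series in, meter-association matrix out."""
    def __init__(self, n_meters: int = 64, win: int = 96):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2 * n_meters, 128, kernel_size=5, padding=2), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, kernel_size=5, padding=2), nn.BatchNorm1d(256), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # fully-connected head predicting pairwise associations between meters
        self.head = nn.Linear(256, n_meters * n_meters)
        self.n_meters = n_meters

    def forward(self, x):                      # x: (batch, 2 * n_meters, win)
        z = self.features(x).flatten(1)        # (batch, 256)
        return self.head(z).view(-1, self.n_meters, self.n_meters)
```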
Although the model to be compressed has high identification accuracy, it also contains a large number of redundant weight parameters. When the model to be compressed processes input data, not all weight parameters take part in effective computation or influence the prediction result. Pruning reduces the network scale of the model to be compressed and lowers its computational complexity by removing redundant weight parameters, channels or network layers from the network structure.
In some embodiments, redundant channels in the model to be compressed may be determined through sparse regularization training and removed from the model to be compressed to obtain the pruning model. Sparsity here means eliminating some features in the data so that the model generalizes better and the probability of overfitting is reduced. Regularization means narrowing the solution space by adding rules (constraints) to the objective function being trained. In a specific implementation, the scaling factor of the BN layer may be introduced as a numerical representation of the activation degree of the corresponding channel; sparse regularization training drives the scaling factors of redundant channels towards zero, and the redundant channels are then removed by setting a threshold. This may specifically include the following steps (a minimal code sketch is given after the list).
1. Perform L1 regularization training on the convolutional layers and BN layers of the model to obtain a convolutional neural network with sparse weight parameter values.
2. Judge the importance of each network layer in the convolutional neural network according to the sparsity of the filters and the scaling factors of the BN layers.
3. Perform structured pruning on the sparse filters and the corresponding connections.
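As a minimal sketch (not the disclosure's own implementation) of the sparse regularization and channel-selection idea above, the penalty coefficient lam and the pruning threshold below are illustrative values only:

```python
import torch
import torch.nn as nn

def l1_bn_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """L1 sparsity penalty on BN scaling factors (the channel activation degree)."""
    penalty = torch.zeros(())
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

@torch.no_grad()
def redundant_channel_masks(model: nn.Module, threshold: float = 1e-2):
    """After sparse training, mark channels whose scaling factor fell below the threshold."""
    return {name: m.weight.abs() > threshold          # True = keep the channel
            for name, m in model.named_modules()
            if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d))}
```

During sparse regularization training the penalty is simply added to the task loss (loss = task_loss + l1_bn_penalty(model)); once training finishes, the channels marked False are removed, together with the corresponding filters and connections, to form the pruning model.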
The intermediate network layers of a convolutional neural network contain much important information related to the prediction task. By analysing the relevance between any intermediate layer of the convolutional neural network and the network layers adjacent to it, more accurate pruning can be achieved while the identification accuracy is maintained.
In some embodiments, the fully-connected layer may be used as a linear classifier, and the linear classifier is used to obtain the recognition accuracy of each neural network layer in the model to be compressed; and if the difference value between the identification accuracies of any two adjacent neural network layers in the model to be compressed is smaller than a preset difference value threshold value, removing the neural network layer with the lowest identification accuracy from the any two adjacent neural network layers. In a specific implementation process, any two adjacent neural network layers can be selected from the model to be compressed, and one of the two adjacent neural network layers is removed to obtain a neural network layer precision prediction model; and testing the neural network layer precision prediction model by using the prediction data set with the label to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer which is not removed in any two adjacent neural network layers.
By way of example only, as shown in fig. 3, the model to be compressed (i.e. the topology identification model) is constructed from a convolutional neural network model and includes a plurality of convolutional layers (the actual model contains many convolutional layers; only convolutional layer 1 and convolutional layer 2 are drawn for convenience of description) and a plurality of fully-connected layers. A fully-connected layer of the topology identification model (e.g. fully-connected layer 1) can serve as a linear classifier for evaluating the effectiveness of each convolutional layer (e.g. convolutional layer 1 and convolutional layer 2), and adaptive average pooling is used to unify the embedded vector length of each convolutional layer, so that convolutional layers with different feature dimensions output feature maps of the same size. For example, convolutional layer 1 may be removed, the linear classifier is then fine-tuned layer by layer through back-propagation so that each layer reaches its best recognition accuracy, and the resulting topology identification model is taken as the neural network layer accuracy prediction model. A number of test samples from the labelled test data set are input into this accuracy prediction model to obtain test results, the prediction accuracy is computed from these results and the labels of the corresponding test data, and this prediction accuracy is taken as the recognition accuracy of convolutional layer 2. The recognition accuracy of convolutional layer 1 can be obtained in the same way. The recognition accuracy of convolutional layer 1 is then subtracted from that of convolutional layer 2; if the absolute value of the difference is smaller than the preset difference threshold, the influence of one of the two layers on the final prediction result of the topology identification model can be neglected, and the convolutional layer with the lower recognition accuracy among convolutional layers 1 and 2 is removed.
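A simplified sketch of this layer-wise evaluation, assuming the truncated backbone and the fully-connected head are available as separate modules (conv_prefix, fc_head) and that a labelled test loader exists; these names are hypothetical:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def probe_accuracy(conv_prefix, fc_head, test_loader):
    """Recognition accuracy of one conv layer: frozen features + a linear classifier."""
    correct, total = 0, 0
    for x, y in test_loader:
        feats = conv_prefix(x)
        # adaptive average pooling unifies the embedded vector length of every conv layer
        feats = F.adaptive_avg_pool1d(feats, 1).flatten(1)
        preds = fc_head(feats).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total
```

If the absolute difference between the probe accuracies of two adjacent layers is below the preset difference threshold, the layer with the lower accuracy is the candidate for removal.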
Part of the effective information may be lost while the model to be compressed is pruned, degrading the performance of the pruning model. Some embodiments of the present disclosure therefore introduce a knowledge distillation technique: a distillation loss is obtained by comparing the model to be compressed (the trained topology identification model) with the pruning model, a loss function is established, and the model to be compressed guides the pruning model, which effectively improves the performance of the pruning model.
In some embodiments, the model to be compressed may be used to label the training data in the unlabeled training data set to obtain the training data set. In a specific implementation process, training data may be input into the model to be compressed, an output result of the model to be compressed is used as a label of the training data, and the training data and the label of the training data are used as a set of training data in the training data set.
In some embodiments, the training data set obtained by the above method may be used to train the pruning model, so as to obtain a trained pruning model. In a specific implementation process, a loss value may be obtained according to an output of the pruning model and a label of training data in the training data set by using a loss function (for example, a square loss function); judging whether the loss function converges (for example, judging whether the loss value is smaller than a preset threshold value) according to the loss value; and if so, stopping training, otherwise, adjusting parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold value.
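The labelling-and-training loop above could be sketched as follows; the optimizer, learning rate, square-loss criterion and the convergence/iteration thresholds are illustrative assumptions, not values fixed by the disclosure:

```python
import torch
import torch.nn as nn

def train_pruned_model(teacher, pruned, unlabeled_loader,
                       max_steps=10_000, conv_threshold=1e-3, lr=1e-3):
    """Train the pruning model on data labelled by the model to be compressed."""
    teacher.eval()
    optimizer = torch.optim.Adam(pruned.parameters(), lr=lr)
    criterion = nn.MSELoss()                     # square loss against the teacher label
    step, loss = 0, None
    for x in unlabeled_loader:
        with torch.no_grad():
            label = teacher(x)                   # teacher output used as the label
        loss = criterion(pruned(x), label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
        if loss.item() < conv_threshold:         # loss function considered converged
            return pruned, True
        if step >= max_steps:                    # training-count threshold reached
            return pruned, False
    return pruned, loss is not None and loss.item() < conv_threshold
```

When the second return value is False, channels or layers can be added back from the model to be compressed, as described next.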
In some embodiments, if the training times reach the preset training time threshold and the loss function has still not converged, which may indicate that pruning has affected the accuracy of the model to be compressed, channels or neural network layers may be added back to the pruning model with reference to the model to be compressed. In a specific implementation, channels or neural network layers that were cut from the model to be compressed can be added back to the pruning model one by one; after each addition, the model is trained as above until the loss function converges, and the pruning model whose loss function has converged is used as the trained pruning model.
In step S202, quantizing the bit numbers of the weight parameters of each neural network layer in the trained pruning model from the first bit number to the second bit number to obtain a high quantization pruning model, and training the high quantization pruning model to obtain a trained high quantization pruning model; wherein the second number of bits is less than the first number of bits.
Model quantization is a compression method that approximates similar weight parameters of a deep neural network model by the same values, compressing and accelerating the model by reducing the bit-width representation of the weight parameters while keeping its accuracy essentially unchanged; it reduces the model size and speeds up inference. The bit width of a model weight parameter can be expressed as a bit number, which is usually a power of 2 such as 1, 2, 4, 8, 16 or 32. The weight parameters of a deep learning model are typically 32 bits. A high bit number means the model has high recognition accuracy, but also a large size, low inference speed and high consumption of hardware resources.
In some embodiments, the weight parameters of each neural network layer in the pruning model may be quantized step by step, the weight parameters of each neural network layer in the pruning model are quantized to a higher bit number, and then the quantization is continued to a lower bit number from the higher bit number, and so on until the compression task is completed.
In a specific implementation, the weight parameters of each neural network layer in the pruning model can be quantized from a first bit number to a second bit number and the pruning model is trained; the bit number of the weight parameters of the fully-connected layer of the pruning model is quantized from the first bit number to the second bit number and the pruning model is trained; and the activation function of the pruning model is quantized, giving the high-quantization pruning model. For example, the weight parameters of each neural network layer and of the fully-connected layer in the pruning model can be quantized from 32 bits to 16 bits. In a specific implementation, at least one neural network layer can be successively selected from the pruning model using a preset algorithm, and the weight parameters of the selected layers are quantized from the first bit number to the second bit number. The preset algorithm may include, but is not limited to, a random strategy, a prune-based strategy, and the like. The random strategy divides the weight parameters randomly into disjoint parts, i.e. groups them with equal probability. The prune-based strategy determines a threshold for each layer, layer by layer (generally given according to a splitting ratio), compares the absolute values of the weight parameters with the threshold, and divides them into non-overlapping groups.
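A minimal sketch of quantizing a group of layers from a higher to a lower bit width; the uniform symmetric rounding below is just one possible quantizer, and the random grouping stands in for the "preset algorithm" as an assumption:

```python
import random
import torch
import torch.nn as nn

def quantize_tensor(w: torch.Tensor, num_bits: int) -> torch.Tensor:
    """Uniform symmetric (fake) quantization of a weight tensor to num_bits bits."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

@torch.no_grad()
def quantize_layer_group(model: nn.Module, second_bits: int = 16, group_size: int = 2):
    """Randomly pick a group of layers and quantize their weights to second_bits."""
    layers = [m for m in model.modules()
              if isinstance(m, (nn.Conv1d, nn.Conv2d, nn.Linear))]
    for m in random.sample(layers, min(group_size, len(layers))):
        m.weight.copy_(quantize_tensor(m.weight, second_bits))
    return model
```

After each group is quantized, the model is retrained before the next group is selected, so that quantization proceeds step by step as described above.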
After the high-quantization pruning model is obtained by the above method, it can be trained to obtain the trained high-quantization pruning model.
Embodiments of the present disclosure introduce the idea of knowledge distillation to train the quantized model. The high-quantization pruning model is divided into different parts, several of which are selected to construct a guidance loss function that serves as a constraint function imposing an accuracy constraint on the quantized result. For network layers that do not meet the accuracy requirement, the weight parameters of the high-quantization pruning model are guided by the weight parameters of the pruning model, which compensates for the accuracy loss caused by quantization and improves the accuracy of the quantized pruning model.
In some embodiments, as shown in fig. 4, at least one neural network layer may be selected from a high quantization pruning model to construct a loss function (for example, an absolute value loss function), and the training data in the unlabeled training data set may be labeled by using the pruning model to obtain a training data set. In a specific implementation process, as shown in fig. 4, training data may be input into the pruning model, and an output of a reference network layer in the pruning model is used as a label of the training data; and the reference network layer corresponds to at least one selected neural network layer in the high quantization pruning model, and the training data and the labels of the training data are used as a group of training data in the training data set.
After the training data set is obtained, the high-quantization pruning model can be trained according to the loss function and the training data set, and the trained high-quantization pruning model is obtained.
In a specific implementation process, as shown in fig. 4, a loss function may be used, a loss value is obtained according to a predicted value of an output of at least one neural network layer in the high quantization pruning model and a label of training data in the training data set, whether the loss function converges is judged according to the loss value, if yes, training is stopped, otherwise, a parameter of the high quantization pruning model is adjusted, and next training is performed until the loss function converges or the training frequency reaches a preset training frequency threshold.
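A sketch of this guided quantization-training step, assuming the selected layers of the two models can be matched by name; the hook bookkeeping and the L1 criterion are assumptions consistent with the absolute-value loss function mentioned above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def capture_outputs(model: nn.Module, layer_names):
    """Register forward hooks that record the outputs of the selected layers."""
    outputs, handles = {}, []
    for name, m in model.named_modules():
        if name in layer_names:
            handles.append(m.register_forward_hook(
                lambda _m, _in, out, key=name: outputs.__setitem__(key, out)))
    return outputs, handles

def guidance_loss(quant_model, ref_pruned_model, x, layer_names):
    """L1 loss between selected layers of the high-quantization pruning model and the
    corresponding reference layers of the pruning model (which serve as labels)."""
    q_out, q_handles = capture_outputs(quant_model, layer_names)
    r_out, r_handles = capture_outputs(ref_pruned_model, layer_names)
    quant_model(x)
    with torch.no_grad():
        ref_pruned_model(x)
    loss = sum(F.l1_loss(q_out[n], r_out[n]) for n in layer_names)
    for h in q_handles + r_handles:
        h.remove()
    return loss
```

The returned loss is back-propagated through the high-quantization pruning model only; the pruning model acts purely as a frozen reference providing the labels.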
In some embodiments, if the training times reach the preset training time threshold and the loss function has still not converged, which indicates that quantization has had a large influence on the accuracy of the pruning model, the bit number of at least one weight parameter in the high-quantization pruning model may be increased from the second bit number back to the first bit number, with reference to the weight parameters of the corresponding neural network layer in the pruning model. After the weight parameters in the high-quantization pruning model are adjusted in this way, training continues until the loss function converges, and the high-quantization pruning model whose loss function has converged is used as the trained high-quantization pruning model.
In step S203, the trained high-quantization pruning model is used as a model to be compressed, pruning and quantization are continuously performed on the model to be compressed until a compressed topology identification model is obtained, and the compressed topology identification model is deployed to the power internet of things terminal.
In some embodiments, after the trained high-quantization pruning model is obtained in step S202, it may be used as the model to be compressed, and pruning and quantization are continued according to steps S201 to S202 until a compressed topology identification model is obtained. The compressed topology identification model may be a topology identification model that meets preset model indexes, and it is deployed to the power Internet of Things terminal. The preset model indexes include compression ratio, computation amount, inference delay, accuracy loss and the like.
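Putting steps S201 to S203 together, the alternation could be organized as in the sketch below, where prune_and_train, quantize_and_train, evaluate and meets_targets are hypothetical helpers wrapping the procedures described above, and the bit schedule is only an example:

```python
def compress_topology_model(model, unlabeled_loader, evaluate, meets_targets,
                            bit_schedule=(32, 16, 8, 4)):
    """Alternate pruning and quantization until the preset model indexes are met
    (compression ratio, computation amount, inference delay, accuracy loss)."""
    # prune_and_train, quantize_and_train, evaluate, meets_targets are hypothetical helpers
    to_compress = model
    for first_bits, second_bits in zip(bit_schedule, bit_schedule[1:]):
        pruned = prune_and_train(to_compress, unlabeled_loader)            # step S201
        to_compress = quantize_and_train(pruned, unlabeled_loader,
                                         first_bits, second_bits)          # step S202
        metrics = evaluate(to_compress)
        if meets_targets(metrics):                                         # step S203 exit
            break
    return to_compress
```

The resulting compressed topology identification model is the one deployed to the power Internet of Things terminal.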
Fig. 5 shows a block diagram of a compression apparatus of a topology recognition model according to an embodiment of the present disclosure. The apparatus may be implemented as part or all of an electronic device through software, hardware, or a combination of both.
As shown in fig. 5, the compressing apparatus 500 of the topology identification model includes a pruning module 510 and a quantization module 520.
A pruning module 510, configured to prune the model to be compressed to obtain a pruning model, and train the pruning model to obtain a trained pruning model; the model to be compressed is a topology identification model, and the topology identification model is a machine learning model which runs at a server and is used for identifying a power grid topological structure.
A quantization module 520, configured to quantize the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number, so as to obtain a high-quantization pruning model, and train the high-quantization pruning model, so as to obtain a trained high-quantization pruning model; wherein the second number of bits is less than the first number of bits.
A deployment module 530, configured to take the trained high-quantization pruning model as a to-be-compressed model, continue pruning and quantizing the to-be-compressed model until a compressed topology identification model is obtained, and deploy the compressed topology identification model to an electric power internet of things terminal.
In some embodiments, the pruning the model to be compressed to obtain a pruning model includes: and determining a redundant channel in the model to be compressed through sparse regularization training, and removing the redundant channel from the model to be compressed to obtain the pruning model.
In some embodiments, the apparatus further comprises: the network layer identification precision acquisition module is used for taking the full connection layer as a linear classifier and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier; and the removing module is used for removing the neural network layer with the lowest identification precision in any two adjacent neural network layers in the model to be compressed if the difference value between the identification precisions of any two adjacent neural network layers is smaller than a preset difference value threshold.
In some embodiments, the taking the fully-connected layer as a linear classifier, and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier includes: selecting any two adjacent neural network layers from the model to be compressed, and removing one of the two adjacent neural network layers to obtain a neural network layer precision prediction model; and testing the neural network layer precision prediction model by using the prediction data set with the label to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer which is not removed in any two adjacent neural network layers.
In some embodiments, the training the pruning model to obtain a trained pruning model includes: marking the training data in the label-free training data set by using the model to be compressed to obtain a training data set; and training the pruning model by using the training data set to obtain the trained pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the model to be compressed to obtain a training data set includes: inputting the training data into the model to be compressed, and taking an output result of the model to be compressed as a label of the training data; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the pruning model using the training data set to obtain a trained pruning model includes: obtaining a loss value according to the output of the pruning model and the label of the training data in the training data set by using a loss function; judging whether the loss function is converged or not according to the loss value; and if so, stopping training, otherwise, adjusting the parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the apparatus further comprises: and the channel or network layer increasing module is used for increasing the channel or the neural network layer of the pruning model according to the model to be compressed if the training times reach the preset training time threshold value and the loss function is not converged.
In some embodiments, quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model includes: quantizing the weight parameters of each neural network layer in the pruning model from a first bit number to a second bit number, and training the pruning model; quantizing the bit number of the weight parameter of the fully-connected layer of the pruning model from a first bit number to a second bit number, and training the pruning model; and quantizing the activation function of the pruning model to obtain the high-quantization pruning model.
In some embodiments, the training the high-quantization pruning model to obtain a trained high-quantization pruning model includes: selecting at least one neural network layer from the high quantization pruning model to construct a loss function; marking training data in the label-free training data set by using the pruning model to obtain a training data set; and training the high-quantization pruning model according to the loss function and the training data set to obtain the trained high-quantization pruning model.
In some embodiments, the labeling training data in the unlabeled training data set using the pruning model to obtain a training data set includes: inputting the training data into the pruning model, and using the output of a reference network layer in the pruning model as a label of the training data; wherein the reference network layer corresponds to the at least one neural network layer selected in the high quantization pruning model; and taking the training data and the labels of the training data as a group of training data in the training data set.
In some embodiments, the training the high-quantization pruning model according to the loss function and the training data set to obtain a trained high-quantization pruning model includes: obtaining a loss value according to the output of the at least one neural network layer in the high-quantization pruning model and the label of the training data in the training data set by using the loss function; judging whether the loss function converges according to the loss value; and if so, stopping training, otherwise, adjusting the parameters of the high-quantization pruning model and performing the next training until the loss function converges or the training times reach a preset training time threshold.
In some embodiments, the apparatus further comprises: and the inverse quantization module is configured to, if the training times reach the preset training time threshold and the loss function is not converged, increase the number of bits of at least one weight parameter in the high quantization pruning model from the second number of bits to the first number of bits with reference to the weight parameter of the neural network layer in the pruning model.
In some embodiments, the quantizing the weight parameters of the respective neural network layers in the pruning model from a first number of bits to a second number of bits includes: and successively selecting at least one neural network layer from the pruning model by using a preset algorithm, and quantizing the weight parameters of the at least one neural network layer from a first bit number to a second bit number.
For the embodiments of the compression apparatus for a topology identification model, the specific processing of each module and its technical effects can be found in the descriptions of the corresponding method embodiments, and are not repeated here.
The embodiment of the present disclosure further provides a chip, where the chip includes the compression apparatus of the topology identification model, and the apparatus may be implemented as part or all of the chip through software, hardware, or a combination of the two.
The present disclosure also discloses an electronic device, and fig. 6 shows a block diagram of the electronic device according to an embodiment of the present disclosure.
As shown in fig. 6, the electronic device includes a memory and a processor, where the memory is configured to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement a method according to an embodiment of the disclosure.
FIG. 7 shows a schematic block diagram of a computer system suitable for use in implementing a method according to an embodiment of the present disclosure.
As shown in fig. 7, the computer system includes a processing unit that can execute the various methods in the above-described embodiments according to a program stored in a Read Only Memory (ROM) or a program loaded from a storage section into a Random Access Memory (RAM). In the RAM, various programs and data necessary for the operation of the computer system are also stored. The processing unit, the ROM, and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
The following components are connected to the I/O interface: an input section including a keyboard, a mouse, and the like; an output section including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section including a hard disk and the like; and a communication section including a network interface card such as a LAN card or a modem. The communication section performs communication processing via a network such as the Internet. A drive is also connected to the I/O interface as needed. A removable medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive as needed, so that a computer program read from it can be installed into the storage section. The processing unit may be implemented as a CPU, GPU, TPU, FPGA, NPU, or other processing unit.
In particular, the above described methods may be implemented as computer software programs according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or by programmable hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation on the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be a computer-readable storage medium included in the electronic device or the computer system in the above embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the present disclosure.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. Those skilled in the art will appreciate that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features of similar functions disclosed in the present disclosure (but not limited thereto).

Claims (27)

1. A method of compressing a topology identification model, the method comprising:
pruning a model to be compressed to obtain a pruning model, training the pruning model to obtain a trained pruning model, and if the training times reach a preset training time threshold value and a loss function is not converged, increasing a channel or a neural network layer of the pruning model by referring to the model to be compressed; the model to be compressed is a topology identification model, and the topology identification model is a machine learning model which runs at a server and is used for identifying a power grid topological structure;
quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high quantization pruning model, and training the high quantization pruning model to obtain a trained high quantization pruning model; wherein the second number of bits is less than the first number of bits; and for network layers that do not meet the precision requirement, the weight parameters of the high quantization pruning model are trained under the guidance of the weight parameters of the pruning model;
taking the trained high-quantization pruning model as a model to be compressed, continuing to prune and quantize the model to be compressed until a compressed topology identification model is obtained, and deploying the compressed topology identification model to an electric power internet of things terminal;
wherein, training the pruning model to obtain a trained pruning model comprises:
marking training data in the label-free training data set by using the model to be compressed to obtain a training data set; the training data of the training data set are historical voltage and current data on the low-voltage transformer area side of the power supply transformer, and the labels of the training data are the positions of the electric meters in the power supply transformer area corresponding to the historical voltage and current data and the association relations among the electric meters;
and training the pruning model by using the training data set to obtain the trained pruning model.
2. The method of claim 1, wherein pruning the model to be compressed to obtain a pruning model comprises:
and determining a redundant channel in the model to be compressed through sparse regularization training, and removing the redundant channel from the model to be compressed to obtain the pruning model.
3. The method of claim 2, further comprising:
taking the full-connection layer as a linear classifier, and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier;
and if the difference value between the identification precisions of any two adjacent neural network layers in the model to be compressed is smaller than a preset difference value threshold, removing the neural network layer with the lower identification precision from the two adjacent neural network layers.
4. The method according to claim 3, wherein the using the fully-connected layer as a linear classifier to obtain the recognition accuracy of each neural network layer in the model to be compressed comprises:
selecting any two adjacent neural network layers from the model to be compressed, and removing one of the two adjacent neural network layers to obtain a neural network layer precision prediction model;
and testing the neural network layer precision prediction model by using the labeled prediction data set to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer that is not removed from the two adjacent neural network layers.
5. The method of claim 1, wherein the labeling training data in the unlabeled training data set using the model to be compressed to obtain a training data set comprises:
inputting the training data into the model to be compressed, and taking an output result of the model to be compressed as a label of the training data;
and taking the training data and the labels of the training data as a group of training data in the training data set.
6. The method of claim 1, wherein the training the pruning model using the training data set to obtain a trained pruning model comprises:
obtaining a loss value according to the output of the pruning model and the label of the training data in the training data set by using a loss function;
judging whether the loss function is converged or not according to the loss value;
and if so, stopping training, otherwise, adjusting the parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
7. The method according to claim 1, wherein the quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number, to obtain a high quantization pruning model, comprises:
quantizing the weight parameters of each neural network layer in the pruning model from a first bit number to a second bit number, and training the pruning model;
quantizing the bit number of the weight parameter of the full connection layer of the pruning model from a first bit number to a second bit number, and training the pruning model;
and quantizing the activation function of the pruning model to obtain the high quantization pruning model.
8. The method according to claim 1, wherein the training the high-quantization pruning model to obtain a trained high-quantization pruning model comprises:
selecting at least one neural network layer from the high quantization pruning model to construct a loss function;
marking training data in the label-free training data set by using the pruning model to obtain a training data set;
and training the high-quantization pruning model according to the loss function and the training data set to obtain the trained high-quantization pruning model.
9. The method of claim 8, wherein using the pruning model to label training data in the unlabeled training data set to obtain a training data set comprises:
inputting the training data into the pruning model, and using the output of a reference network layer in the pruning model as a label of the training data; wherein the reference network layer corresponds to the at least one neural network layer selected in the high quantization pruning model;
and taking the training data and the labels of the training data as a group of training data in the training data set.
10. The method of claim 8, wherein the training the high-quantization pruning model according to the loss function and the training data set to obtain a trained high-quantization pruning model comprises:
obtaining a loss value according to the output of the at least one neural network layer in the high quantization pruning model and the label of the training data in the training data set by using the loss function;
judging whether the loss function is converged or not according to the loss value;
if so, stopping training, otherwise, adjusting the parameters of the high-quantization pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
11. The method of claim 10, further comprising:
and if the training times reach the preset training time threshold value and the loss function is not converged, increasing the bit number of at least one weight parameter in the high quantization pruning model from the second bit number to the first bit number by referring to the weight parameter of the neural network layer in the pruning model.
12. The method of claim 7, wherein quantizing the weight parameters of each neural network layer in the pruning model from a first number of bits to a second number of bits comprises:
and successively selecting at least one neural network layer from the pruning model by using a preset algorithm, and quantizing the weight parameters of the at least one neural network layer from a first bit number to a second bit number.
13. An apparatus for compressing a topology identification model, comprising:
the pruning module is used for pruning the model to be compressed to obtain a pruning model, training the pruning model to obtain a trained pruning model, and if the training times reach a preset training time threshold value and a loss function is not converged, increasing a channel or a neural network layer of the pruning model by referring to the model to be compressed; the model to be compressed is a topology identification model, and the topology identification model is a machine learning model which runs at a server and is used for identifying a power grid topological structure;
the quantization module is used for quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number to obtain a high-quantization pruning model, and training the high-quantization pruning model to obtain a trained high-quantization pruning model; wherein the second number of bits is less than the first number of bits; and for network layers that do not meet the precision requirement, the weight parameters of the high-quantization pruning model are trained under the guidance of the weight parameters of the pruning model;
the deployment module is used for taking the trained high-quantization pruning model as a model to be compressed, continuing to prune and quantize the model to be compressed until a compressed topology identification model is obtained, and deploying the compressed topology identification model to an electric power internet of things terminal;
wherein, training the pruning model to obtain a trained pruning model comprises:
marking training data in the label-free training data set by using the model to be compressed to obtain a training data set; the training data of the training data set are historical voltage and current data on the low-voltage transformer area side of the power supply transformer, and the labels of the training data are the positions of the electric meters in the power supply transformer area corresponding to the historical voltage and current data and the association relations among the electric meters;
and training the pruning model by using the training data set to obtain the trained pruning model.
14. The apparatus of claim 13, wherein the pruning the model to be compressed to obtain a pruning model comprises:
and determining a redundant channel in the model to be compressed through sparse regularization training, and removing the redundant channel from the model to be compressed to obtain the pruning model.
15. The apparatus of claim 14, further comprising:
the network layer identification precision acquisition module is used for taking the full connection layer as a linear classifier and obtaining the identification precision of each neural network layer in the model to be compressed by using the linear classifier;
and the removing module is used for removing the neural network layer with the lower identification precision from the two adjacent neural network layers if the difference value between the identification precisions of any two adjacent neural network layers in the model to be compressed is smaller than a preset difference value threshold.
16. The apparatus according to claim 15, wherein the using the fully-connected layer as a linear classifier to obtain the recognition accuracy of each neural network layer in the model to be compressed comprises:
selecting any two adjacent neural network layers from the model to be compressed, and removing one of the two adjacent neural network layers to obtain a neural network layer precision prediction model;
and testing the neural network layer precision prediction model by using the labeled prediction data set to obtain the output precision of the neural network layer precision prediction model, and taking the output precision as the identification precision of the neural network layer that is not removed from the two adjacent neural network layers.
17. The apparatus of claim 13, wherein labeling training data in the unlabeled training data set using the model to be compressed to obtain a training data set comprises:
inputting the training data into the model to be compressed, and taking an output result of the model to be compressed as a label of the training data;
and taking the training data and the labels of the training data as a group of training data in the training data set.
18. The apparatus of claim 13, wherein the training the pruning model using the training data set to obtain a trained pruning model comprises:
obtaining a loss value according to the output of the pruning model and the label of the training data in the training data set by using a loss function;
judging whether the loss function is converged or not according to the loss value;
and if so, stopping training, otherwise, adjusting the parameters of the pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
19. The apparatus according to claim 13, wherein the quantizing the bit number of the weight parameter of each neural network layer in the trained pruning model from a first bit number to a second bit number, resulting in a high quantization pruning model, comprises:
quantizing the weight parameters of each neural network layer in the pruning model from a first bit number to a second bit number, and training the pruning model;
quantizing the bit number of the weight parameter of the full connection layer of the pruning model from a first bit number to a second bit number, and training the pruning model;
and quantizing the activation function of the pruning model to obtain the high-quantization pruning model.
20. The apparatus of claim 13, wherein the training the high-quantization pruning model to obtain a trained high-quantization pruning model comprises:
selecting at least one neural network layer from the high quantization pruning model to construct a loss function;
marking training data in the label-free training data set by using the pruning model to obtain a training data set;
and training the high-quantization pruning model according to the loss function and the training data set to obtain the trained high-quantization pruning model.
21. The apparatus of claim 20, wherein the labeling training data in the unlabeled training data set using the pruning model to obtain a training data set comprises:
inputting the training data into the pruning model, and using the output of a reference network layer in the pruning model as a label of the training data; wherein the reference network layer corresponds to the at least one neural network layer selected in the high quantization pruning model;
and taking the training data and the labels of the training data as a group of training data in the training data set.
22. The apparatus of claim 20, wherein the training the high-quantization pruning model according to the loss function and the training data set to obtain a trained high-quantization pruning model comprises:
obtaining a loss value according to the output of the at least one neural network layer in the high quantization pruning model and the label of the training data in the training data set by using the loss function;
judging whether the loss function is converged or not according to the loss value;
and if so, stopping training, otherwise, adjusting the parameters of the high-quantization pruning model, and performing next training until the loss function converges or the training times reach a preset training time threshold.
23. The apparatus of claim 22, further comprising:
and the inverse quantization module is configured to, if the training times reach the preset training time threshold and the loss function is not converged, increase the number of bits of at least one weight parameter in the high quantization pruning model from the second number of bits to the first number of bits with reference to the weight parameter of the neural network layer in the pruning model.
24. The apparatus of claim 19, wherein quantizing the weight parameters of each neural network layer in the pruning model from a first number of bits to a second number of bits comprises:
and successively selecting at least one neural network layer from the pruning model by using a preset algorithm, and quantizing the weight parameters of the at least one neural network layer from a first bit number to a second bit number.
25. A chip, characterized in that,
the chip comprising the apparatus of any one of claims 13-24.
26. An electronic device comprising a memory and a processor; wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method steps of any of claims 1-12.
27. A computer-readable storage medium having stored thereon computer instructions, characterized in that the computer instructions, when executed by a processor, implement the method steps of any of claims 1-12.
CN202210983746.3A 2022-08-17 2022-08-17 Compression method and device of topology recognition model, electronic equipment and medium Active CN115049058B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210983746.3A CN115049058B (en) 2022-08-17 2022-08-17 Compression method and device of topology recognition model, electronic equipment and medium
PCT/CN2023/111880 WO2024037393A1 (en) 2022-08-17 2023-08-09 Topology recognition model compression method and apparatus, electronic device and medium

Publications (2)

Publication Number Publication Date
CN115049058A CN115049058A (en) 2022-09-13
CN115049058B true CN115049058B (en) 2023-01-20

Family

ID=83166498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210983746.3A Active CN115049058B (en) 2022-08-17 2022-08-17 Compression method and device of topology recognition model, electronic equipment and medium

Country Status (2)

Country Link
CN (1) CN115049058B (en)
WO (1) WO2024037393A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049058B (en) * 2022-08-17 2023-01-20 北京智芯微电子科技有限公司 Compression method and device of topology recognition model, electronic equipment and medium
CN116894189B (en) * 2023-09-11 2024-01-05 中移(苏州)软件技术有限公司 Model training method, device, equipment and readable storage medium
CN117910536B (en) * 2024-03-19 2024-06-07 浪潮电子信息产业股份有限公司 Text generation method, and model gradient pruning method, device, equipment and medium thereof

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615049A (en) * 2018-04-09 2018-10-02 华中科技大学 A kind of vehicle part detection model compression method and system
CN109635936A (en) * 2018-12-29 2019-04-16 杭州国芯科技股份有限公司 A kind of neural networks pruning quantization method based on retraining
CN111160524A (en) * 2019-12-16 2020-05-15 北京时代民芯科技有限公司 Two-stage convolutional neural network model compression method
CN112329922A (en) * 2020-11-24 2021-02-05 北京大学 Neural network model compression method and system based on mass spectrum data set
WO2021143070A1 (en) * 2020-01-16 2021-07-22 北京智芯微电子科技有限公司 Compression method and apparatus for deep neural network model, and storage medium
CN114819143A (en) * 2022-04-15 2022-07-29 北京邮电大学 Model compression method suitable for communication network field maintenance

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112257774B (en) * 2020-10-20 2024-03-15 平安科技(深圳)有限公司 Target detection method, device, equipment and storage medium based on federal learning
CN112698123B (en) * 2020-12-01 2023-05-05 国网河南省电力公司电力科学研究院 Decision tree-based low-voltage area user topological relation identification method
CN112766397B (en) * 2021-01-27 2023-12-05 歌尔股份有限公司 Classification network and implementation method and device thereof
CN113033897A (en) * 2021-03-26 2021-06-25 国网上海市电力公司 Method for identifying station area subscriber variation relation based on electric quantity correlation of subscriber branch
CN115049058B (en) * 2022-08-17 2023-01-20 北京智芯微电子科技有限公司 Compression method and device of topology recognition model, electronic equipment and medium

Also Published As

Publication number Publication date
CN115049058A (en) 2022-09-13
WO2024037393A1 (en) 2024-02-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant