CN114418084A - Unstructured pruning model obtaining method and device, electronic equipment and storage medium - Google Patents

Unstructured pruning model obtaining method and device, electronic equipment and storage medium

Info

Publication number
CN114418084A
CN114418084A
Authority
CN
China
Prior art keywords
model
unstructured pruning
unstructured
pruning model
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111452232.7A
Other languages
Chinese (zh)
Inventor
李明昊
王豪爽
党青青
文灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111452232.7A priority Critical patent/CN114418084A/en
Publication of CN114418084A publication Critical patent/CN114418084A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The disclosure provides an unstructured pruning model obtaining method and apparatus, an electronic device, and a storage medium, and relates to artificial intelligence fields such as deep learning. The method comprises the following steps: determining a compression mode according to the type of the unstructured pruning model to be obtained; and training in combination with the determined compression mode to obtain the unstructured pruning model. By applying the scheme of the disclosure, the training time can be shortened, the training efficiency can be improved, and so on.

Description

Unstructured pruning model obtaining method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular to an unstructured pruning model acquisition method and apparatus, an electronic device, and a storage medium in the field of deep learning.
Background
In the field of deep learning, a series of model compression strategies (i.e., compression modes), including quantization, pruning, and the like, have been proposed to achieve smaller model size, faster inference speed, and so on; unstructured pruning is one such pruning mode. Currently, training an unstructured pruning model usually requires a long training time, i.e., training efficiency is low.
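For context, the general technique of unstructured pruning can be sketched as follows. This is an illustrative magnitude-based example in plain Python, not the patent's implementation; the function name and the threshold rule are assumptions for illustration only.

```python
# Illustrative sketch (not the patent's method): magnitude-based unstructured
# pruning zeroes individual weights whose absolute value falls below a
# sparsity-determined threshold, leaving the layer shapes unchanged.

def unstructured_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights), [1.0] * len(weights)
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    mask = [0.0 if abs(w) <= threshold else 1.0 for w in weights]
    pruned = [w * m for w, m in zip(weights, mask)]
    return pruned, mask

pruned, mask = unstructured_prune([0.9, -0.05, 0.4, 0.01, -0.7], sparsity=0.4)
# The two smallest-magnitude weights (-0.05 and 0.01) are zeroed.
```

Because the zeros are scattered at arbitrary positions rather than removing whole channels or filters, this is "unstructured", which is what makes extra training tricks such as those described below attractive.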
Disclosure of Invention
The disclosure provides an unstructured pruning model acquisition method, an unstructured pruning model acquisition device, electronic equipment and a storage medium.
An unstructured pruning model acquisition method comprises the following steps:
determining a compression mode according to the type of the unstructured pruning model to be obtained;
and training by combining the determined compression mode to obtain the unstructured pruning model.
An unstructured pruning model acquisition apparatus, comprising: the device comprises a first processing module and a second processing module;
the first processing module is used for determining a compression mode according to the type of the unstructured pruning model to be acquired;
and the second processing module is used for training in combination with the determined compression mode to obtain the unstructured pruning model.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described above.
A computer program product comprising computer programs/instructions which, when executed by a processor, implement a method as described above.
One embodiment in the above disclosure has the following advantages or benefits: the corresponding compression mode can be determined according to the type of the unstructured pruning model, and the unstructured pruning model can be trained by combining the determined compression mode, so that the training time is shortened, the training efficiency is improved, and the like.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an embodiment of a method for obtaining an unstructured pruning model according to the present disclosure;
FIG. 2 is a schematic diagram of a process for performing each round of training on a floating-point unstructured pruning model in the manner described in the present disclosure;
FIG. 3 is a first schematic diagram of a process for performing each round of training on a fixed-point unstructured pruning model in a manner described in the present disclosure;
FIG. 4 is a second schematic diagram of a process for performing each round of training on a fixed-point unstructured pruning model in a manner consistent with the present disclosure;
fig. 5 is a schematic structural diagram illustrating a composition of an embodiment 500 of an unstructured pruning model obtaining apparatus according to the present disclosure;
FIG. 6 illustrates a schematic block diagram of an electronic device 600 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the preceding and following objects.
Fig. 1 is a flowchart of an embodiment of an unstructured pruning model acquisition method according to the present disclosure. As shown in fig. 1, the following detailed implementation is included.
In step 101, a compression mode is determined according to the type of the unstructured pruning model to be acquired.
In step 102, the unstructured pruning model is obtained by training in combination with the determined compression mode.
It can be seen that, in the scheme of the embodiment of the method, the corresponding compression mode can be determined according to the type of the unstructured pruning model, and the unstructured pruning model can be trained by combining the determined compression mode, so that the training time is shortened, the training efficiency is improved, and the like.
In one embodiment of the present disclosure, the type of the unstructured pruning model to be acquired may include: a floating-point unstructured pruning model. For example, it may be an FP32 unstructured pruning model, where FP32 denotes single-precision floating point.
Accordingly, in an embodiment of the present disclosure, the method for obtaining the unstructured pruning model by training in combination with the determined compression method may be: the unstructured pruning model is obtained by combining with knowledge distillation training, namely the floating-point type unstructured pruning model can be obtained by combining with knowledge distillation training.
To address the problem that a floating-point unstructured pruning model requires a long training time, knowledge distillation is introduced into the training process of the floating-point unstructured pruning model, exploiting the characteristic that knowledge distillation can accelerate model convergence.
In an embodiment of the disclosure, the original model corresponding to the floating-point unstructured pruning model, i.e., the model not subjected to pruning or other processing, may be used as a teacher model, and the floating-point unstructured pruning model may be used as a student model for knowledge distillation. That is, a self-distillation mode in which the original model distills the floating-point unstructured pruning model may be adopted, thereby shortening training time, improving training efficiency, and improving model precision.
Fig. 2 is a schematic diagram of a process of performing each round of training on a floating-point unstructured pruning model in the manner described in the present disclosure.
As shown in fig. 2, in each round of training, the model may first be pruned using a clipping operator; a forward propagation result is then determined based on the pruned model, a distillation loss function is determined from the forward propagation result, and the model parameters are updated according to the obtained distillation loss function, and so on.
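The per-round procedure of fig. 2 can be sketched in plain Python. This is an illustrative toy, not the patent's implementation; the soft-label cross-entropy loss with temperature T is an assumed common choice of distillation loss, and the patent does not fix a particular form.

```python
# Illustrative sketch of one Fig. 2 training round: the pruned student's
# forward-pass logits are scored against the unpruned teacher's logits with
# a temperature-softened cross-entropy (a common, assumed, distillation loss).
import math

def softmax(logits, T=1.0):
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between softened teacher and student distributions."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

teacher = [2.0, 0.5, -1.0]   # forward pass of the original (teacher) model
student = [1.5, 0.8, -0.5]   # forward pass of the pruned (student) model
loss = distillation_loss(student, teacher)
# Gradients of `loss` would then update the student's surviving weights.
```

The loss is minimized when the student's distribution matches the teacher's, which is what lets the dense original model guide the pruned model toward faster convergence.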
In an embodiment of the present disclosure, the type of the unstructured pruning model to be acquired may further include: fixed-point unstructured pruning models. For example, it may be an INT8 unstructured pruning model.
Accordingly, in an embodiment of the present disclosure, the method for obtaining the unstructured pruning model by training in combination with the determined compression method may be: and (3) combining the quantization mode training to obtain the unstructured pruning model, namely combining the quantization mode training to obtain the fixed-point unstructured pruning model.
In the conventional approach, unstructured pruning models are verified only at FP32 numerical precision, and no corresponding unstructured pruning training is performed for the INT8 models widely used in products. With the approach of this disclosure, a combined quantization and unstructured pruning training mode can be adopted to obtain the required fixed-point unstructured pruning model, such as an INT8 unstructured pruning model, thereby improving training efficiency and the like.
Fig. 3 is a first schematic diagram of a process of performing each round of training on a fixed-point unstructured pruning model in the manner described in the present disclosure.
As shown in fig. 3, in each round of training, the model may first be pruned using a clipping operator, and the pruned model may then be quantized using a quantization operator; for example, the weight parameters and features in the model are quantized from FP32 floating point (full numerical precision) to INT8 fixed point (quantization precision). How quantization is performed is not limited; for example, an existing quantization mode may be used. A forward propagation result may then be determined based on the quantized model, and a loss function determined from the forward propagation result; to distinguish it from a distillation loss function, this loss function may be referred to as the original loss function. Accordingly, the model parameters may be updated according to the obtained original loss function, and so on.
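The quantization step in fig. 3 can be illustrated with a symmetric "fake" quantization sketch, which rounds FP32 weights onto an INT8 grid and dequantizes them for the forward pass. The max-abs/127 scale rule is one common, assumed scheme; as the text notes, the patent does not fix how quantization is performed.

```python
# Illustrative sketch (assumed scheme): simulate INT8 quantization of pruned
# FP32 weights by rounding to an int8 grid and dequantizing back to floats.

def fake_quantize_int8(weights):
    """Symmetric fake quantization: round to int8 codes, then dequantize."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]  # int8 codes
    return [qi * scale for qi in q]                               # dequantized

deq = fake_quantize_int8([0.9, 0.0, 0.4, 0.0, -0.7])
# Forward propagation then uses `deq`; the original loss drives the update.
```

Each dequantized weight differs from its FP32 value by at most half a quantization step (scale / 2), and pruned zeros stay exactly zero.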
For a fixed-point unstructured pruning model, in an embodiment of the present disclosure, the method for obtaining the unstructured pruning model by combining with the determined compression method training may further be: the unstructured pruning model is obtained by combining the training of a quantization mode and a knowledge distillation mode, namely the fixed-point unstructured pruning model can be obtained by combining the training of the quantization mode and the knowledge distillation mode.
In an embodiment of the disclosure, the original model corresponding to the fixed-point unstructured pruning model, i.e., the model not subjected to pruning or quantization, may be used as a teacher model, and the fixed-point unstructured pruning model may be used as a student model for knowledge distillation; that is, a self-distillation mode in which the original model distills the fixed-point unstructured pruning model may be adopted. This shortens training time, improves training efficiency, and improves model precision; the precision of the finally obtained model can essentially reach the level of the dense FP32 model.
Fig. 4 is a second schematic diagram of a process of performing each round of training on a fixed-point unstructured pruning model in the manner described in the present disclosure.
As shown in fig. 4, in each round of training, the model may first be pruned using a clipping operator, and the pruned model may then be quantized using a quantization operator; for example, the weight parameters and features in the model are quantized from FP32 floating point to INT8 fixed point, and how quantization is performed is not limited. A forward propagation result may then be determined based on the quantized model, a distillation loss function determined from the forward propagation result, and the model parameters updated according to the obtained distillation loss function, and so on.
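Putting the fig. 4 steps together, one round can be sketched end to end: prune, fake-quantize, forward-propagate, then compute a distillation loss against the dense FP32 teacher. All numbers, helper names, and the loss form below are illustrative assumptions, not the patent's concrete procedure.

```python
# Illustrative end-to-end sketch of one Fig. 4 round:
# prune -> fake-quantize to the INT8 grid -> toy forward pass -> distill.
import math

def prune(ws, sparsity=0.5):
    k = int(len(ws) * sparsity)
    thr = sorted(abs(w) for w in ws)[k - 1] if k else -1.0
    return [0.0 if abs(w) <= thr else w for w in ws]

def fake_quant(ws):
    scale = (max(abs(w) for w in ws) or 1.0) / 127.0
    return [round(w / scale) * scale for w in ws]

def soft_ce(student, teacher, T=2.0):
    def sm(zs):
        es = [math.exp(z / T) for z in zs]
        return [e / sum(es) for e in es]
    return -sum(t * math.log(s) for t, s in zip(sm(teacher), sm(student)))

w = prune([0.9, -0.05, 0.4, 0.01], sparsity=0.5)      # pruned weights
w = fake_quant(w)                                     # INT8-grid weights
x = [1.0, 1.0, 1.0, 1.0]                              # toy input
student_logit = sum(wi * xi for wi, xi in zip(w, x))  # toy forward pass
loss = soft_ce([student_logit, 0.0], [1.3, 0.0])      # teacher logits assumed
# `loss` then drives the parameter update for this round.
```

The design point illustrated here is the ordering: pruning and quantization are applied before the forward pass, so the distillation loss is computed on exactly the compressed model that will be deployed.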
It can be seen that, for the fixed-point unstructured pruning model, two training modes are proposed in the present disclosure: one is quantization + unstructured pruning, and the other is quantization + unstructured pruning + knowledge distillation. Which one to adopt may be determined according to actual needs; preferably, the quantization + unstructured pruning + knowledge distillation mode may be adopted to obtain a better training effect.
It is noted that while, for simplicity of explanation, the foregoing method embodiments are described as a series of acts, those skilled in the art will appreciate that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present disclosure, occur in other orders or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the disclosure.
In summary, with the scheme of the method embodiments of the present disclosure, unstructured pruning models such as the FP32 unstructured pruning model and the INT8 unstructured pruning model can be obtained by training with a combination of multiple compression strategies/compression modes, which shortens the training time of unstructured pruning models, improves training efficiency and model precision, and greatly improves the feasibility of deploying unstructured pruning model compression technology in practice.
The above is a description of embodiments of the method, and the embodiments of the apparatus are further described below.
Fig. 5 is a schematic structural diagram illustrating a composition of an unstructured pruning model obtaining apparatus 500 according to an embodiment of the present disclosure. As shown in fig. 5, includes: a first processing module 501 and a second processing module 502.
The first processing module 501 is configured to determine a compression method according to a type of an unstructured pruning model to be obtained.
A second processing module 502, configured to obtain the unstructured pruning model by training in combination with the determined compression method.
In the scheme of the embodiment of the device, the corresponding compression mode can be determined according to the type of the unstructured pruning model, and the unstructured pruning model can be trained by combining the determined compression mode, so that the training time is shortened, the training efficiency is improved, and the like.
In one embodiment of the present disclosure, the types of unstructured pruning models to be acquired may include: floating point unstructured pruning models. For example, there may be an FP32 unstructured pruning model.
Accordingly, in an embodiment of the present disclosure, the way for the second processing module 502 to obtain the unstructured pruning model by combining the determined compression mode training may be: the unstructured pruning model is obtained by combining with knowledge distillation training, namely the floating-point type unstructured pruning model can be obtained by combining with knowledge distillation training.
In an embodiment of the disclosure, the second processing module 502 may use the original model without pruning or the like as a teacher model, and use the floating point unstructured pruning model as a student model, to perform knowledge distillation, that is, may use a self-distillation method of distilling the floating point unstructured pruning model using the original model.
In an embodiment of the present disclosure, the type of the unstructured pruning model to be acquired may further include: fixed-point unstructured pruning models. For example, it may be an INT8 unstructured pruning model.
Accordingly, in an embodiment of the present disclosure, the way for the second processing module 502 to obtain the unstructured pruning model by combining the determined compression mode training may be: and (3) combining the quantization mode training to obtain the unstructured pruning model, namely combining the quantization mode training to obtain the fixed-point unstructured pruning model.
For the fixed-point unstructured pruning model, in an embodiment of the present disclosure, the mode in which the second processing module 502 combines the determined compression mode to train to obtain the unstructured pruning model may also be: the unstructured pruning model is obtained by combining the training of a quantization mode and a knowledge distillation mode, namely the fixed-point unstructured pruning model can be obtained by combining the training of the quantization mode and the knowledge distillation mode.
In an embodiment of the present disclosure, the original model without pruning and quantization processing may be used as a teacher model, the fixed-point unstructured pruning model may be used as a student model, and knowledge distillation may be performed, that is, a self-distillation method of distilling the fixed-point unstructured pruning model using the original model may be employed.
It can be seen that, for the fixed-point unstructured pruning model, two training modes are proposed in the present disclosure: one is quantization + unstructured pruning, and the other is quantization + unstructured pruning + knowledge distillation. Which one to adopt may be determined according to actual needs; preferably, the quantization + unstructured pruning + knowledge distillation mode may be adopted to obtain a better training effect.
For a specific work flow of the apparatus embodiment shown in fig. 5, reference is made to the related description in the foregoing method embodiment, and details are not repeated.
In summary, with the scheme of the apparatus embodiments of the present disclosure, unstructured pruning models such as the FP32 unstructured pruning model and the INT8 unstructured pruning model can be obtained by training with a combination of multiple compression strategies/compression modes, which shortens the training time of unstructured pruning models, improves training efficiency and model precision, and greatly improves the feasibility of deploying unstructured pruning model compression technology in practice.
The scheme of the disclosure can be applied to the field of artificial intelligence, in particular to fields such as deep learning. Artificial intelligence is the discipline of studying how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it encompasses both hardware and software technologies. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
In addition, in the technical solution of the disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of users' personal information all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 602 or a computer program loaded from a storage unit 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in this disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the methods described in the present disclosure.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (13)

1. An unstructured pruning model acquisition method, comprising:
determining a compression mode according to the type of the unstructured pruning model to be acquired; and
training, in combination with the determined compression mode, to obtain the unstructured pruning model.
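The two steps of claim 1 can be pictured as a small dispatch: the type of the target unstructured pruning model selects the compression mode(s), and training then proceeds with those modes. A minimal sketch, assuming the type is a plain string tag and mapping the types to the modes named in claims 2 through 4; all function names are illustrative, not from the patent:

```python
def determine_compression_mode(model_type: str) -> list[str]:
    """Map the target unstructured-pruning-model type to compression modes."""
    if model_type == "floating-point":
        return ["knowledge_distillation"]                   # cf. claim 2
    if model_type == "fixed-point":
        return ["quantization", "knowledge_distillation"]   # cf. claims 3-4
    raise ValueError(f"unknown model type: {model_type}")


def obtain_unstructured_pruning_model(model_type: str) -> dict:
    """Claim 1's flow: pick the mode(s), then train in combination with them."""
    modes = determine_compression_mode(model_type)
    # The training loop itself is elided; return a record of what would be combined.
    return {"type": model_type, "compression_modes": modes}


print(obtain_unstructured_pruning_model("fixed-point"))
```

In this reading, the compression mode is fixed before training begins, so the same training entry point serves both floating-point and fixed-point targets.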
2. The method of claim 1, wherein,
the types include: a floating-point unstructured pruning model;
the training in combination with the determined compression mode to obtain the unstructured pruning model comprises: training in combination with a knowledge distillation mode to obtain the unstructured pruning model.
3. The method of claim 1, wherein,
the types include: a fixed-point unstructured pruning model;
the training in combination with the determined compression mode to obtain the unstructured pruning model comprises: training in combination with a quantization mode to obtain the unstructured pruning model.
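Claim 3's quantization mode for a fixed-point target can be illustrated with simulated ("fake") quantization: weights are rounded to an int8 grid and dequantized, so training sees the fixed-point precision while the zeros introduced by unstructured pruning are preserved. A hedged sketch; the bit width, symmetric scale, and weight values are illustrative assumptions, not from the patent:

```python
import numpy as np


def fake_quantize_int8(w: np.ndarray) -> np.ndarray:
    """Quantize weights to a symmetric int8 grid and dequantize them again."""
    scale = max(np.max(np.abs(w)) / 127.0, 1e-12)   # avoid division by zero
    q = np.clip(np.round(w / scale), -127, 127)     # integer codes in [-127, 127]
    return q * scale                                # back to float for training


# Pruned weights: the zero entries come from unstructured pruning and stay zero.
w = np.array([0.8, -0.3, 0.0, 1.27])
wq = fake_quantize_int8(w)
print(wq)
```

Because rounding maps exact zeros to the zero code, quantization and unstructured pruning compose cleanly: the sparsity pattern survives the fixed-point conversion.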
4. The method of claim 1, wherein,
the types include: a fixed-point unstructured pruning model;
the training in combination with the determined compression mode to obtain the unstructured pruning model comprises: training in combination with a quantization mode and a knowledge distillation mode to obtain the unstructured pruning model.
5. The method of claim 2 or 4, wherein the training in combination with a knowledge distillation mode to obtain the unstructured pruning model comprises:
taking the original model corresponding to the unstructured pruning model as a teacher model and the unstructured pruning model as a student model, and performing knowledge distillation.
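The teacher/student arrangement of claim 5 can be sketched with tiny NumPy linear models: the original dense weights act as the teacher, a magnitude-pruned copy acts as the student, and the distillation loss measures how well the student reproduces the teacher's soft outputs. The 50% sparsity level, model sizes, and cross-entropy loss are illustrative assumptions, not the patent's prescription:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))            # a small batch of inputs
w_teacher = rng.normal(size=(4, 3))    # original (dense) model's weights

# Unstructured pruning: zero out the individual weights of smallest magnitude.
sparsity = 0.5
w_student = w_teacher.copy()
threshold = np.quantile(np.abs(w_student), sparsity)
mask = np.abs(w_student) >= threshold
w_student *= mask


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


# Distillation loss: cross-entropy between the teacher's soft outputs and the
# student's; training would minimize this over the unmasked student weights.
p_teacher = softmax(x @ w_teacher)
p_student = softmax(x @ w_student)
distill_loss = -np.mean(np.sum(p_teacher * np.log(p_student + 1e-12), axis=1))
print(f"weights kept: {mask.mean():.2f}, distill loss: {distill_loss:.4f}")
```

Using the original model as teacher is convenient here because the student starts as a masked copy of it, so distillation directly recovers accuracy lost to the removed weights.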
6. An unstructured pruning model acquisition apparatus, comprising: the device comprises a first processing module and a second processing module;
the first processing module is configured to determine a compression mode according to the type of the unstructured pruning model to be acquired; and
the second processing module is configured to train, in combination with the determined compression mode, to obtain the unstructured pruning model.
7. The apparatus of claim 6, wherein,
the types include: a floating-point unstructured pruning model;
the second processing module trains in combination with a knowledge distillation mode to obtain the unstructured pruning model.
8. The apparatus of claim 6, wherein,
the types include: a fixed-point unstructured pruning model;
the second processing module trains in combination with a quantization mode to obtain the unstructured pruning model.
9. The apparatus of claim 6, wherein,
the types include: a fixed-point unstructured pruning model;
the second processing module trains in combination with a quantization mode and a knowledge distillation mode to obtain the unstructured pruning model.
10. The apparatus of claim 7 or 9, wherein
the second processing module takes the original model corresponding to the unstructured pruning model as a teacher model and the unstructured pruning model as a student model to perform knowledge distillation.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program/instructions which, when executed by a processor, implement the method of any one of claims 1-5.
CN202111452232.7A 2021-12-01 2021-12-01 Unstructured pruning model obtaining method and device, electronic equipment and storage medium Pending CN114418084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111452232.7A CN114418084A (en) 2021-12-01 2021-12-01 Unstructured pruning model obtaining method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114418084A true CN114418084A (en) 2022-04-29

Family

ID=81264994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111452232.7A Pending CN114418084A (en) 2021-12-01 2021-12-01 Unstructured pruning model obtaining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114418084A (en)

Similar Documents

Publication Publication Date Title
CN114492831B (en) Method and device for generating federal learning model
CN112466288A (en) Voice recognition method and device, electronic equipment and storage medium
CN114282670A (en) Neural network model compression method, device and storage medium
CN114881129A (en) Model training method and device, electronic equipment and storage medium
EP4123516A1 (en) Method and apparatus for acquiring pre-trained model, electronic device and storage medium
KR20220116395A (en) Method and apparatus for determining pre-training model, electronic device and storage medium
CN112949818A (en) Model distillation method, device, equipment and storage medium
KR20220078538A (en) Liveness detection model training method, apparatus, electronic device, and storage medium
CN112507104B (en) Dialog system acquisition method, apparatus, storage medium and computer program product
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN113657468A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113408304B (en) Text translation method and device, electronic equipment and storage medium
CN114998649A (en) Training method of image classification model, and image classification method and device
CN115759209A (en) Neural network model quantification method and device, electronic equipment and medium
CN112558918B (en) Multiply-add operation method and device for neural network
CN114119972A (en) Model acquisition and object processing method and device, electronic equipment and storage medium
CN114418084A (en) Unstructured pruning model obtaining method and device, electronic equipment and storage medium
CN115840867A (en) Generation method and device of mathematical problem solving model, electronic equipment and storage medium
CN113361575A (en) Model training method and device and electronic equipment
CN112632999A (en) Named entity recognition model obtaining method, named entity recognition device and named entity recognition medium
CN113408632A (en) Method and device for improving image classification accuracy, electronic equipment and storage medium
CN113642319A (en) Text processing method and device, electronic equipment and storage medium
CN113033179A (en) Knowledge acquisition method and device, electronic equipment and readable storage medium
CN113361574A (en) Training method and device of data processing model, electronic equipment and storage medium
CN113033219A (en) Model training method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination