CN111260052A - Image processing method, device and equipment

Info

Publication number: CN111260052A
Application number: CN201811459626.3A
Authority: CN (China)
Prior art keywords: model, compression, weight, compression ratio, compressed
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 窦则胜, 李昊, 朱胜火
Current and original assignee: Alibaba Group Holding Ltd
Priority date / filing date: 2018-11-30 (application filed by Alibaba Group Holding Ltd)
Publication date: 2020-06-09


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

Embodiments of the invention provide an image processing method, apparatus and device, where the image processing method processes an image using a model, and the model can be obtained by the following steps: obtaining an initial model; determining a compression strategy corresponding to the model, where the compression strategy includes a compression method and a target compression ratio corresponding to the compression method; and iteratively compressing the model with the compression method, increasing the compression ratio step by step, until a cutoff condition is met, where the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition. Through this gradual compression, the model is effectively compressed while its accuracy is preserved, yielding a compressed model for image processing.

Description

Image processing method, device and equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to an image processing method, apparatus, and device.
Background
Deep neural networks are among the cornerstones of modern artificial intelligence, and their complexity and portability directly determine how widely artificial intelligence can be applied in everyday life.
In recent years, deep neural network models have grown deeper and computationally more complex, to the point where running them generally requires a GPU or a high-performance CPU. Yet in practical deep learning applications, devices such as mobile and embedded devices, which are constrained in computation, volume, power consumption and the like, also need to apply deep learning technology.
Because of these constraints, existing high-performance deep neural network models cannot be computed and applied effectively on such devices. For example, a trained model may be needed to classify and recognize images, to translate natural language and tag parts of speech, or to recognize speech data. If the trained model is too large, it occupies the device's storage resources, and at runtime it also consumes excessive computing resources, interfering with the normal operation of other applications on the device.
Disclosure of Invention
Embodiments of the invention provide an image processing method, apparatus and device for processing an image with a compressed model.
In a first aspect, an embodiment of the present invention provides an image processing method, which processes an image by using a model, where the model may be obtained by:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model with the compression method, gradually increasing the compression ratio, until a cutoff condition is met, wherein the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
In a second aspect, an embodiment of the present invention provides an image processing apparatus for processing an image using a model, including:
the acquisition module is used for acquiring an initial model;
the determining module is used for determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and the processing module is used for iteratively compressing the model with the compression method, gradually increasing the compression ratio, until a cutoff condition is met, wherein the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a first processor and a first memory, where the first memory is used to store one or more computer instructions, and the one or more computer instructions, when executed by the first processor, implement the image processing method in the first aspect.
An embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to enable a computer to implement the image processing method in the first aspect when executed.
In a fourth aspect, an embodiment of the present invention provides a speech processing method, which uses a model to identify speech data, where the model is obtained through the following steps:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model with the compression method, gradually increasing the compression ratio, until a cutoff condition is met, wherein the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
In a fifth aspect, an embodiment of the present invention provides a natural language processing method, where a model is used to process natural language information, where the model is obtained through the following steps:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model with the compression method, gradually increasing the compression ratio, until a cutoff condition is met, wherein the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
In summary, to run a model on a resource-limited device for applications such as processing images, recognizing speech data and processing natural language information, the trained initial model needs to be compressed. To compress the model from its initial size to a target size, a compression strategy may be set first: which compression methods are used in sequence, and what target compression ratio each method should reach. Then, starting from some initial compression ratio, the set compression method is applied with a gradually increasing compression ratio, so that the model is compressed iteratively until it reaches the target compression ratio or its accuracy meets the set condition. After each compression pass, the compressed model is retrained so that its accuracy again meets the requirement; in this way, gradual compression compresses the model effectively while preserving its accuracy.
In a sixth aspect, an embodiment of the present invention provides an image processing method, where the image processing method uses a model to process an image, where the model may be obtained by:
determining a loss function corresponding to the initial model;
determining gradient values corresponding to all weights of the model according to the loss function;
selecting the weight to be pruned according to the gradient value;
and pruning the weight to be pruned.
In a seventh aspect, an embodiment of the present invention provides an image processing apparatus, including:
the loss function determining module is used for determining a loss function corresponding to the initial model;
the gradient determining module is used for determining gradient values corresponding to all weights of the model according to the loss function;
the weight selection module is used for selecting the weight to be pruned according to the gradient value;
and the pruning processing module is used for pruning the weight to be pruned.
In an eighth aspect, an embodiment of the present invention provides an electronic device, which includes a second processor and a second memory, where the second memory is used to store one or more computer instructions, and the one or more computer instructions, when executed by the second processor, implement the image processing method in the sixth aspect.
An embodiment of the present invention provides a computer storage medium for storing a computer program, where the computer program is used to enable a computer to implement the image processing method in the sixth aspect when executed.
When the model is pruned by this scheme, the gradient value corresponding to a weight measures the weight's importance, i.e., its contribution to the model's accuracy. Pruning the model's weights in combination with their respective gradient values therefore removes the less important weights and retains the more important ones, compressing the model while preserving its accuracy, so that the compressed model can run on resource-limited devices and make applications such as image processing possible.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flow chart of a model compression method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of model state changes for iteratively compressing a model;
FIG. 3 is a flow chart of another model compression method provided in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of a weight gradient;
FIG. 5 is a flow chart of another model compression method provided in accordance with an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device corresponding to the image processing apparatus provided in FIG. 6;
FIG. 8 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device corresponding to the image processing apparatus provided in FIG. 8.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and "a plurality of" generally means at least two but does not exclude at least one, unless the context clearly dictates otherwise.
It should be understood that the term "and/or" used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the objects before and after it.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises", "comprising", and any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a product or system. Without further limitation, an element defined by the phrase "comprising a/an ..." does not exclude the presence of other identical elements in the product or system that comprises the element.
In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of a model compression method according to an embodiment of the present invention. As shown in fig. 1, the method includes the following steps:
101. An initial model is obtained.
102. A compression strategy corresponding to the model is determined, where the compression strategy includes a compression method and a target compression ratio corresponding to the compression method.
103. The model is iteratively compressed with the compression method, the compression ratio being increased step by step, until a cutoff condition is met.
The cutoff condition includes the model being compressed to a target compression ratio or the model accuracy meeting a set condition.
The model compression method provided by this embodiment can be executed by devices such as a PC, a server or a cloud host, on which the model to be compressed can be deployed.
The model mentioned herein may be, for example, a deep neural network model. In practical applications the model may be used for classification and recognition, in which case the accuracy of the model can be taken as the accuracy of the classification results it outputs.
To obtain a model with good accuracy, a large number of training samples are used to train the model until it converges, i.e., until its accuracy meets the requirement; training the model essentially means learning each weight the model contains.
Based on this, the initial model obtained in step 101 is a converged model trained on a large number of samples.
It can be understood that achieving good model accuracy generally requires a large model: a deep neural network model often contains many hidden layers, each with many neurons, so a deep neural network model with good accuracy often contains a very large number of weights (also called parameters). Storing such a model takes considerable storage space, and performing classification and recognition through it demands correspondingly more computing resources.
The size of a model here can simply be taken as the storage space required to store it, i.e., the data size of the model, such as 500 MB.
As mentioned above, a model of this size faces real limitations on devices such as mobile phones, tablet computers and smart wearables, and this is what motivates compressing the model.
Assume a model needs to be compressed from an initial size (for example, 500 MB) to a target size (for example, 100 MB). To compress it, first, the compression strategy of the model is determined, where the strategy includes the compression method(s) to be adopted and the target compression ratio corresponding to each method; second, the model is iteratively compressed with the determined method(s), the compression ratio being increased step by step, until a cutoff condition is met.
For example, when compressing from 500 MB to 100 MB, the target compression ratio is 20%; that is, 400 MB of the 500 MB model is compressed away (500 × 80% = 400).
Optionally, the compression strategy may be set by a user or computed by the device, and there may be one or more compression strategies.
Specifically, the device may be provided with several methods for compressing the model, such as a pruning method, a quantization method and a low-rank decomposition method. The device can determine a total compression ratio from the initial size and the target size, and then set combinations of compression methods together with a target compression ratio allocated to each method within each combination. In short, the device enumerates candidate compression strategies, each comprising one or more compression methods and a target compression ratio corresponding to each method.
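To make this concrete, a compression strategy can be modeled as an ordered list of (method, target ratio) steps. The Python sketch below is illustrative only and not the patent's implementation; the names CompressionStep and overall_ratio are ours, and a target ratio here is the fraction of the previous size that remains once the method completes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CompressionStep:
    method: str          # e.g. "prune", "quantize", "low_rank"
    target_ratio: float  # fraction of the previous size kept, e.g. 0.4

def overall_ratio(strategy: List[CompressionStep]) -> float:
    """Fraction of the original model size left after all steps run."""
    remaining = 1.0
    for step in strategy:
        remaining *= step.target_ratio
    return remaining

# Two candidate strategies that both reach 20% of the original size:
strategy_a = [CompressionStep("prune", 0.4), CompressionStep("quantize", 0.5)]
strategy_b = [CompressionStep("prune", 0.2)]

print(overall_ratio(strategy_a))  # 0.2, i.e. 500 MB -> 100 MB
print(overall_ratio(strategy_b))  # 0.2
```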
First, referring to fig. 2, the meaning of gradually compressing the model, for example by a pruning method, can be understood intuitively: fig. 2 shows the initial model being compressed successively through a compression-ratio sequence [a%, b%], where a% is greater than b% (a compression ratio of x% here means the model is reduced to x% of the size it had before the current compression method began).
For example, suppose a model is to be compressed from 500 MB to 100 MB, and suppose each weight in the model is currently represented with 32 bits. Suppose a compression strategy applies a pruning method followed by a quantization method, with a target compression ratio of 40% for pruning and 50% for quantization. Then 60% of the model's weights are pruned by the pruning method, after which each weight of the model output by pruning is re-represented with 16 bits by the quantization method, leaving 40% × 50% = 20% of the original size.
However, note that whether a compression strategy contains one compression method or several, each method compresses the model gradually, i.e., iteratively, until the target compression ratio corresponding to that method is reached or the model accuracy reaches a set condition, e.g., exceeds a certain threshold.
Continuing the example, with a target compression ratio of 40%, the pruning method does not prune 60% of the model's weights in one step. Instead, for example, 10% of the weights may be pruned first and the model retrained until convergence; then 30% of the weights are pruned and the model retrained until convergence; then 40%, retraining again; then 60%, retraining once more. Afterwards, the quantization method performs the subsequent compression on the model output by pruning, compressing it by a further 50% and thereby achieving the goal of compressing the original model by 80%.
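The prune-and-retrain loop just described can be sketched as below. This is a minimal illustration under assumptions: prune_to, retrain and accuracy are hypothetical stand-ins for whatever pruning, fine-tuning and evaluation routines a concrete system uses.

```python
def compress_iteratively(model, ratio_schedule, prune_to, retrain,
                         accuracy, acc_threshold):
    """Compress in passes: prune to each ratio, then retrain to convergence.

    ratio_schedule lists decreasing retained fractions relative to the
    initial model, e.g. [0.9, 0.7, 0.6, 0.4], so the amount of compression
    increases step by step.
    """
    for ratio in ratio_schedule:
        candidate = prune_to(model, ratio)  # e.g. zero the least important weights
        candidate = retrain(candidate)      # retrain until convergence
        if accuracy(candidate) < acc_threshold:
            return model   # accuracy hit the set condition: keep the last good model
        model = candidate
    return model           # target compression ratio reached
```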
Optionally, the weights to be pruned may be selected according to their magnitudes.
Note that, continuing the above example, the compression strategy is: prune 60% of the model's weights by the pruning method, then quantize each weight of the pruned model from 32 bits to 16 bits by the quantization method. An execution scheme for this strategy can be set further: for example, gradually compress the model by the pruning method through the compression-ratio sequence 90%, 70%, 60%, 40%, and then quantize each weight of the model from 32 bits to 16 bits. This scheme is merely an example, and it illustrates two points: first, one or more compression methods may be adopted in the course of compressing a model; second, any compression method can compress the model iteratively, the iteration being embodied mainly as a gradual increase of the compression ratio.
In the course of compression, however, a set execution scheme is not necessarily followed strictly; it is fine-tuned as appropriate according to how the model's accuracy changes during execution.
Specifically, assume a compression strategy includes a first compression method and a second compression method, with the first executed before the second. During iterative compression, if the model accuracy meets the set condition after the model has been compressed to a first compression ratio with the first method (the first compression ratio being lower than the target compression ratio corresponding to the first method), then either the compression step size corresponding to the first method is reduced, or the model as it was before that compression is compressed with the second method instead. Here, the accuracy meeting the set condition means the accuracy has dropped markedly compared with before, but is still above the threshold required for convergence.
Assume the model from before the first-ratio compression is then compressed with the second method, and its accuracy after compression to a second compression ratio (lower than or equal to the target compression ratio corresponding to the second method) meets the set condition; in that case, the model from before the first-ratio compression is determined to be the model compression result. Here, the accuracy meeting the set condition may mean, for example, that the accuracy has dropped below the threshold required for convergence.
For example, call the initial, uncompressed model "model A", let the first compression method be pruning with a target compression ratio of 40%, and let the execution scheme of the pruning method be: gradually compress the model through the compression-ratio sequence 90%, 70%, 60%, 40%. Model A is compressed at the 90% ratio to obtain model B, which is trained until the convergence condition is reached, e.g., accuracy above a preset threshold of 0.7. Model B is then compressed at the 70% ratio to obtain model C, which is trained to convergence. Model C is next compressed at the 60% ratio to obtain model D, which is trained. Suppose the accuracy of the trained model D has dropped markedly, reaching a set condition: it is still above the preset threshold 0.7, but the gap to the threshold is smaller than a preset margin, e.g., 0.05. The first compression ratio is then 60%. If the trained model D were subsequently compressed at the 40% ratio, its accuracy might fall below the preset threshold 0.7; so at this point the compression step size may optionally be reduced, e.g., the next ratio adjusted to 50%, or compression may switch directly to a second compression method, e.g., quantization, applied to the trained model D.
Suppose we switch to the quantization method to compress the trained model D. Assume the initial weights in model A are represented with 32 bits, the target of the quantization method is 8 bits, and its execution scheme is, for example: quantize to 16 bits first, then to 8 bits. On these assumptions, each weight in model D is quantized from 32 bits to 16 bits to obtain model E, which is then trained. If the accuracy of model E now meets the set condition, e.g., falls below the preset threshold 0.7, this indicates that switching model D straight to quantization-based compression cannot guarantee the accuracy of the compressed model, and model D is therefore determined to be the compression result corresponding to model A.
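The model A-to-E walkthrough corresponds to the following control flow, sketched here with the same hypothetical prune_to, retrain and accuracy helpers; the 0.7 threshold and 0.05 margin are the example's numbers, and float16 storage is merely one possible realization of the 32-bit-to-16-bit quantization step, not necessarily the patent's.

```python
import copy
import torch

def quantize_to_16bit(model: torch.nn.Module) -> torch.nn.Module:
    """One possible 32 -> 16 bit step: store the weights as float16."""
    return copy.deepcopy(model).half()

def compress_with_fallback(model, prune_schedule, prune_to, retrain,
                           accuracy, acc_threshold=0.7, margin=0.05):
    for ratio in prune_schedule:                 # e.g. [0.9, 0.7, 0.6, 0.4]
        candidate = retrain(prune_to(model, ratio))
        acc = accuracy(candidate)
        if acc < acc_threshold:
            return model                         # converged model from the last pass
        model = candidate                        # models B, C, D in the example
        if acc - acc_threshold < margin:         # accuracy dropped markedly:
            quantized = retrain(quantize_to_16bit(model))  # switch methods
            if accuracy(quantized) < acc_threshold:
                return model                     # model D is the compression result
            return quantized                     # model E passes: keep the switch
    return model
```

Reducing the compression step size instead of switching methods would replace the quantization branch with, e.g., inserting an intermediate ratio such as 50% into the schedule.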
As mentioned above, there may be several compression strategies; after a compression result is obtained from each of them, the best one, i.e., the model that is both small and accurate, can be selected among them.
In conclusion, gradual compression compresses the model effectively while preserving its accuracy.
It was mentioned above that, during compression by the pruning method, the weights to be pruned may optionally be selected according to their magnitudes. Below, in connection with the model compression process shown in fig. 3, another scheme for determining the weights to be pruned is provided.
Fig. 3 is a flowchart of another model compression method according to an embodiment of the present invention. As shown in fig. 3, the method includes the following steps:
301. An initial model is obtained.
302. A compression strategy corresponding to the model is determined, where the compression strategy includes a pruning method and a target compression ratio corresponding to the pruning method.
303. A loss function corresponding to the model in the current iteration is determined, and gradient values corresponding to the weights of the model are determined according to the loss function.
304. The weights to be pruned are selected according to the gradient values and the compression ratio corresponding to the current iteration.
305. The selected weights are pruned.
306. The pruned model is trained; if training converges, the next iteration proceeds.
In this embodiment, the weights to be pruned are selected using the gradient values obtained by taking the partial derivatives of the model's loss function with respect to each weight.
Specifically, suppose the model is gradually compressed by the pruning method through the compression-ratio sequence 90%, 70%, 60%, 40%, and call the initial model "model A". The loss function of model A is a function F1 of each weight in model A; taking the partial derivative of F1 with respect to each weight yields the gradient value corresponding to that weight. Since model A is next to be compressed at the 90% compression ratio, 10% of the weights in model A need to be pruned.
Selecting the 10% of weights to be pruned in combination with the gradient values corresponding to the weights can be implemented as follows:
calculating an importance score for each weight, where the importance score of any weight is the weighted sum of that weight and its corresponding gradient value;
sorting the weights by importance score;
selecting, according to the compression ratio, e.g., 90%, the weights with the lowest importance scores as the weights to be pruned; that is, the 10% of all weights with the lowest importance scores are selected for pruning.
The weighting coefficients applied to the weight and the gradient value may be preset.
After the weights to be pruned are selected, they are pruned, i.e., set to 0. The pruned model, say model B, is then trained. If training converges, the next iteration proceeds: partial derivatives of the loss function corresponding to model B are taken with respect to each weight model B contains, giving the gradient values corresponding to each weight in model B; combining these gradient values, the 30% of weights with the lowest importance scores are selected from the weights of model B and pruned, and so on.
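One pruning pass of steps 303-305 can be sketched in PyTorch as follows. This is an illustrative sketch rather than the patent's code: the score alpha*|w| + beta*|dL/dw| implements the weighted sum described above, with alpha and beta standing for the preset weighting coefficients, and pruning zeroes the selected weights in place.

```python
import torch

def gradient_importance_prune(model, loss, keep_fraction, alpha=0.5, beta=0.5):
    """Zero out the (1 - keep_fraction) of weights with the lowest importance,
    where importance = alpha * |weight| + beta * |gradient of the loss|."""
    model.zero_grad()
    loss.backward()  # populates p.grad with dLoss/dp for every weight

    params = [p for p in model.parameters() if p.grad is not None]
    scores = torch.cat([
        (alpha * p.detach().abs() + beta * p.grad.abs()).flatten()
        for p in params
    ])

    num_to_prune = int((1.0 - keep_fraction) * scores.numel())
    if num_to_prune == 0:
        return
    threshold = torch.kthvalue(scores, num_to_prune).values  # k-th smallest score

    with torch.no_grad():
        for p in params:
            score = alpha * p.abs() + beta * p.grad.abs()
            p.masked_fill_(score <= threshold, 0.0)  # pruning = setting weights to 0
```

After such a pass the model would be retrained to convergence (step 306) before the next, higher compression ratio is applied.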
In this embodiment, the model is pruned in combination with the gradient values corresponding to the weights because a weight's magnitude does not necessarily reflect its importance accurately. As shown in fig. 4, where the abscissa represents weights and the ordinate represents gradient values, weight w1 is much smaller than weight w2, yet the gradient value T1 corresponding to w1 is much larger than the gradient value T2 corresponding to w2. Gradient values reflect the importance of weights better than the weights themselves do, so pruning based on gradient information better preserves the accuracy of the pruned model.
Fig. 5 is a flowchart of another model compression method provided in accordance with an embodiment of the present invention. As shown in fig. 5, the method includes the following steps:
501. A loss function corresponding to the initial model is determined.
502. Gradient values corresponding to the weights of the model are determined according to the loss function.
503. The weights to be pruned are selected according to the gradient values.
504. The selected weights are pruned.
The embodiment shown in fig. 3 describes a specific implementation of the pruning method when it is used together with iterative compression of the model. In the present embodiment, the execution of the pruning method is no longer premised on iteratively compressing the model.
Optionally, the weights to be pruned may be selected according to the gradient value corresponding to each weight and a set compression ratio, which can be implemented as follows:
calculating an importance score for each weight, where the importance score of any weight is the weighted sum of that weight and its corresponding gradient value;
sorting the weights by importance score;
selecting, according to the compression ratio, the weights with the lowest importance scores as the weights to be pruned.
The compression ratio here is determined from the initial size and the target size of the model. For the implementation of this alternative, refer to the description of the embodiment shown in fig. 3, which is not repeated here.
In addition, optionally, the weights to be pruned may be selected according to the gradient values corresponding to the weights alone, implemented specifically as follows: for any one of the weights, if that weight is smaller than a first threshold and the gradient value corresponding to it is smaller than a second threshold, select it as a weight to be pruned. That is, among all the weights, those whose own values are comparatively small and whose corresponding gradient values are also comparatively small are selected as the weights to be pruned.
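A sketch of this double-threshold variant follows, again illustrative: the two thresholds are hypothetical tuning parameters, and gradients are assumed to have been populated by an earlier backward pass of the loss.

```python
import torch

def double_threshold_prune(model, weight_threshold, grad_threshold):
    """Prune every weight that is small in magnitude AND has a small gradient."""
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue  # parameter took no part in the backward pass
            mask = (p.abs() < weight_threshold) & (p.grad.abs() < grad_threshold)
            p.masked_fill_(mask, 0.0)
```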
In summary, the model compression scheme provided in the foregoing optional embodiments can compress an initial model of larger size so that the compressed model can be used on terminal devices with limited resources. Several scenarios that use a model compressed by the above scheme are provided below by way of example.
In an alternative embodiment of the present invention, an image processing method for processing an image using a model is provided. The model may be obtained by compressing the initial model by the model compression method provided above.
Optionally, the model can be used for classification and recognition of images, e.g., recognizing whether an input image contains a human face. In this case, the initial model is trained with a large number of images containing faces and a large number of images not containing faces as training samples. After the initial model is obtained, a target compression ratio is set according to the resource conditions of the terminal device on which the model is to be deployed, and the initial model is then compressed by the model compression method provided above.
In another alternative embodiment of the present invention, a speech processing method for processing speech data using a model is provided. The model may be obtained by compressing an initial model with the model compression method provided above. Optionally, the model may be used to recognize speech data; in this case, the initial model is trained with a large number of speech samples as training samples. After the initial model is obtained, a target compression ratio is set according to the resource conditions of the terminal device on which the model is to be deployed, and the initial model is then compressed by the model compression method provided above.
In yet another alternative embodiment of the present invention, a natural language processing method for processing natural language information using a model is provided. The model may be obtained by compressing the initial model by the model compression method provided above. Alternatively, the model may be used to translate input natural language text.
The image processing apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these apparatuses can be constructed from commercially available hardware components configured through the steps taught in this scheme.
Fig. 6 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention. The apparatus uses a model to perform, for example, classification and recognition processing on an image and, to obtain that model, includes, as shown in fig. 6: an acquisition module 11, a determination module 12 and a processing module 13.
And an obtaining module 11, configured to obtain an initial model.
A determining module 12, configured to determine a compression policy corresponding to the model, where the compression policy includes a compression method and a target compression ratio corresponding to the compression method.
And the processing module 13 is configured to iteratively compress the model with the compression method, gradually increasing the compression ratio, until a cutoff condition is met, where the cutoff condition includes the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
Optionally, the compression policy includes a plurality of compression methods and target compression rates corresponding to the plurality of compression methods.
Optionally, the processing module 13 may be configured to: if the model accuracy meets the set condition after the model is compressed to a first compression ratio by a first compression method, the first compression ratio being lower than the target compression ratio corresponding to the first compression method, reduce the compression step size corresponding to the first compression method, or switch to compressing, by a second compression method, the model as it was before that compression.
Optionally, the processing module 13 may be further configured to: and if the accuracy of the model meets the set condition after the model is compressed to a second compression ratio by the second compression method, and the second compression ratio is lower than or equal to the target compression ratio corresponding to the second compression method, determining the model before being compressed by the first compression ratio as a model compression result.
Optionally, the compression method is a pruning method, and the processing module 13 may further be configured to:
determining a corresponding loss function of the model in the current iteration process; determining gradient values corresponding to all weights of the model according to the loss function; selecting the weight to be pruned according to the gradient value and the compression ratio corresponding to the current iteration process; and pruning the weight to be pruned.
Optionally, the processing module 13 may be further configured to: calculate an importance score for each weight, where the importance score of any weight is the weighted sum of that weight and its corresponding gradient value; sort the weights by importance score; and select, from the weights according to the compression ratio, those with lower importance scores as the weights to be pruned.
The apparatus shown in fig. 6 can perform the methods of the embodiments shown in figs. 1-3; for parts of this embodiment not described in detail, refer to the related descriptions of those embodiments. For the execution process and technical effects of this technical solution, likewise see the descriptions of the embodiments shown in figs. 1-3, which are not repeated here.
In one possible design, the structure of the image processing apparatus shown in fig. 6 may be implemented as an electronic device, which may be a PC, a server, a cloud host, or other various devices. As shown in fig. 7, the electronic device may include: a first processor 21 and a first memory 22. Wherein the first memory 22 is used for storing a program that supports an electronic device to execute the model compression method provided in the embodiments shown in fig. 1-3, and the first processor 21 is configured to execute the program stored in the first memory 22.
The program comprises one or more computer instructions which, when executed by the first processor 21, are capable of performing the steps of:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model by adopting the compression method in a mode of gradually increasing the compression ratio until a cut-off condition is met, wherein the cut-off condition comprises that the model is compressed to the target compression ratio or the precision of the model meets a set condition.
Optionally, the first processor 21 is further configured to perform all or part of the steps in the embodiments shown in fig. 1 to 3.
In addition, the first processor 21 is further configured to process an input image according to a model obtained after compression.
The electronic device may further include a first communication interface 23, which is used for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the model compression method in the method embodiments shown in fig. 1 to 3.
Fig. 8 is a schematic structural diagram of another image processing apparatus according to an embodiment of the present invention. The apparatus uses a model to perform, for example, classification and recognition processing on an image and, to obtain that model, includes, as shown in fig. 8: a loss function determination module 31, a gradient determination module 32, a weight selection module 33, and a pruning processing module 34.
And a loss function determining module 31, configured to determine a loss function corresponding to the initial model.
And a gradient determining module 32, configured to determine, according to the loss function, gradient values corresponding to the weights of the model respectively.
And a weight selection module 33, configured to select a weight to be pruned according to the gradient value.
And the pruning processing module 34 is configured to prune the weight to be pruned.
Optionally, the weight selection module 33 may be configured to: select the weights to be pruned according to the gradient values and a set compression ratio. In that case, optionally, the weight selection module 33 may specifically be configured to: calculate an importance score for each weight, where the importance score of any weight is the weighted sum of that weight and its corresponding gradient value; sort the weights by importance score; and select, from the weights according to the compression ratio, those with lower importance scores as the weights to be pruned.
Optionally, the weight selection module 33 may be configured to: and for any weight in the weights, if the weight is smaller than a first threshold value and the gradient value corresponding to the weight is smaller than a second threshold value, selecting the weight as the weight to be pruned.
The apparatus shown in fig. 8 can perform the method of the embodiment shown in fig. 5; for parts of this embodiment not described in detail, refer to the related description of that embodiment. For the execution process and technical effects of this technical solution, see the description of the embodiment shown in fig. 5, which is not repeated here.
In one possible design, the structure of the image processing apparatus shown in fig. 8 may be implemented as an electronic device, which may be a device such as a PC, a server, a cloud host, or the like. As shown in fig. 9, the electronic device may include: a second processor 41 and a second memory 42. Wherein the second memory 42 is used for storing a program that supports an electronic device to execute the model compression method provided in the embodiment shown in fig. 5, and the second processor 41 is configured to execute the program stored in the second memory 42.
The program comprises one or more computer instructions which, when executed by the second processor 41, are capable of performing the steps of:
determining a loss function corresponding to the model;
determining gradient values corresponding to all weights of the model according to the loss function;
selecting the weight to be pruned according to the gradient value;
and pruning the weight to be pruned.
Optionally, the second processor 41 is further configured to perform all or part of the steps in the foregoing embodiment shown in fig. 5.
In addition, the second processor 41 is further configured to process an input image according to the model obtained after compression.
The electronic device may further include a second communication interface 43 for communicating with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for an electronic device, which includes a program for executing the model compression method in the embodiment of the method shown in fig. 5.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, or of course by a combination of hardware and software. Based on this understanding, the technical solutions above, in essence or in the part contributing to the prior art, may be embodied in the form of a computer program product, which may be carried on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (14)

1. An image processing method for processing an image using a model obtained by:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model by the compression method in a manner of gradually increasing the compression ratio until a cutoff condition is met, wherein the cutoff condition comprises the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
2. The method of claim 1, wherein the compression strategy comprises a plurality of compression methods and target compression rates corresponding to the plurality of compression methods.
3. The method of claim 2, the step of iteratively compressing the model, comprising:
if the model accuracy meets the set condition after the model is compressed to a first compression ratio by a first compression method, and the first compression ratio is lower than the target compression ratio corresponding to the first compression method, reducing the compression step size corresponding to the first compression method, or compressing, by a second compression method, the model as it was before being compressed at the first compression ratio.
4. The method of claim 3, further comprising:
and if the model accuracy meets the set condition after the model is compressed to a second compression ratio by the second compression method, and the second compression ratio is lower than or equal to the target compression ratio corresponding to the second compression method, determining the model as it was before being compressed at the first compression ratio as the model compression result.
5. The method of claim 1, the compression method being a pruning method, the step of iteratively compressing the model comprising:
determining a corresponding loss function of the model in the current iteration process;
determining gradient values corresponding to all weights of the model according to the loss function;
selecting the weight to be pruned according to the gradient value and the compression ratio corresponding to the current iteration process;
and pruning the weight to be pruned.
6. The method of claim 5, the step of selecting weights to prune comprising:
calculating importance scores corresponding to the weights respectively, wherein the importance score corresponding to any weight is a weighted sum of that weight and the gradient value corresponding to that weight;
sorting the weights according to the importance scores;
and selecting, from the weights according to the compression ratio, the weights with lower importance scores as the weights to be pruned.
7. A method of speech processing, speech data being identified using a model, the model being obtained by:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model by the compression method in a manner of gradually increasing the compression ratio until a cutoff condition is met, wherein the cutoff condition comprises the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
8. A natural language processing method for processing natural language information using a model obtained by:
obtaining an initial model;
determining a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and iteratively compressing the model by the compression method in a manner of gradually increasing the compression ratio until a cutoff condition is met, wherein the cutoff condition comprises the model being compressed to the target compression ratio or the model accuracy meeting a set condition.
9. An image processing method for processing an image using a model obtained by:
determining a loss function corresponding to the initial model;
determining gradient values corresponding to all weights of the model according to the loss function;
selecting the weights to be pruned according to the gradient values;
and pruning the weights to be pruned.
10. The method of claim 9, wherein selecting the weights to be pruned according to the gradient values comprises:
selecting the weights to be pruned according to the gradient values and a set compression ratio.
11. The method of claim 10, wherein selecting the weights to be pruned according to the gradient values and a set compression ratio comprises:
calculating an importance score for each of the weights, wherein the importance score of a given weight is a weighted sum of that weight and its corresponding gradient value;
sorting the weights according to the importance scores;
and selecting, according to the compression ratio, the weights with the lowest importance scores as the weights to be pruned.
12. The method of claim 9, wherein selecting the weights to be pruned according to the gradient values comprises:
for any of the weights, if the weight is smaller than a first threshold and the gradient value corresponding to the weight is smaller than a second threshold, selecting that weight as a weight to be pruned.
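For illustration only, a short numpy sketch of claim 12's two-threshold rule: a weight is selected for pruning only when both its value and its gradient value are small. The thresholds t1 and t2, and the use of magnitudes rather than raw values, are assumptions the claim leaves open.

```python
import numpy as np

# Sketch of claim 12: select weights whose magnitude falls below a first
# threshold AND whose gradient magnitude falls below a second threshold.
# t1 and t2 are illustrative values, not fixed by the patent.

def select_by_thresholds(weights, grads, t1=1e-3, t2=1e-3):
    """Return a boolean mask marking the weights to be pruned."""
    return (np.abs(weights) < t1) & (np.abs(grads) < t2)

# Example usage: zero the selected weights in place.
# mask = select_by_thresholds(w, g)
# w[mask] = 0.0
```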
13. An image processing apparatus that processes an image using a model, comprising:
an acquisition module configured to acquire an initial model;
a determining module configured to determine a compression strategy corresponding to the model, wherein the compression strategy comprises a compression method and a target compression ratio corresponding to the compression method;
and a processing module configured to iteratively compress the model using the compression method while gradually increasing the compression ratio, until a cut-off condition is met, wherein the cut-off condition comprises the model being compressed to the target compression ratio or the model precision meeting a set condition.
14. An electronic device, comprising: a memory and a processor, wherein the memory is configured to store one or more computer instructions which, when executed by the processor, implement the image processing method of any one of claims 1 to 6.
CN201811459626.3A 2018-11-30 2018-11-30 Image processing method, device and equipment Pending CN111260052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811459626.3A CN111260052A (en) 2018-11-30 2018-11-30 Image processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811459626.3A CN111260052A (en) 2018-11-30 2018-11-30 Image processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN111260052A true CN111260052A (en) 2020-06-09

Family

ID=70950191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811459626.3A Pending CN111260052A (en) 2018-11-30 2018-11-30 Image processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111260052A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778918A (en) * 2017-01-22 2017-05-31 北京飞搜科技有限公司 Deep learning image recognition system applied to mobile phone terminals and implementation method
CN108734268A (en) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Compression method and device for a deep neural network model, terminal, and storage medium
CN107368891A (en) * 2017-05-27 2017-11-21 深圳市深网视界科技有限公司 Compression method and device for a deep learning model
CN107688850A (en) * 2017-08-08 2018-02-13 北京深鉴科技有限公司 Deep neural network compression method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
EHUD D. KARNIN: "A simple procedure for pruning back-propagation trained neural networks" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817473A (en) * 2022-05-09 2022-07-29 北京百度网讯科技有限公司 Methods, apparatus, devices, media and products for compressing semantic understanding models

Similar Documents

Publication Publication Date Title
JP7406606B2 (en) Text recognition model training method, text recognition method and device
CN110334201B (en) Intention identification method, device and system
CN108920654B (en) Question and answer text semantic matching method and device
CN112185352B (en) Voice recognition method and device and electronic equipment
CN110119745B (en) Compression method, compression device, computer equipment and storage medium of deep learning model
CN111695375B (en) Face recognition model compression method based on model distillation, medium and terminal
CN109840052B (en) Audio processing method and device, electronic equipment and storage medium
CN111104954A (en) Object classification method and device
CN111143178A (en) User behavior analysis method, device and equipment
CN111357051A (en) Speech emotion recognition method, intelligent device and computer readable storage medium
CN112036564B (en) Picture identification method, device, equipment and storage medium
CN110827208A (en) General pooling enhancement method, device, equipment and medium for convolutional neural network
US11630950B2 (en) Prediction of media success from plot summaries using machine learning model
KR102435035B1 (en) The Fake News Video Detection System and Method thereby
CN111260052A (en) Image processing method, device and equipment
CN115129831A (en) Data processing method and device, electronic equipment and computer storage medium
CN116308738B (en) Model training method, business wind control method and device
US10032460B2 (en) Frequency envelope vector quantization method and apparatus
CN115880502A (en) Training method of detection model, target detection method, device, equipment and medium
CN113361621B (en) Method and device for training model
CN112559713B (en) Text relevance judging method and device, model, electronic equipment and readable medium
KR102348689B1 (en) Text summarization using sentiment score with sequence-to-sequence
CN114970828A (en) Compression method and device for deep neural network, electronic equipment and medium
CN114627416A (en) Video processing method and device
CN112927714B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination