CN112734007A - Method and device for acquiring compression model, storage medium and electronic device - Google Patents

Method and device for acquiring compression model, storage medium and electronic device

Info

Publication number
CN112734007A
Authority
CN
China
Prior art keywords
network model
recognition network
recognition
parameter
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011638666.1A
Other languages
Chinese (zh)
Inventor
潘威滔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd filed Critical Qingdao Haier Technology Co Ltd
Priority to CN202011638666.1A priority Critical patent/CN112734007A/en
Publication of CN112734007A publication Critical patent/CN112734007A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for acquiring a compression model, a storage medium and an electronic device. The method comprises the following steps: acquiring a first parameter of a trained first recognition network model, wherein the first recognition network model is used for recognizing a target element contained in an input image, and the first parameter comprises the number of channels of the first recognition network model; compressing the first parameter according to a preset condition to obtain a second parameter; constructing a second recognition network model in an initial state based on the second parameter, wherein the network structure of the second recognition network model is the same as that of the first recognition network model; training the second recognition network model in the initial state by using N sample images, wherein N is an integer greater than or equal to 1; and determining that the trained second recognition network model is obtained when the output training result reaches a convergence condition. The method solves the technical problem of low recognition precision of the obtained compression model.

Description

Method and device for acquiring compression model, storage medium and electronic device
Technical Field
The invention relates to the field of computers, in particular to a method and a device for acquiring a compression model, a storage medium and an electronic device.
Background
Because of its huge parameter count, a convolutional neural network model occupies a large amount of memory and CPU resources during inference, so deploying it on the end side often cannot meet the requirements of real scenarios. For example, a multi-target convolutional neural network detection model has parameters numbering in the billions and a model size usually over 100 MB; on a low-end mobile phone or a refrigerator development board (memory < 2 GB), invoking the multi-target detection app occupies system resources and causes the end side to stall or crash. The end side therefore generally requires the compressed model to be no larger than 1/3 of the original model, which means the biggest difficulty in end-side deployment is compressing the original model as much as possible while losing as little accuracy of the convolutional neural network model as possible.
At present, the dominant method for end-side model compression is quantization, i.e., converting the float32 convolutional neural network model into an int8 model, where

float32 parameter value ≈ stretch coefficient × int8 parameter value + offset value

The offset value here absorbs the precision loss of int8 relative to float32; in other words, the original float32 parameter value is approximated from the int8 parameter value via the stretch coefficient.
However, compared with the original model, the quantization model mainly reduces parameter storage (int8 requires 8 bits of storage where float32 requires 32), but because no change is made to the deep network structure, there is no essential improvement in the amount of computation. In other words, the model compression methods in the prior art cannot guarantee the model's inference accuracy while also guaranteeing a given compression strength.
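The affine mapping used by quantization can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's procedure: deriving the stretch coefficient and offset value from the tensor's min/max is a common convention assumed here for the example.

```python
def quantize_params(values, num_bits=8):
    """Affine quantization: float32 value ~= stretch * int_value + offset.

    The stretch coefficient and offset value are derived from the
    min/max of the tensor (a common convention; the description does
    not fix a particular derivation).
    """
    qmin, qmax = 0, 2 ** num_bits - 1            # e.g. 0..255 for 8 bits
    lo, hi = min(values), max(values)
    stretch = (hi - lo) / (qmax - qmin) or 1.0   # stretch coefficient
    offset = lo                                  # offset (bias) value
    quantized = [round((v - offset) / stretch) for v in values]
    return quantized, stretch, offset


def dequantize(quantized, stretch, offset):
    # Reconstruct approximate float32 values; the residual is the
    # precision loss the description refers to.
    return [stretch * q + offset for q in quantized]


weights = [-0.8, -0.1, 0.0, 0.35, 1.2]
q, stretch, offset = quantize_params(weights)
approx = dequantize(q, stretch, offset)
```

Note that only the storage of each parameter shrinks (8 bits instead of 32); the number of values, and hence the amount of computation, is unchanged, which is exactly the limitation described above.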
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for acquiring a compression model, a storage medium and an electronic device, which at least solve the technical problem of low identification precision of the acquired compression model.
According to an aspect of the embodiments of the present invention, there is provided a method for obtaining a compression model, including: acquiring a first parameter of a trained first recognition network model, wherein the first recognition network model is used for recognizing a target element contained in an input image, and the first parameter comprises the number of channels of the first recognition network model; compressing the first parameter according to a preset condition to obtain a second parameter; constructing a second recognition network model of an initial state based on the second parameters, wherein the second recognition network model has the same network structure as the first recognition network model; training the second recognition network model in the initial state by using N sample images, wherein N is an integer greater than or equal to 1; and determining to obtain the trained second recognition network model under the condition that the output training result reaches the convergence condition.
According to another aspect of the embodiments of the present invention, there is also provided an apparatus for obtaining a compression model, including: a first obtaining unit, configured to obtain a first parameter of a trained first recognition network model, where the first recognition network model is used to recognize a target element included in an input image, and the first parameter includes a channel number of the first recognition network model; the compression unit is used for compressing the first parameter according to a preset condition to obtain a second parameter; a construction unit, configured to construct a second recognition network model in an initial state based on the second parameter, where the second recognition network model has a same network structure as the first recognition network model; a training unit, configured to train the second recognition network model in the initial state by using N sample images, where N is an integer greater than or equal to 1; and the determining unit is used for determining the trained second recognition network model under the condition that the output training result reaches the convergence condition.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned method for acquiring a compression model when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above method for obtaining a compression model through the computer program.
In the embodiment of the present invention, a first parameter of a trained first recognition network model is obtained, where the first recognition network model is used to recognize a target element included in an input image, and the first parameter includes the channel number of the first recognition network model; the first parameter is compressed according to a preset condition to obtain a second parameter; a second recognition network model in an initial state is constructed based on the second parameter, where the second recognition network model has the same network structure as the first recognition network model; the second recognition network model in the initial state is trained by using N sample images, where N is an integer greater than or equal to 1; and the trained second recognition network model is determined to be obtained when the output training result reaches the convergence condition. By compressing the number of channels in the first recognition network model to obtain the compressed recognition network model, the recognition precision of the compression model is preserved to a certain extent while its resource occupation is reduced, thereby improving the recognition precision of the compression model and solving the technical problem of low recognition precision of the obtained compression model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic flow chart of an alternative compression model acquisition method according to an embodiment of the invention;
FIG. 2 is a schematic diagram of an alternative compression model acquisition method according to an embodiment of the invention;
FIG. 3 is a schematic diagram of an alternative compression model acquisition method according to an embodiment of the invention;
FIG. 4 is a schematic diagram of an alternative compression model acquisition method according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an alternative compression model acquisition method according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an alternative compression model acquisition device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an alternative compression model acquisition device according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an alternative compression model acquisition device according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of an alternative compression model acquisition device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Optionally, as shown in fig. 1, the method for obtaining the compression model includes:
s102, acquiring a first parameter of a trained first recognition network model, wherein the first recognition network model is used for recognizing a target element contained in an input image, and the first parameter comprises the number of channels of the first recognition network model;
s104, compressing the first parameter according to a preset condition to obtain a second parameter;
s106, constructing a second recognition network model in the initial state based on a second parameter, wherein the network structure of the second recognition network model is the same as that of the first recognition network model;
s108, training a second recognition network model in an initial state by using N sample images, wherein N is an integer greater than or equal to 1;
and S110, determining to obtain the trained second recognition network model under the condition that the output training result reaches the convergence condition.
Optionally, the obtaining method of the compression model may be, but is not limited to, applied in a scenario of compressing a convolutional neural network model, where the convolutional neural network model may be, but is not limited to, applied in identifying whether an image includes a target element, for example, identifying whether the image includes a target vehicle, and the like.
Optionally, the convolutional neural network may be, but is not limited to, a class of feed-forward neural networks that involve convolution calculations and have a deep structure; it is one of the representative algorithms of deep learning, has representation-learning capability, and can perform shift-invariant classification of input information according to its hierarchical structure. The parameters of the convolutional neural network may include, but are not limited to, intrinsic parameters and variable parameters, where the intrinsic parameters may include, but are not limited to, the number of channels, the number of filters, the number of neurons, and the like, and the variable parameters may include, but are not limited to, learnable weight values, such as the weight values of filters. The network structure of the convolutional neural network may be, but is not limited to, the number and combination of layers in the network; for example, the first stage includes two convolutional layers and one pooling layer, and the second stage includes three convolutional layers and one pooling layer. The convolutional layers use functions such as sigmoid, Tanh or ReLU; the pooling layer uses max pooling or mean pooling; the fully-connected layer uses functions such as sigmoid, Tanh or ReLU; and the last layer uses functions such as softmax or cross entropy.
It should be noted that, a first parameter of a trained first recognition network model is obtained, where the first recognition network model is used to recognize a target element included in an input image, and the first parameter includes a channel number of the first recognition network model; compressing the first parameter according to a preset condition to obtain a second parameter; constructing a second recognition network model of the initial state based on the second parameters, wherein the network structure of the second recognition network model is the same as that of the first recognition network model; training a second recognition network model in an initial state by using N sample images, wherein N is an integer greater than or equal to 1; and determining to obtain a trained second recognition network model under the condition that the output training result reaches a convergence condition.
Optionally, in this embodiment, the first recognition network model may be, but is not limited to, the recognition network model 202 shown in fig. 2, where the recognition steps of the recognition network model 202 are as follows:
after an image input to the recognition network model 202 is processed by the convolutional layers, the pooling layers and the fully-connected layers, the output layer of the recognition network model 202 finally outputs a recognition result, where the recognition result is used to indicate that the input image contains, or does not contain, the target element. Optionally, in this embodiment, the target element may be, but is not limited to, a vehicle element.
Optionally, in this embodiment, continuing with the scenario shown in fig. 2, the number of channels may be, but is not limited to, the last dimension of the feature vector output by each layer of the recognition network model 202. For example, if the feature vector output by the first convolutional layer is "224 × 224 × 64", then the "64" in the feature vector may be, but is not limited to, the number of channels of the first convolutional layer; and if the feature vector output by a fully-connected layer is "1 × 1 × 4096", then the "4096" in the feature vector may be, but is not limited to, the number of channels (number of neurons) of the fully-connected layer;
further, taking the feature vector "224 × 224 × 64" output by the first convolutional layer as an example, the feature vector may be, but is not limited to, read as "length × width × number of channels". Reducing the number of channels directly shrinks the feature vector, but does not affect the "length × width" part of the feature vector. This achieves the effect of reducing the resource footprint of the recognition network model by reducing the number of channels, while limiting the impact of compression on the image recognition accuracy of the model.
For a further example, optionally, continuing the scenario shown in fig. 2 with fig. 3, assume that the recognition network model 302 in fig. 3 is the compression model obtained by applying the above compression-model acquisition method to the recognition network model 202. Clearly, the network structures of the recognition network model 302 and the recognition network model 202 (such as the number and combination of convolutional layers and pooling layers) are consistent, and, judging from the output layers, the feature vectors output by both are "1 × 1 × 1000", which may, but is not limited to, indicate that the functions implemented by both are consistent;
in addition, taking the feature vector output by the first convolutional layer as an example: the first convolutional layer of the recognition network model 202 outputs "224 × 224 × 64", while that of the recognition network model 302 outputs "224 × 224 × 32". The feature map represented by the latter is half the size of the former, which saves resource occupancy of the recognition network model. Specifically, comparing "224 × 224 × 64" with "224 × 224 × 32", only the value representing the number of channels changes, while the values representing the image size "length × width" are unchanged; the image-size parameters of the recognition network model before compression are retained so as to preserve the recognition precision of the compressed model;
further, taking the feature vector output by the fully-connected layer as an example: the fully-connected layer of the recognition network model 202 outputs "1 × 1 × 4096", while that of the recognition network model 302 outputs "1 × 1 × 2048". The number of neurons represented by the latter is half that of the former, which again saves resource occupancy of the recognition network model.
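The savings described in these examples are plain arithmetic over the feature-map dimensions; a minimal sketch using the layer shapes quoted above:

```python
def feature_map_elements(length, width, channels):
    """Number of values in a 'length x width x channels' feature map."""
    return length * width * channels


# First convolutional layer: 224 x 224 x 64 before compression,
# 224 x 224 x 32 after -- channels halved, spatial size unchanged.
before = feature_map_elements(224, 224, 64)
after = feature_map_elements(224, 224, 32)
ratio = after / before  # memory for this activation is halved
```

Because only the channel dimension shrinks, every activation (and the corresponding layer's parameters and computation) scales down proportionally while the spatial resolution "224 × 224" is preserved.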
According to the embodiment provided by the application, the first parameter of a trained first recognition network model is obtained, where the first recognition network model is used for recognizing the target element contained in the input image, and the first parameter comprises the channel number of the first recognition network model; the first parameter is compressed according to a preset condition to obtain a second parameter; a second recognition network model in an initial state is constructed based on the second parameter, where the network structure of the second recognition network model is the same as that of the first recognition network model; the second recognition network model in the initial state is trained by using N sample images, where N is an integer greater than or equal to 1; and the trained second recognition network model is determined to be obtained when the output training result reaches the convergence condition. By compressing the number of channels in the first recognition network model to obtain the compressed recognition network model, the recognition precision of the compression model is preserved to a certain extent while its resource occupation is reduced, thereby improving the recognition precision of the compression model.
As an optional solution, obtaining a first parameter of the trained first recognition network model includes:
a first number of convolution kernels of each convolutional layer in the first recognition network model is obtained and used as the first parameter.
Optionally, a convolution kernel may be, but is not limited to, understood as follows: in image processing, given an input image, each pixel in the output image is obtained by a weighted average of the pixels in a region of the input image, where the weights are defined by a function; this function may be, but is not limited to, called the convolution kernel. Optionally, a convolution kernel may be, but is not limited to, a filter; in other words, the number of convolution kernels may be, but is not limited to, the number of filters, and since each filter produces one output channel, the number of convolution kernels may be, but is not limited to, the number of channels.
It should be noted that a first number of convolution kernels of each convolutional layer in the first recognition network model is obtained, and the first number is used as the first parameter.
For a further example, optionally, continuing the scenario shown in fig. 2 with fig. 4, assume that the recognition network model 402 in fig. 4 is the compression model obtained by applying the above compression-model acquisition method to the recognition network model 202. Clearly, the network structures of the recognition network model 402 and the recognition network model 202 (for example, the number and combination of convolutional layers and pooling layers) are consistent, and, judging from the output layers, the feature vectors output by both are "1 × 1 × 1000", which may, but is not limited to, indicate that the functions implemented by both are consistent;
further, the output feature vectors of the convolutional layers of the recognition network model 402 and the recognition network model 202 differ in the number of output channels. Specifically, the first convolutional layer of the recognition network model 402 outputs "32" channels where that of the recognition network model 202 outputs "64", and the second convolutional layer of the recognition network model 402 outputs "64" channels where that of the recognition network model 202 outputs "128".
By the embodiment provided by the application, the first number of convolution kernels of each convolutional layer in the first recognition network model is obtained and used as the first parameter, thereby achieving the purpose of compressing at the finer granularity of the number of convolution kernels and refining the compression granularity of the compression process.
As an optional scheme, compressing the first parameter according to a preset condition to obtain the second parameter includes:
a product value of the first number and a preset compression value is calculated, and the product value is used as the second number of convolution kernels of each convolutional layer in the second recognition network model, wherein the preset compression value is greater than 0 and less than 1.
It should be noted that a product value of the first number and a preset compression value is calculated, and the product value is used as the second number of convolution kernels of each convolutional layer in the second recognition network model, wherein the preset compression value is greater than 0 and less than 1. Optionally, the product value may be, but is not limited to, the value obtained by multiplying the first number by the preset compression value.
For a further example, as shown in fig. 4, optionally, assume the preset compression value is 0.5. The first number of the recognition network model 202 is multiplied by 0.5 to achieve the compression effect, and the second number of the compressed recognition network model 402 is the product of the first number and 0.5. In terms of the number of channels of the output feature vector, the first convolutional layer of the recognition network model 202 before compression outputs "64" channels, and the first convolutional layer of the compressed recognition network model 402 outputs the product of "64" and 0.5, that is, "32" channels.
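A minimal sketch of the product-value computation follows. Rounding the product to an integer kernel count, and flooring it at 1, are added assumptions for the example; the description only specifies the multiplication itself.

```python
def compress_kernel_counts(first_numbers, compression_value):
    """Multiply each layer's convolution-kernel count (the first
    number) by a preset compression value in (0, 1) to obtain the
    second number for the corresponding layer of the second model.
    """
    if not 0 < compression_value < 1:
        raise ValueError("preset compression value must be in (0, 1)")
    # Rounding and the floor of 1 kernel per layer are assumptions.
    return [max(1, round(n * compression_value)) for n in first_numbers]


# Kernel counts per layer as in the figures, halved with 0.5:
second_numbers = compress_kernel_counts([64, 128, 256, 512], 0.5)
```

Applying one scalar to every layer keeps the compressed model's structure identical to the original, which is what allows the second recognition network model to be built with the same layer layout.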
According to the embodiment provided by the application, the product value of the first number and a preset compression value greater than 0 and less than 1 is calculated and used as the second number of convolution kernels of each convolutional layer in the second recognition network model, thereby achieving the purpose of reducing the number of kernels by a preset proportion and improving the flexibility of obtaining the compression model.
As an alternative, before training the second recognition network model in the initial state by using the N sample images, the method further includes:
inputting the N sample images into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, wherein the feature vectors are used for representing the probability that the input sample images contain target elements, and the N feature vectors correspond to the N sample images one to one.
It should be noted that N sample images are input into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, where the feature vectors are used to represent the probability that the input sample images contain target elements, and the N feature vectors correspond to the N sample images one to one.
For further example, optionally, as shown in fig. 2, some original images are collected as a training set (for example, if the original recognition network model 202 is used to perform vehicle recognition on an image, that is, to recognize whether the image contains a vehicle, only a batch of pictures containing the vehicle need to be collected again as the training set);
further, by importing the training set into the recognition network model 202 for recognition, each image obtains a feature vector (for example, the 1 × 1 × 1000 feature vector output by the output layer of the recognition network model 202 in fig. 2), and this feature vector is used as the sample label of that picture. Each image in the training set thus has a corresponding sample label, and the images in the training set together with their sample labels may be, but are not limited to being, used as the training sample set of the second recognition network model.
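The labeling step can be sketched as follows. The `stub_teacher` function is a hypothetical stand-in for the trained recognition network model 202; in practice the label would be the model's 1 × 1 × 1000 output feature vector.

```python
import math


def softmax(logits):
    """Normalize raw scores into a probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_training_set(teacher, sample_images):
    """Pair each sample image with the feature vector the trained
    first recognition network model outputs for it; these vectors
    become the sample labels of the training set."""
    return [(image, teacher(image)) for image in sample_images]


# Hypothetical teacher: returns a short probability vector instead of
# the real model's 1 x 1 x 1000 output.
def stub_teacher(image):
    return softmax([sum(image), 1.0, 0.0])


sample_images = [[0.2, 0.4], [0.9, 0.1]]
labeled = label_training_set(stub_teacher, sample_images)
```

Each element of `labeled` is an (image, label) pair, giving the one-to-one correspondence between the N sample images and the N feature vectors.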
According to the embodiment provided by the application, N sample images are input into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, where the feature vectors are used to represent the probability that the input sample images contain target elements and the N feature vectors correspond to the N sample images one to one. The feature labels of the sample images are thus obtained from the trained first recognition network model, achieving the purpose of training the second recognition network model with these feature labels and improving the training efficiency of the recognition network model.
As an alternative, the training of the second recognition network model in the initial state by using N sample images includes:
S1, initializing a third parameter of the second recognition network model, wherein the third parameter is a weight value in the second recognition network model;
and S2, based on a back propagation algorithm, iteratively updating the third parameter of the second recognition network model by using the N feature vectors and the N sample images corresponding to them one to one.
Optionally, the back propagation algorithm may be, but is not limited to, a learning algorithm suitable for multi-layer neural networks, built on the gradient descent method. Its input-output relationship is essentially a mapping: the function performed by an n-input, m-output neural network is a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space. This mapping is highly nonlinear; its information processing capability derives from the repeated composition of simple nonlinear functions, which gives it a strong function approximation capability.
It should be noted that a third parameter of the second recognition network model is initialized, where the third parameter is a weight value in the second recognition network model; then, based on a back propagation algorithm, the third parameter of the second recognition network model is iteratively updated using the N feature vectors and the N sample images corresponding to them one to one.
For further example, as shown in fig. 3, optionally, the feature vectors output by the recognition network model 202 yield a training set composed of images carrying corresponding sample labels; based on a back propagation algorithm, the third parameter of the recognition network model 302 is iteratively updated using this training set until the output of the recognition network model 302 reaches the convergence condition, at which point updating stops and the trained recognition network model 302 is obtained.
According to the embodiment provided by the application, a third parameter of the second recognition network model is initialized, where the third parameter is a weight value in the second recognition network model; based on a back propagation algorithm, the third parameter of the second recognition network model is iteratively updated using the N feature vectors and the N sample images corresponding to them one to one. This improves the stability of the second recognition network model's training output and reduces the probability of sparse matrices occurring, thereby improving the recognition accuracy of the trained second recognition network model.
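Steps S1 and S2 above can be sketched with a toy model. This is a hedged illustration under the assumption that a single linear layer stands in for the second recognition network model; the sizes, learning rate, and iteration count are illustrative, not values from the patent.

```python
import numpy as np

# S1: initialize the third parameter (the student's weights); S2: iteratively
# update it by gradient descent / back propagation so the student's outputs
# approach the teacher's feature vectors (the sample labels).
rng = np.random.default_rng(1)
X = rng.standard_normal((64, 8))        # N = 64 sample images (flattened, toy size)
W_teacher = rng.standard_normal((8, 10))
Y = X @ W_teacher                       # N teacher feature vectors (sample labels)

W = np.zeros((8, 10))                   # S1: initialize the third parameter
lr = 0.1
for _ in range(300):                    # S2: iterative update via back propagation
    residual = X @ W - Y
    grad = X.T @ residual / len(X)      # gradient of the mean-squared error
    W -= lr * grad

mse = float(np.mean((X @ W - Y) ** 2))  # small once the update has converged
```

In the patent's setting the update would run until the output reaches the convergence condition; here a fixed iteration count suffices for the toy problem.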
As an alternative, after the trained second recognition network model is determined to be obtained, the method further includes:
inputting a target image into a second recognition network model to obtain a target feature vector, wherein the target feature vector is used for representing the target element in the target image.
It should be noted that, the target image is input to the second recognition network model to obtain a target feature vector, where the target feature vector is used to represent the target element in the target image.
For further example, optionally, the trained second recognition network model has the same network structure as the first recognition network model; the difference is that the numbers of channels of the feature vectors output by corresponding layers of the two models stand in a fixed proportion (each layer of the second recognition network model outputs fewer channels than the corresponding layer of the first recognition network model). The weights of each layer in the second recognition network model are obtained by iterative updating and optimization based on the feature vectors output by the first recognition network model and the training set used to train the first recognition network model. This ensures that the second recognition network model is functionally consistent with the first recognition network model while occupying fewer resources, and on that basis its recognition precision is also preserved.
According to the embodiment provided by the application, the target image is input into the second recognition network model to obtain a target feature vector representing the target element in the target image. On the premise that the second recognition network model is functionally consistent with the first recognition network model and occupies fewer resources than the first recognition network model, the recognition accuracy of the second recognition network model is thereby maintained.
As an alternative, after the trained second recognition network model is determined to be obtained, the method further includes:
S1, setting a target activation function at the last layer of the neural network model, wherein the target activation function is used for converting the feature vector into a recognition result, and the recognition result is used for indicating whether an image contains the target element;
and S2, inputting the target image into the second recognition network model to obtain a target recognition result, wherein the target recognition result is used for indicating whether the target image contains the target element or not.
It should be noted that a target activation function is set at the last layer of the neural network model, where the target activation function converts the feature vector into a recognition result indicating whether an image contains the target element; the target image is then input into the second recognition network model to obtain a target recognition result indicating whether the target image contains the target element. Optionally, the activation function may be, but is not limited to, a function running on a neuron of the artificial neural network that maps the neuron's input to an output, such as sigmoid, Tanh, or ReLU.
For further example, optionally, continuing from the scenario shown in fig. 3, fig. 5 includes an image import interface 502 and a result export interface 504, which may be, but are not limited to being, displayed on a terminal device. Specifically, the image import interface 502 may be, but is not limited to being, used to import an input file, which may include, but is not limited to, a video stream file or an image file; if the input file is a video file, the imported video stream is first converted into pictures, and if the input file is an image file, it is input directly into the recognition network model 302;
further, the result export interface 504 may be, but is not limited to, taking the file imported at the image import interface 502 and exporting the output produced after processing by the recognition network model 302, such as the number of images in the file that contain the target element, or more detailed image data.
According to the embodiment provided by the application, a target activation function is set at the last layer of the neural network model, where the target activation function converts the feature vector into a recognition result indicating whether an image contains the target element; the target image is input into the second recognition network model to obtain a target recognition result indicating whether the target image contains the target element. This applies the second recognition network model, which is strongly associated with the first recognition network model, to a concrete image recognition scenario, improving the practicability of the compression model.
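The conversion step above can be sketched as follows. Softmax plus a threshold on one class is used as the target activation function here; the class index and the 0.5 threshold are illustrative assumptions, not values given in the patent, which only requires that the result indicate whether the image contains the target element.

```python
import numpy as np

# Sketch of a target activation function on the last layer: it turns the
# feature vector into a probability distribution and thresholds the assumed
# target class to produce a yes/no recognition result.
def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def contains_target(feature_vector, target_index, threshold=0.5):
    probs = softmax(np.asarray(feature_vector, dtype=float))
    return bool(probs[target_index] >= threshold)

vec = np.zeros(1000)
vec[7] = 10.0                       # strong activation on the assumed target class
result = contains_target(vec, target_index=7)
```

A sigmoid on a single output unit would serve the same purpose for a binary "contains the target element" decision; the patent lists sigmoid, Tanh, and ReLU as candidate activation functions without fixing one.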
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided an apparatus for acquiring a compression model, which is used for implementing the method for acquiring a compression model. As shown in fig. 6, the apparatus includes:
a first obtaining unit 602, configured to obtain a first parameter of a trained first recognition network model, where the first recognition network model is used to recognize a target element included in an input image, and the first parameter includes a channel number of the first recognition network model;
a compressing unit 604, configured to compress the first parameter according to a preset condition to obtain a second parameter;
a constructing unit 606, configured to construct a second recognition network model in an initial state based on the second parameter, where the network structure of the second recognition network model is the same as that of the first recognition network model;
a training unit 608, configured to train a second recognition network model in an initial state by using N sample images, where N is an integer greater than or equal to 1;
the determining unit 610 is configured to determine that the trained second recognition network model is obtained when the output training result reaches the convergence condition.
Optionally, the apparatus for acquiring a compression model may be applied, but is not limited to being applied, in a scenario of compressing a convolutional neural network model, where the convolutional neural network model may be used, but is not limited to being used, to identify whether an image contains the target element, for example, whether the image contains a target vehicle.
Optionally, the convolutional neural network may be, but is not limited to, a class of feed-forward neural networks that include convolutional computation and have a deep structure, and is one of the representative algorithms of deep learning; it has representation-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. The parameters of the convolutional neural network may include, but are not limited to, intrinsic parameters and variable parameters: the intrinsic parameters may include the number of channels, the number of filters, the number of neurons, and the like, while the variable parameters may include learnable weight values, such as filter weights. The network structure of the convolutional neural network may represent, but is not limited to representing, the number and combination of layers in the network; for example, the first stage includes two convolutional layers and one pooling layer, and the second stage includes three convolutional layers and one pooling layer. The convolutional layers are provided with functions such as sigmoid, Tanh, or ReLU; the pooling layer is provided with max pooling or mean pooling; the fully connected layers are provided with functions such as sigmoid, Tanh, or ReLU; and the last layer is provided with functions such as softmax or cross entropy.
It should be noted that a first parameter of a trained first recognition network model is obtained, where the first recognition network model is used to recognize the target element contained in an input image and the first parameter includes the channel number of the first recognition network model; the first parameter is compressed according to a preset condition to obtain a second parameter; a second recognition network model in an initial state is constructed based on the second parameter, where the network structure of the second recognition network model is the same as that of the first recognition network model; the second recognition network model in the initial state is trained with N sample images, where N is an integer greater than or equal to 1; and when the output training result reaches the convergence condition, the trained second recognition network model is determined to be obtained.
Optionally, in this embodiment, the first recognition network model may be, but is not limited to, the recognition network model 202 shown in fig. 2, where the recognition steps of the recognition network model 202 are as follows:
after an image input to the recognition network model 202 is processed by the convolutional layers, pooling layers and fully connected layers, the output layer of the recognition network model 202 finally outputs a recognition result indicating that the input image contains the target element or does not contain it; optionally, in this embodiment, the target element may be, but is not limited to, a vehicle element.
Optionally, in this embodiment, continuing with the scenario shown in fig. 2, the number of channels may be, but is not limited to, the last dimension of the feature vector output by each layer of the recognition network model 202: for example, if the feature vector output by the first convolutional layer is "224 × 224 × 64", then "64" may be, but is not limited to, the number of channels of the first convolutional layer, and if the feature vector output by a fully connected layer is "1 × 1 × 4096", then "4096" may be, but is not limited to, the number of channels (number of neurons) of that fully connected layer;
further, taking the feature vector "224 × 224 × 64" output by the first convolutional layer as an example, the feature vector may be, but is not limited to being, read as "length × width × number of channels". Reducing the number of channels directly affects the output values of the feature vector but does not affect its "length × width"; therefore, reducing the number of channels reduces the resource footprint of the recognition network model while limiting the impact of compression on the model's image recognition accuracy.
For further example, optionally, continuing from the scenario shown in fig. 2 with fig. 3, assume that the recognition network model 302 in fig. 3 is the compression model obtained by applying the above apparatus for acquiring a compression model to the recognition network model 202. The network structures of the recognition network model 302 and the recognition network model 202 (such as the number and combination of convolutional and pooling layers) are clearly consistent, and the feature vectors output by both output layers are "1 × 1 × 1000", which may indicate, but is not limited to indicating, that the functions implemented by the two are consistent;
in addition, taking the first convolutional layer as an example, the feature vector output by the first convolutional layer of the recognition network model 202 is "224 × 224 × 64", while that of the recognition network model 302 is "224 × 224 × 32"; the number of channels in the feature map output by the recognition network model 302 is thus half that of the recognition network model 202, which saves the resource occupancy of the recognition network model. Specifically, comparing "224 × 224 × 64" with "224 × 224 × 32", only the value representing the number of channels changes, while the values representing the image size "length × width" do not; the image size parameters of the recognition network model before compression are retained so as to preserve the recognition precision of the compressed model;
further, taking the fully connected layer as an example, the feature vector output by the fully connected layer of the recognition network model 202 is "1 × 1 × 4096", while that of the recognition network model 302 is "1 × 1 × 2048"; the number of neurons represented by the fully connected layer of the recognition network model 302 is half that of the recognition network model 202, which reduces the resource occupancy of the recognition network model.
According to the embodiment provided by the application, the first parameter of a trained first recognition network model is obtained, where the first recognition network model is used to recognize the target element contained in an input image and the first parameter includes the channel number of the first recognition network model; the first parameter is compressed according to a preset condition to obtain a second parameter; a second recognition network model in an initial state is constructed based on the second parameter, where the network structure of the second recognition network model is the same as that of the first recognition network model; the second recognition network model in the initial state is trained with N sample images, where N is an integer greater than or equal to 1; and when the output training result reaches the convergence condition, the trained second recognition network model is determined to be obtained. By compressing the number of channels of the first neural network model to obtain the compressed recognition network model, the recognition precision of the compression model is preserved to a certain extent while its resource occupation is reduced, thereby improving the recognition precision of the compression model.
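The construction step described above can be sketched as follows. The VGG-like layer list, and the choice to leave pooling layers and the final 1 × 1 × 1000 output layer unscaled, are assumptions for illustration, not taken verbatim from the patent figures; the point is that the second model keeps the first model's structure and only scales each layer's channel count.

```python
# Sketch: the second recognition network model keeps the first model's
# structure (layer types, order, spatial sizes) and scales channel counts.
teacher_arch = [
    ("conv", 64), ("conv", 64), ("pool", None),
    ("conv", 128), ("conv", 128), ("pool", None),
    ("fc", 4096), ("fc", 1000),          # 1 x 1 x 1000 output layer
]

def build_student_arch(arch, ratio):
    """Scale every channel/neuron count by `ratio`, keeping structure intact."""
    student = []
    for i, (kind, width) in enumerate(arch):
        if width is None or i == len(arch) - 1:
            student.append((kind, width))     # pooling / output layer unchanged
        else:
            student.append((kind, max(1, int(width * ratio))))
    return student

student_arch = build_student_arch(teacher_arch, 0.5)
# student_arch == [('conv', 32), ('conv', 32), ('pool', None),
#                  ('conv', 64), ('conv', 64), ('pool', None),
#                  ('fc', 2048), ('fc', 1000)]
```

Keeping the output layer at 1000 channels matches the observation above that both models output "1 × 1 × 1000", so the compressed model implements the same function.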
As an alternative, as shown in fig. 7, the first obtaining unit 602 includes:
an obtaining module 702 is configured to obtain a first number of convolution kernels of each convolutional layer in the first recognition network model, and use the first number as the first parameter.
Optionally, in image processing, given an input image, each corresponding pixel in the output image is obtained by a weighted average of pixels in a region of the input image, where the weights are defined by a function; this function may be, but is not limited to, the convolution kernel. Optionally, a convolution kernel may also be, but is not limited to being, called a filter; in other words, the number of convolution kernels may be, but is not limited to, the number of filters, and the number of channels of a layer's output may be, but is not limited to, the number of its filters.
It should be noted that a first number of convolution kernels of each convolutional layer in the first recognition network model is obtained, and the first number is used as the first parameter.
Further by way of example, optionally, continuing from the scenario shown in fig. 2 with fig. 4, assume that the recognition network model 402 in fig. 4 is the compression model obtained by applying the above method for acquiring a compression model to the recognition network model 202. The network structures of the recognition network model 402 and the recognition network model 202 (such as the number and combination of convolutional and pooling layers) are clearly consistent, and the feature vectors output by both output layers are "1 × 1 × 1000", which may indicate, but is not limited to indicating, that the functions implemented by the two are consistent;
further, the difference between the feature vectors output by the convolutional layers of the recognition network model 402 and the recognition network model 202 lies in the number of output channels: for example, the first convolutional layer of the recognition network model 402 outputs "32" channels where the first convolutional layer of the recognition network model 202 outputs "64", and the second convolutional layer of the recognition network model 402 outputs "64" channels where the second convolutional layer of the recognition network model 202 outputs "128".
By the embodiment provided by the application, the first number of convolution kernels of each convolutional layer in the first recognition network model is obtained and used as the first parameter, so that compression acts at the finer granularity of convolution kernel counts, refining the compression granularity in the model compression process.
As an alternative, as shown in fig. 8, the compressing unit 604 includes:
the calculating module 802 is configured to calculate the product of the first number and a preset compression value, and use the product as a second number of convolution kernels of each convolutional layer in the second recognition network model, where the preset compression value is greater than 0 and smaller than 1.
It should be noted that the product of the first number and a preset compression value is calculated and used as the second number of convolution kernels of each convolutional layer in the second recognition network model, where the preset compression value is greater than 0 and smaller than 1.
For further example, as shown in fig. 4, optionally, assume the preset compression value is 0.5; the first number of the recognition network model 202 is multiplied by 0.5 to achieve compression, and the second number of the compressed recognition network model 402 is the product of the first number and 0.5. In terms of output channels, the first convolutional layer of the recognition network model 202 before compression outputs "64" channels, and the first convolutional layer of the compressed recognition network model 402 outputs the product of "64" and 0.5, that is, "32" channels.
According to the embodiment provided by the application, the product of the first number and a preset compression value is calculated and used as the second number of convolution kernels of each convolutional layer in the second recognition network model, where the preset compression value is greater than 0 and smaller than 1; this reduces the number of convolution kernels by a preset proportion and improves the flexibility of obtaining the compression model.
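The calculating module's arithmetic can be sketched in a few lines. Rounding down with a floor of one kernel is an assumption for robustness; the patent only specifies the product of the first number and the preset compression value.

```python
# Sketch of the calculating module: the second number of convolution kernels
# for each convolutional layer is the product of the first number and a
# preset compression value in (0, 1).
def compress_kernel_count(first_number, preset_compression_value):
    if not 0 < preset_compression_value < 1:
        raise ValueError("preset compression value must be greater than 0 and smaller than 1")
    return max(1, int(first_number * preset_compression_value))

second_numbers = [compress_kernel_count(n, 0.5) for n in (64, 128, 256)]
# second_numbers == [32, 64, 128]
```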
As an alternative, as shown in fig. 9, the apparatus includes:
a second obtaining unit 902, configured to, before training a second recognition network model in an initial state by using a plurality of sample images, input N sample images into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, where the feature vectors are used to indicate a probability that an input sample image includes a target element, and the N feature vectors are in one-to-one correspondence with the N sample images.
It should be noted that the N sample images are input into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, where each feature vector represents the probability that the corresponding input sample image contains the target element, and the N feature vectors correspond one to one with the N sample images.
For further example, optionally, as shown in fig. 2, some original images are collected as a training set (for example, if the original recognition network model 202 is used to perform vehicle recognition on an image, that is, to recognize whether the image contains a vehicle, only a batch of pictures containing vehicles needs to be collected as the training set);
further, by importing the training set into the recognition network model 202 and running recognition on it, each image obtains a feature vector (for example, the 1 × 1 × 1000 feature vector output by the output layer of the recognition network model 202 in fig. 2), and this feature vector is used as the sample label of that picture. Each image in the training set thus has a corresponding sample label, and the images in the training set together with their corresponding sample labels may be, but are not limited to being, used as the training sample set of the second recognition network model.
According to the embodiment provided by the application, the N sample images are input into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, where each feature vector represents the probability that the corresponding input sample image contains the target element and the N feature vectors correspond one to one with the N sample images. Because the feature labels of the sample images are obtained while the first recognition network model is trained, the second recognition network model can then be trained directly with these feature labels, which improves the training efficiency of the recognition network model.
As an alternative, the training unit includes:
the initialization module is used for initializing a third parameter of the second recognition network model, wherein the third parameter is a weight value in the second recognition network model;
and the updating module is used for iteratively updating the third parameter of the second recognition network model based on a back propagation algorithm, by using the N feature vectors and the N sample images corresponding to them one to one.
Optionally, the back propagation algorithm may be, but is not limited to, a learning algorithm suitable for multi-layer neural networks, built on the gradient descent method. Its input-output relationship is essentially a mapping: the function performed by an n-input, m-output neural network is a continuous mapping from n-dimensional Euclidean space to a finite field in m-dimensional Euclidean space. This mapping is highly nonlinear; its information processing capability derives from the repeated composition of simple nonlinear functions, which gives it a strong function approximation capability.
It should be noted that a third parameter of the second recognition network model is initialized, where the third parameter is a weight value in the second recognition network model; then, based on a back propagation algorithm, the third parameter of the second recognition network model is iteratively updated using the N feature vectors and the N sample images corresponding to them one to one.
For further example, as shown in fig. 3, optionally, the feature vectors output by the recognition network model 202 yield a training set composed of images carrying corresponding sample labels; based on a back propagation algorithm, the third parameter of the recognition network model 302 is iteratively updated using this training set until the output of the recognition network model 302 reaches the convergence condition, at which point updating stops and the trained recognition network model 302 is obtained.
According to the embodiment provided by the application, a third parameter of the second recognition network model is initialized, where the third parameter is a weight value in the second recognition network model; based on a back propagation algorithm, the third parameter of the second recognition network model is iteratively updated using the N feature vectors and the N sample images corresponding to them one to one. This improves the stability of the second recognition network model's training output and reduces the probability of sparse matrices occurring, thereby improving the recognition accuracy of the trained second recognition network model.
As an alternative, the apparatus includes:
and the first input module is used for, after the trained second recognition network model is determined to be obtained, inputting the target image into the second recognition network model to obtain a target feature vector, where the target feature vector represents the target element in the target image.
It should be noted that, the target image is input to the second recognition network model to obtain a target feature vector, where the target feature vector is used to represent the target element in the target image.
For further example, optionally, the trained second recognition network model has the same network structure as the first recognition network model; the difference is that the numbers of channels of the feature vectors output by corresponding layers of the two models stand in a fixed proportion (each layer of the second recognition network model outputs fewer channels than the corresponding layer of the first recognition network model). The weights of each layer in the second recognition network model are obtained by iterative updating and optimization based on the feature vectors output by the first recognition network model and the training set used to train the first recognition network model. This ensures that the second recognition network model is functionally consistent with the first recognition network model while occupying fewer resources, and on that basis its recognition precision is also preserved.
According to the embodiment provided by this application, the target image is input into the second recognition network model to obtain the target feature vector, where the target feature vector is used to represent the target element in the target image. On the premise that the second recognition network model is functionally consistent with the first recognition network model and occupies fewer resources than it, the recognition accuracy of the second recognition network model is thereby improved.
As an optional solution, the apparatus further includes the following modules:
a setting module, configured to set a target activation function at the last layer of the neural network model after the trained second recognition network model is determined, where the target activation function is used to convert a feature vector into a recognition result, and the recognition result is used to indicate whether an image contains the target element;
a second input module, configured to input a target image into the trained second recognition network model, after that model is determined, to obtain a target recognition result, where the target recognition result is used to indicate whether the target image contains the target element.
It should be noted that a target activation function is set at the last layer of the neural network model, where the target activation function is used to convert the feature vector into a recognition result, and the recognition result is used to indicate whether an image contains the target element; the target image is then input into the second recognition network model to obtain a target recognition result indicating whether the target image contains the target element. Optionally, the activation function may be, but is not limited to, a function running on a neuron of the artificial neural network that maps the neuron's input to an output, such as sigmoid, tanh, or ReLU.
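As a minimal sketch of how such an activation turns a final-layer feature value into a yes/no recognition result, the sigmoid case can be written as follows. The function names and the 0.5 threshold are illustrative assumptions, not specified by the application.

```python
import math

def sigmoid(x):
    """Map a real-valued feature to (0, 1), read as a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def to_recognition_result(feature, threshold=0.5):
    """Convert the last-layer feature value into a contains-target decision."""
    return sigmoid(feature) >= threshold

print(to_recognition_result(2.3))   # strong positive evidence -> True
print(to_recognition_result(-1.7))  # -> False
```

A tanh or ReLU activation would need a different thresholding convention, which is why the application leaves the choice of target activation function open.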
As a further example, continuing with the scenario shown in fig. 3, fig. 5 shows an image import interface 502 and a result export interface 504, which may be, but are not limited to being, displayed on a terminal device. Specifically, the image import interface 502 may be used to import an input file, where the input file may include, but is not limited to, a video stream file or an image file. If the input file is a video file, the imported video stream is first converted into picture format; if it is an image file, it is input directly into the recognition network model 302.
Further, the result export interface 504 may, but is not limited to, obtain the file imported through the image import interface 502 and export the output produced by the recognition network model 302, such as the number of images in the file that contain the target element, or more detailed image data.
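The import-side branching described above (video stream converted to pictures first, image file fed to the model directly) can be sketched as a small dispatch step. The extension sets and return labels below are hypothetical placeholders for the real frame-extraction and inference calls.

```python
from pathlib import Path

VIDEO_EXTS = {".mp4", ".avi", ".mkv"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}

def route_input(path):
    """Decide how an imported file is handled before recognition:
    video streams are first split into frames, images go straight in."""
    ext = Path(path).suffix.lower()
    if ext in VIDEO_EXTS:
        return "extract_frames_then_recognize"
    if ext in IMAGE_EXTS:
        return "recognize_directly"
    raise ValueError(f"unsupported input type: {ext}")
```

In the described scenario the two return labels would correspond to a frame-extraction pipeline feeding the recognition network model 302 and a direct call into it, respectively.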
According to the embodiment provided by this application, a target activation function is set at the last layer of the neural network model, where the target activation function converts the feature vector into a recognition result indicating whether an image contains the target element, and the target image is input into the second recognition network model to obtain a target recognition result indicating whether the target image contains the target element. In this way, image recognition by a second recognition network model strongly associated with the first recognition network model is applied in a practical scenario, which improves the practicability of the compression model.
According to another aspect of the embodiment of the present invention, there is also provided an electronic apparatus for implementing the method for acquiring a compression model, as shown in fig. 10, the electronic apparatus includes a memory 1002 and a processor 1004, the memory 1002 stores a computer program, and the processor 1004 is configured to execute the steps in any one of the method embodiments through the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps through the computer program:
S1, acquiring first parameters of a trained first recognition network model, wherein the first recognition network model is used for recognizing target elements contained in an input image, and the first parameters comprise the number of channels of the first recognition network model;
S2, compressing the first parameter according to preset conditions to obtain a second parameter;
S3, constructing a second recognition network model in an initial state based on the second parameter, wherein the network structure of the second recognition network model is the same as that of the first recognition network model;
S4, training the second recognition network model in the initial state by using N sample images, wherein N is an integer greater than or equal to 1;
S5, determining to obtain the trained second recognition network model under the condition that the output training result reaches the convergence condition.
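Steps S1 to S5 can be sketched end-to-end in one function. This is a minimal pure-Python illustration with a stubbed training step and a toy convergence condition; every name, the per-layer single weight, and the target value of 1.0 are assumptions for the sketch, not part of the application.

```python
def acquire_compression_model(first_params, ratio, samples, max_iters=100):
    """S1-S5 pipeline sketch: first_params are the teacher's per-layer
    channel counts (S1); samples stand in for the N training images."""
    # S2: compress the first parameter by the preset condition (ratio).
    second_params = [max(1, int(c * ratio)) for c in first_params]
    # S3: build the second model in an initial state -- same structure
    #     (here: one weight per layer), weights initialized to zero.
    weights = [0.0] * len(second_params)
    # S4/S5: train on the N = len(samples) images (stubbed: pull each
    #        weight toward 1.0) until the training result converges.
    loss, it = float("inf"), 0
    while loss > 1e-6 and it < max_iters:
        weights = [w + 0.5 * (1.0 - w) for w in weights]
        loss = sum((1.0 - w) ** 2 for w in weights)
        it += 1
    return second_params, weights

params, ws = acquire_compression_model([64, 128], 0.5, samples=[None] * 4)
```

The stubbed update halves the distance to the target each pass, so the loss-based convergence condition of S5 is met after a handful of iterations; in practice the loop body would be the back-propagation training of the earlier embodiments.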
Optionally, as can be understood by those skilled in the art, the structure shown in fig. 10 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android or iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 10 does not limit the structure of the electronic device: for example, the electronic device may include more or fewer components (e.g., a network interface) than shown in fig. 10, or have a configuration different from that shown in fig. 10.
The memory 1002 may be configured to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for acquiring a compression model in the embodiment of the present invention, and the processor 1004 executes various functional applications and data processing by running the software programs and modules stored in the memory 1002, that is, implementing the above-described method for acquiring a compression model. The memory 1002 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1002 may further include memory located remotely from the processor 1004, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1002 may be specifically, but not limited to, used to store information such as a first recognition network model, a first parameter, a second parameter, and a second recognition network model. As an example, as shown in fig. 10, the memory 1002 may include, but is not limited to, a first obtaining unit 602, a compressing unit 604, a constructing unit 606, a training unit 608, and a determining unit 610 in the obtaining apparatus of the compression model. In addition, the device may further include, but is not limited to, other module units in the above-mentioned compression model obtaining device, which is not described in this example again.
Optionally, the above-mentioned transmission device 1006 is used for receiving or sending data via a network. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 1006 includes a network adapter (Network Interface Controller, NIC), which can be connected to a router and other network devices via a network cable so as to communicate with the internet or a local area network. In another example, the transmission device 1006 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In addition, the electronic device further includes: a display 1008 for displaying information such as the first recognition network model, the first parameter, the second parameter, and the second recognition network model; and a connection bus 1010 for connecting the respective modules in the above-described electronic device.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, acquiring first parameters of a trained first recognition network model, wherein the first recognition network model is used for recognizing target elements contained in an input image, and the first parameters comprise the number of channels of the first recognition network model;
S2, compressing the first parameter according to preset conditions to obtain a second parameter;
S3, constructing a second recognition network model in an initial state based on the second parameter, wherein the network structure of the second recognition network model is the same as that of the first recognition network model;
S4, training the second recognition network model in the initial state by using N sample images, wherein N is an integer greater than or equal to 1;
S5, determining to obtain the trained second recognition network model under the condition that the output training result reaches the convergence condition.
Optionally, in this embodiment, those skilled in the art will understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware related to the terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in the above computer-readable storage medium. Based on such understanding, the part of the technical solution of the present invention that in essence contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing one or more computer devices (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of units is merely a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also fall within the protection scope of the present invention.

Claims (10)

1. A method for obtaining a compression model is characterized by comprising the following steps:
acquiring first parameters of a trained first recognition network model, wherein the first recognition network model is used for recognizing target elements contained in an input image, and the first parameters comprise the number of channels of the first recognition network model;
compressing the first parameter according to a preset condition to obtain a second parameter;
constructing a second recognition network model of the initial state based on the second parameters, wherein the second recognition network model has the same network structure as the first recognition network model;
training the second recognition network model in the initial state by using N sample images, wherein N is an integer greater than or equal to 1;
and determining to obtain the trained second recognition network model under the condition that the output training result reaches a convergence condition.
2. The method of claim 1, wherein obtaining the first parameters of the trained first recognition network model comprises:
and acquiring a first number of convolution kernels of each convolution layer in the first identification network model, and taking the first number as the first parameter.
3. The method of claim 2, wherein the compressing the first parameter according to the preset condition to obtain the second parameter comprises:
and calculating a product value of the first number and a preset compression value, and taking the product value as a second number of convolution kernels of each convolution layer in the second recognition network model, wherein the preset compression value is greater than 0 and less than 1.
4. The method of claim 1, wherein before the training of the second recognition network model in the initial state using the N sample images, the method comprises:
inputting the N sample images into the trained first recognition network model to obtain N feature vectors output by the first recognition network model, wherein the feature vectors are used for representing the probability that the input sample images contain the target elements, and the N feature vectors are in one-to-one correspondence with the N sample images.
5. The method of claim 4, wherein the training of the second recognition network model of the initial state using the N sample images comprises:
initializing a third parameter of the second recognition network model, wherein the third parameter is a weight value in the second recognition network model;
iteratively updating the third parameter of the second recognition network model based on a back propagation algorithm using the N feature vectors and the N sample images in one-to-one correspondence with the N feature vectors.
6. The method of claim 5, wherein after the determining to obtain the trained second recognition network model, the method comprises:
inputting a target image to the second recognition network model to obtain a target feature vector, wherein the target feature vector is used for representing the target element in the target image.
7. The method of claim 5, wherein after the determining to obtain the trained second recognition network model, the method comprises:
setting a target activation function at the last layer of the neural network model, wherein the target activation function is used for converting the feature vector into a recognition result, and the recognition result is used for indicating an image containing the target element;
inputting a target image into the second recognition network model to obtain a target recognition result, wherein the target recognition result is used for indicating whether the target image contains the target element or not.
8. An apparatus for obtaining a compression model, comprising:
a first obtaining unit, configured to obtain a first parameter of a trained first recognition network model, where the first recognition network model is used to recognize a target element included in an input image, and the first parameter includes a channel number of the first recognition network model;
the compression unit is used for compressing the first parameter according to a preset condition to obtain a second parameter;
a construction unit, configured to construct a second recognition network model in an initial state based on the second parameter, where the second recognition network model has a same network structure as the first recognition network model;
the training unit is used for training the second recognition network model in the initial state by utilizing N sample images, wherein N is an integer greater than or equal to 1;
and the determining unit is used for determining the trained second recognition network model under the condition that the output training result reaches a convergence condition.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN202011638666.1A 2020-12-31 2020-12-31 Method and device for acquiring compression model, storage medium and electronic device Pending CN112734007A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638666.1A CN112734007A (en) 2020-12-31 2020-12-31 Method and device for acquiring compression model, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN112734007A true CN112734007A (en) 2021-04-30

Family

ID=75608869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638666.1A Pending CN112734007A (en) 2020-12-31 2020-12-31 Method and device for acquiring compression model, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112734007A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229646A (en) * 2017-08-08 2018-06-29 北京市商汤科技开发有限公司 neural network model compression method, device, storage medium and electronic equipment
CN110599554A (en) * 2019-09-16 2019-12-20 腾讯科技(深圳)有限公司 Method and device for identifying face skin color, storage medium and electronic device
CN111598216A (en) * 2020-04-16 2020-08-28 北京百度网讯科技有限公司 Method, device and equipment for generating student network model and storage medium
CN111709516A (en) * 2020-06-09 2020-09-25 深圳先进技术研究院 Compression method and compression device of neural network model, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN110473141B (en) Image processing method, device, storage medium and electronic equipment
CN108062780A (en) Method for compressing image and device
US20220375133A1 (en) Image processing method and related device
WO2022022154A1 (en) Facial image processing method and apparatus, and device and storage medium
CN110033083A (en) Convolutional neural networks model compression method and apparatus, storage medium and electronic device
CN111401344A (en) Face recognition method and device and training method and device of face recognition system
CN112232325B (en) Sample data processing method and device, storage medium and electronic equipment
CN112598597A (en) Training method of noise reduction model and related device
CN113159300A (en) Image detection neural network model, training method thereof and image detection method
CN110717953A (en) Black-white picture coloring method and system based on CNN-LSTM combined model
CN110599554A (en) Method and device for identifying face skin color, storage medium and electronic device
CN110222718A (en) The method and device of image procossing
CN114359289A (en) Image processing method and related device
CN113536970A (en) Training method of video classification model and related device
CN116188808A (en) Image feature extraction method and system, storage medium and electronic device
CN110991298A (en) Image processing method and device, storage medium and electronic device
CN111488887B (en) Image processing method and device based on artificial intelligence
CN115409697A (en) Image processing method and related device
CN113706550A (en) Image scene recognition and model training method and device and computer equipment
CN112132231A (en) Object identification method and device, storage medium and electronic equipment
CN112906517A (en) Self-supervision power law distribution crowd counting method and device and electronic equipment
CN112734007A (en) Method and device for acquiring compression model, storage medium and electronic device
CN115272972A (en) Human living environment monitoring method and system based on remote sensing
CN111461228B (en) Image recommendation method and device and storage medium
CN114792284A (en) Image switching method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination