WO2022057262A1 - Image recognition method, apparatus, and computer-readable storage medium - Google Patents


Info

Publication number
WO2022057262A1
Authority
WO
WIPO (PCT)
Prior art keywords
input channel
image recognition
input
current layer
feature map
Prior art date
Application number
PCT/CN2021/089861
Other languages
English (en)
French (fr)
Inventor
尹文枫
董刚
赵雅倩
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US18/011,512 priority Critical patent/US20230334632A1/en
Publication of WO2022057262A1 publication Critical patent/WO2022057262A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present application relates to the technical field of image processing, and in particular, to an image recognition method, apparatus, and computer-readable storage medium.
  • Machine vision uses machines instead of the human eye for measurement and judgment. Machine vision products, namely image capture components such as CMOS (Complementary Metal-Oxide-Semiconductor) and CCD (Charge-Coupled Device) sensors, convert a captured target into an image signal and transmit it to a dedicated image processing system, which obtains the morphological information of the target and converts it into digital signals according to pixel distribution, brightness, color, and other information; the image system performs various operations on these signals to extract the characteristics of the target, and then controls on-site equipment according to the judgment result. A large part of machine vision is therefore image processing, and the recognition accuracy and recognition efficiency for images collected by the image capture device have a great impact on machine-vision performance.
  • the present application provides an image recognition method, device and computer-readable storage medium, which effectively improve the image recognition efficiency and reduce the computing resources consumed in the image recognition process.
  • One aspect of the embodiments of the present invention provides an image recognition method, including:
  • the acquired image to be recognized is input into the image recognition model, and an image recognition result of the image to be recognized is obtained.
  • using the kernel set construction method to first obtain the update weight of the convolution kernel includes:
  • Multiple rounds of sampling are performed on the input channels according to the sampling probability of each input channel; in each round, the input channel set of the current layer is sampled multiple times according to the sampling probability to obtain a core set, and the feature-map reconstruction error corresponding to the channel core set is calculated and accumulated.
  • the update value of the convolution kernel weight of the current layer is obtained by calculating an optimization function that minimizes the feature map reconstruction error.
  • determining the sampling probability of each input channel by calculating the weighted importance function of each input channel and its sum function includes:
  • s_i(x) is the weighted importance function of the i-th input channel
  • w_i(x) is the importance weighting coefficient of the i-th input channel
  • g_i(x) is the initial value of the importance function of the i-th input channel
  • m_{l-1} is the total number of output channels of the (l-1)-th convolutional layer of the original neural network model
  • the mean value of the maximum Frobenius norms of the feature maps of all input channels
  • a_l is the target number of compressed input channels for the l-th convolutional layer of the original neural network model
  • t is the sum function
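The probability computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the patent's exact formulas: it assumes g_i is the maximum Frobenius norm of channel i's feature maps over the sample set, and w_i is the L1 norm of the kernel slice that consumes channel i; the function name and array shapes are hypothetical.

```python
import numpy as np

def channel_sampling_probabilities(feature_maps, kernel):
    """Sketch of weighted-importance sampling probabilities.

    feature_maps: (N, C, H, W) feature maps of the sample images in the
                  current layer (N samples, C input channels).
    kernel:       (K, C, kh, kw) convolution kernel of the current layer.
    """
    N, C = feature_maps.shape[:2]
    # g_i: initial importance of channel i, here the maximum Frobenius
    # norm of its feature map over the sample set (an assumption).
    g = np.array([max(np.linalg.norm(feature_maps[n, i]) for n in range(N))
                  for i in range(C)])
    # w_i: importance weighting coefficient, here the L1 norm of the
    # kernel slice that consumes channel i (an assumption).
    w = np.array([np.abs(kernel[:, i]).sum() for i in range(C)])
    s = w * g          # weighted importance s_i
    t = s.sum()        # sum function t
    return s / t       # sampling probability of each input channel
```

The returned vector sums to one, so it can drive the non-uniform channel sampling directly.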
  • obtaining the update value of the convolution kernel weight of the current layer by calculating an optimization function that minimizes the reconstruction error of the feature map includes:
  • Y_k is the output feature map of the uncompressed convolution kernel in the k-th output channel
  • K is the total number of output channels of the current layer convolution kernel
  • the feature-map reconstruction errors are separately calculated and then summed,
  • S is a kernel set consisting of a input channels sampled from the C input channels of the sample image data
  • the output feature map of the k-th output channel is the sum, over each input channel in the kernel set S, of the convolution of its feature map x_i with the corresponding channel of the convolution kernel
  • * is the convolution operation.
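For a 1×1 convolution, minimizing the feature-map reconstruction error above reduces to an ordinary least-squares problem; larger kernels can be unrolled into the same matrix form. A hedged NumPy sketch (names and shapes are assumptions, not the patent's notation):

```python
import numpy as np

def reconstruct_weights(X_kept, Y):
    """Least-squares update of the convolution-kernel weights that
    minimizes the feature-map reconstruction error (1x1-conv sketch).

    X_kept: (M, a) sampled feature-map values of the `a` kept input
            channels, one row per spatial position and sample.
    Y:      (M, K) corresponding output feature-map values of the
            uncompressed kernel over the K output channels.
    Returns W with X_kept @ W approximating Y.
    """
    W, *_ = np.linalg.lstsq(X_kept, Y, rcond=None)
    return W

def reconstruction_error(X_kept, Y, W):
    # Frobenius norm of Y - X W, the quantity being minimized.
    return np.linalg.norm(Y - X_kept @ W)
```

When the target is exactly representable by the kept channels, the residual error is (numerically) zero, which is the idealized case of the patent's weight estimation step.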
  • calculating the combination of input channels with the smallest reconstruction error and trimming redundant input channels includes:
  • the output feature map of the compressed convolution kernel is calculated, so that the convolution result of the compressed convolution kernel of the current layer and the sample image data set is used as the input data of the next convolution layer of the current layer.
  • calculating the input channel combination with the smallest reconstruction error of the output feature map based on the hit probability set includes:
  • the pre-stored optimization function relationship is called to calculate the input channel combination ⁇ with the smallest reconstruction error of the output feature map.
  • the optimization function relationship is:
  • Y is the output feature map of the original neural network model in the current layer
  • K is the total number of output channels of the convolution kernel of the current layer
  • ⁇ i is whether the ith channel is selected
  • ⁇ i is 0
  • ⁇ i
  • 1 ⁇ i ⁇ C ⁇ is the best sampling result of sampling a input channel from C input channels, satisfying the condition
  • 0 a
  • X_i is the feature map of the i-th input channel of the sample image data set in the current layer
  • (1-q)β is the penalty term added to the optimization objective function, where the vector (1-q) is constructed from the sampling probability of each input channel and acts as a penalty factor
  • q is the hit probability set.
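The penalized selection objective above can be illustrated by brute force when C is small: for each candidate combination of a channels, solve the least-squares reconstruction and add a penalty weighted by (1 - q). This is only a sketch under a 1×1-convolution linearization; a real implementation would use a proper solver, and `lam` is an assumed penalty weight, not a value from the patent.

```python
import numpy as np
from itertools import combinations

def select_channels(X, Y, q, a, lam=1.0):
    """Brute-force sketch of the penalized input-channel selection.

    X:   (M, C) per-channel contributions (1x1-conv linearization).
    Y:   (M, K) target output feature map of the uncompressed layer.
    q:   (C,) hit probability of each input channel.
    a:   number of channels to keep (the constraint ||beta||_0 = a).
    lam: weight of the (1 - q) penalty (an assumed hyperparameter).
    Returns the index tuple with the smallest penalized error.
    """
    C = X.shape[1]
    best, best_cost = None, np.inf
    for idx in combinations(range(C), a):
        cols = X[:, list(idx)]
        # least-squares reconstruction using only the candidate channels
        W, *_ = np.linalg.lstsq(cols, Y, rcond=None)
        err = np.linalg.norm(Y - cols @ W) ** 2
        # channels with low hit probability are penalized more
        penalty = lam * (1.0 - q[list(idx)]).sum()
        cost = err + penalty
        if cost < best_cost:
            best, best_cost = idx, cost
    return best
```

With C channels and a kept, the loop visits C-choose-a combinations, so this is only tractable for tiny layers; it serves to make the objective concrete.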
  • determining the probability that each input channel of the current layer is selected according to the updated weight of the convolution kernel of the current layer includes:
  • w_i(x) is the importance weighting coefficient of the i-th input channel
  • m_{l-1} is the total number of output channels of the (l-1)-th convolutional layer of the original neural network model
  • X is the sample image data set
  • x_i is the feature map, in the i-th input channel, of each sample in the sample image data set
  • K is the total number of output channels of the current layer's convolution kernel
  • n_l is the total number of input channels in the l-th layer of the original neural network model.
  • an image recognition device including:
  • the neural network model compression module is used to input the sample image data set into the original neural network model; for each convolutional layer of the original neural network model, it takes the feature map of the sample image data set in the current layer as the reconstruction target, uses the kernel set construction method to first obtain the updated weights of the convolution kernel, then calculates the input channel combination with the smallest reconstruction error and prunes the redundant input channels as the compression result of the current layer;
  • the compression results of each convolutional layer are spliced to generate an image recognition model;
  • the image recognition module is used for inputting the acquired image to be recognized into the image recognition model to obtain the image recognition result of the image to be recognized.
  • An embodiment of the present invention further provides an image recognition apparatus, including a processor, which is configured to implement the steps of the image recognition method described in any preceding item when executing the computer program stored in the memory.
  • the embodiments of the present invention further provide a computer-readable storage medium, on which an image recognition program is stored; when the image recognition program is executed by a processor, the steps of the image recognition method described in any preceding item are implemented.
  • The advantage of the technical solution provided by the present application is that network compression, comprising a convolution-kernel weight estimation process and a channel clipping process, is performed in turn on each convolutional layer of the original neural network model to obtain an image recognition model for performing the image recognition task. Since the image recognition model is obtained by compressing the original neural network model, it effectively reduces the parameter redundancy of the original model, reduces the amount of data the model processes when performing tasks, reduces the computational resources consumed by image classification and recognition, and improves classification and recognition speed. The compressed network model does not need to be retrained, which makes operation more convenient, and the entire compression process runs while the images to be classified are fed through the neural network for forward inference.
  • The output feature map of each layer of the network is the reconstruction target, and the new weights of each layer's convolution kernel and the redundant convolution-kernel channels are obtained through kernel set construction, so that the kernel set construction result is not correlated with the distribution of differently classified images, which effectively improves the generalization ability of the model.
  • the embodiments of the present invention also provide a corresponding implementation device and a computer-readable storage medium for the image recognition method, which further makes the method more practical, and the device and the computer-readable storage medium have corresponding advantages.
  • FIG. 1 is a schematic diagram of a compression process flow of a neural network model in the prior art provided by an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for performing an image recognition task using a network compression method in the prior art provided by an embodiment of the present invention
  • FIG. 3 is a schematic flowchart of an image recognition method according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of another image recognition method provided by an embodiment of the present invention.
  • FIG. 5 is a structural diagram of a specific implementation manner of an image recognition device provided by an embodiment of the present invention.
  • FIG. 6 is a structural diagram of another specific implementation manner of an image recognition apparatus provided by an embodiment of the present invention.
  • the deep neural network can be compressed, and then the compressed neural network model can be used to perform image classification and recognition tasks, which can effectively improve the efficiency of model output results.
  • Neural network compression addresses the application bottleneck of deep learning on devices with limited computing and storage resources. Compression and pruning can reduce the parameter count and computation of neural network models, thereby reducing the storage the network occupies and improving the speed of neural network inference.
  • the neural network pruning method is divided into coarse-grained pruning and fine-grained pruning according to the processing object.
  • Coarse-grained pruning, also known as structured pruning, compresses filter-level, channel-level, and row- or column-level structures of a neural network.
  • Fine-grained pruning is also known as unstructured pruning, which can filter and remove individual weights.
  • The advantage is that the accuracy of the neural network can be preserved to the greatest extent.
  • The disadvantage is that the sparse-matrix computation of unstructured pruning relies on specialized runtime libraries and hardware devices.
  • the structured pruning method has received more attention and application due to its hardware friendliness.
  • the traditional algorithm flow of neural network compression pruning includes the iteration of pruning and fine-tuning, as shown in Figure 1, the operation of this algorithm is very time-consuming.
  • some studies have proposed some neural network compression pruning methods that do not rely on the fine-tuning process to compensate for the accuracy loss caused by compression.
  • The existing pruning method CPA (Channel Pruning for Accelerating Very Deep Neural Networks) completes channel compression and weight-parameter reconstruction during the forward inference process of the neural network, with only a small loss of precision.
  • The input channels of the convolutional layers of the neural network are screened layer by layer based on LASSO (least absolute shrinkage and selection operator) regression, and new convolution-kernel weight parameters are then obtained by using the least-squares method to optimize the feature-map reconstruction error; these are used directly as the convolution-kernel weight parameters of the compressed network, so no fine-tuning is needed to obtain the new weights.
  • However, the compression effect depends on the input data: differences in the input data set change the screening result of the magnitude-based importance measurement rule.
  • A data-independent network compression method, DINP (Data-Independent Neural Pruning via Coresets), has been developed based on the neural network VC dimension (Vapnik-Chervonenkis dimension) and kernel set theory, which provide a theoretical basis for the compression trade-off.
  • The DINP method uses the OSCC algorithm (offline and streaming coreset constructions) to construct the kernel set for the hidden-layer neurons of fully connected layers.
  • the OSCC algorithm provides the calculation formula of the kernel set size based on the data VC dimension.
  • the DINP method gives the neuron importance measurement rule and the sampling probability of the kernel set based on the upper bound of the activation function value.
  • the kernel set construction result of DINP has the characteristic of Data-Independent.
  • The DINP method applies only to fully connected layers and not to convolutional layers, so its compression rate for deep convolutional neural networks is limited, and the range of applicable network models is also limited; it is not yet applicable to networks such as the YOLO (You Only Look Once) series and the fully convolutional network R-FCN (Region-based Fully Convolutional Networks).
  • the early application of kernel set theory in neural network compression is data-dependent.
  • There are also methods that apply kernel set theory to low-rank decomposition to achieve compression of convolutional and fully connected layers, but the construction of the kernel set itself does not compress the neural network.
  • The method flow is shown in Figure 2: the sample data is first used to train the original neural network model, the above neural network compression method then compresses the original network to generate a compressed model, the compressed model is retrained with the training sample data to restore image recognition accuracy, and finally the trained compressed model is used to perform the image classification task.
  • the training process of neural network compression itself needs to cyclically repeat the process of compressing the network layer by layer and retraining the network.
  • the two model training processes undoubtedly increase a large amount of data processing.
  • the model training process requires a lot of computing resources and is cumbersome to operate.
  • This application proposes a neural network compression process for structured pruning that requires no retraining and does not depend on the input data.
  • the convolution kernel parameter estimation and compression pruning based on the kernel set theory are performed layer by layer in sequence. This process runs in the process of inputting the classified image into the neural network for forward reasoning.
  • The output feature map of the input classified image in each layer of the original neural network is used as the reconstruction target, and a kernel-set-construction-based method is used to obtain new weights for each layer's convolution kernel and to prune redundant convolution-kernel channels. This process designs a new way of constructing the kernel set, which avoids correlation between the kernel set construction results and the distribution of differently classified images, improving generalization ability.
  • FIG. 3 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention.
  • An embodiment of the present invention may include the following contents:
  • S301: Input the sample image data set into the original neural network model in advance; for each convolutional layer of the original neural network model, take the feature map of the sample image data set in the current layer as the reconstruction target, use the kernel set construction method to first obtain the updated weights of the convolution kernel, then calculate the input channel combination with the smallest reconstruction error and prune the redundant input channels as the compression result of the current layer; splice the compression results of each convolutional layer to generate the image recognition model.
  • The original neural network model is any type of trained neural network model, for example a neural network model of a supervised-learning, unsupervised-learning, associative-learning, or optimization-application network type.
  • The original neural network model can be, for example, a Hopfield network, a convolutional neural network (CNN), a deconvolutional network (DN), a generative adversarial network (GAN), a recurrent neural network (RNN), and so on, none of which affects the implementation of this application.
  • the sample image data set may be the training data used in the training process of the original neural network model, or may not be the data used in the training process, which does not affect the implementation of the present application.
  • the image recognition model is a compressed network model of the original neural network model.
  • The neural network layers are compressed layer by layer. In the compression of each layer, the convolution-kernel weight estimation operation is performed first to directly obtain the complete convolution-kernel weight parameters of the current layer; the channel clipping operation is then performed to filter the input channels and clip the corresponding convolution-kernel output channels of the previous layer, and the convolution result of the current layer's compressed convolution kernel with the input data is used as the input data of the next layer. After the compression operation has been performed on all convolutional layers of the original neural network model, the resulting compressed network layers are spliced together to form the complete compressed neural network model.
  • The compression process of the original neural network model is constrained only in its order of execution, that is, the convolution-kernel weight estimation operation is performed first and the channel clipping operation second; kernel set construction theory is used in both operations. The kernel set construction theory can be found in the relevant literature and will not be repeated here.
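The layer-by-layer flow (weight estimation first, then channel clipping, with the compressed output feeding the next layer) can be sketched end to end. The snippet below is a toy stand-in in which each "layer" is a matrix acting like a 1×1 convolution, the channel score is a simple magnitude heuristic, and the fixed `keep` target is an assumption; it mirrors the order of operations, not the patent's exact rules.

```python
import numpy as np

def compress_model(layers, x, keep=2):
    """End-to-end sketch of the layer-by-layer compression loop.

    layers: list of (C_in, C_out) weight matrices standing in for 1x1
            convolutional layers (a simplification of 4-D kernels).
    x:      (M, C_in0) sample image data flattened per position.
    keep:   number of input channels retained per layer (a hypothetical
            fixed target; the patent derives it per layer).
    """
    compressed = []
    for W in layers:
        y_target = x @ W                    # reconstruction target
        # channel clipping: keep the channels with the largest
        # contribution (kernel L1 norm times mean activation magnitude)
        score = np.abs(W).sum(axis=1) * np.abs(x).mean(axis=0)
        kept = np.sort(np.argsort(score)[::-1][:keep])
        # weight estimation: least squares on the kept channels so the
        # compressed layer reproduces the original output feature map
        W_new, *_ = np.linalg.lstsq(x[:, kept], y_target, rcond=None)
        compressed.append((kept, W_new))
        x = x[:, kept] @ W_new              # feeds the next layer
    return compressed
```

The key structural point matches S301: each layer is compressed against the feature map it actually receives from the already-compressed previous layer, and the per-layer results are spliced into the final model.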
  • S302 Input the acquired image to be recognized into an image recognition model to obtain an image recognition result of the image to be recognized.
  • the image to be recognized may be one image or multiple images, and the image type is not limited.
  • The image recognition model is used to perform image classification and recognition; the model matches the input data, that is, the image to be recognized, to a label, and then outputs the label type as the image recognition result.
  • Each convolutional layer of the original neural network model is sequentially subjected to network compression, including a convolution-kernel weight estimation process and a channel clipping process, to obtain the image recognition model for the image recognition task. Because the image recognition model is obtained by compressing the original neural network model, it effectively reduces the parameter redundancy of the original model, reduces the amount of data the model processes when performing tasks, reduces the computational resources consumed by image classification and recognition, and improves classification and recognition speed; the compressed network model does not need to be retrained, operation is more convenient, and the entire compression process runs while the images to be classified are fed through the neural network for forward inference.
  • The output feature map of the image in each layer of the original neural network is the reconstruction target, and the new weights of each layer's convolution kernel and the redundant convolution-kernel channels are obtained by the kernel-set-construction-based method, so that the kernel set construction results are not correlated with the distribution of differently classified images, which effectively improves the generalization ability of the model.
  • the image recognition model is compressed in the image recognition process to reduce the amount of data processing, reduce the computing resources consumed by the image classification and recognition, and improve the image recognition efficiency.
  • The technical solution of the present application can also be used for, but is not limited to, the image feature extraction stage of a target segmentation or target detection task, improving target segmentation efficiency and target detection efficiency and reducing the computing resources consumed during task execution.
  • Since an image is two-dimensional data, the convolution kernels of the neural network model in the above embodiments are described using two-dimensional convolution kernels as an example; the present application is also applicable to the compression of one-dimensional convolution kernels, extending the compressed deep neural network model to one-dimensional sequence data processing, such as anomaly classification of one-dimensional data like medical ECG signals.
  • That is, when performing classification and recognition tasks on one-dimensional data such as medical ECG and heart-sound signals, the one-dimensional physiological signal data can be used as the sample data set to train the original neural network model; for each convolutional layer of the original neural network model, the feature map of the sample data set in the current layer is taken as the reconstruction target, the kernel set construction method is used to first obtain the updated weights of the one-dimensional convolution kernel, and the input channel combination with the smallest reconstruction error is then calculated and the redundant input channels pruned as the compression result of the current layer; the compression results of each convolutional layer are spliced to generate a physiological signal recognition model.
  • The acquired one-dimensional physiological signal data is then input into the physiological signal recognition model to obtain the recognition and classification result of the physiological signal to be identified.
  • the present application can be further extended to convolution kernel compression of three-dimensional convolution.
  • the expanded and compressed deep neural network model is applied to the field of three-dimensional sequence data processing, for example, it can be applied to three-dimensional data such as medical three-dimensional CT images.
  • it can also be applied to application scenarios such as action recognition in the field of video processing.
  • The present application can also introduce a multi-round sampling mechanism into the convolution-kernel weight estimation operation: the input channels are sampled according to their probabilities to construct the channel kernel set, and, over the results of multiple sampling rounds, the weights of the compressed convolution kernel are generated with the goal of minimizing the feature-map reconstruction error. Owing to the randomness of sampling, the convolution-kernel weights generated when channels are sampled multiple times without repetition adapt to different channel selection results. That is, an implementation of S301 may include the following:
  • 1) The object constructed into the kernel set is an input channel of the convolutional layer, not a single neuron; 2) the VC dimension of the constructed kernel set can be reduced to 1, and need not equal the number of neurons d of the (l-1)-th fully connected layer or another high-dimensional value. Specifically, when constructing the kernel set S for the l-th convolutional layer, the present application constructs a to-be-sampled set P whose VC dimension equals 1: the parameter tensor of the four-dimensional convolution kernel of the l-th layer along one input channel, or the three-dimensional feature-map tensor output by the (l-1)-th layer along one output channel, is used directly as a sample of the set P, so the number of samples in P equals the number of input channels of the l-th convolutional layer. The importance calculation formula for each sample in P specifies the target compressed channel number a as the dimension of the target kernel set S, and the importance sum function t is associated with a, that is, with the dimension of the target kernel set S, so that the importance sum is bounded above.
  • A1 Determine the importance of each input channel of the current layer according to the feature map of each input channel, in the current layer, of each sample datum in the sample image data set.
  • A2 Set an importance weighting coefficient for each input channel according to the importance of each input channel.
  • A3 Determine the sampling probability of each input channel by calculating the weighted importance function of each input channel and its sum function.
  • A4 Perform multiple rounds of sampling on the corresponding input channels according to the sampling probability of each input channel, where each round samples the input channel set of the current layer multiple times according to the sampling probabilities to obtain a kernel set; calculate and accumulate the feature map reconstruction error corresponding to the channel kernel set, and obtain the updated values of the convolution kernel weights of the current layer by solving the optimization function that minimizes the feature map reconstruction error.
  • the initial importance function g i (x) of each input channel is calculated from the input data of the lth layer, that is, the sample image data set. Then, according to the convolution kernel of the lth layer, the importance weighting coefficient w i (x) is assigned to the importance of each input channel of the lth layer; w i (x) is the weighting coefficient constructed for the non-uniform sampling of each input channel.
  • the operation of assigning the weighting coefficients is to calculate the L 1 norm
  • the construction result of the kernel set S will have the advantage of not depending on the input data, because the currently constructed kernel set S is not selected under a specific data distribution result.
  • calculate the weighted importance function s i (x) of all input channels and its sum function t; that is, the pre-stored importance function relationship can be called to calculate the weighted importance function of each input channel, and the sampling probability p i of the ith input channel is p i = s i (x)/t.
  • the importance function relation can be expressed as: s i (x) = w i (x)·g i (x), where:
  • s i (x) is the weighted importance function of the ith input channel
  • w i (x) is the importance weighting coefficient of the ith input channel
  • g i (x) is the initial value of the ith input channel importance function
  • a l is the number of compressed input channels to be achieved by the lth convolutional layer of the original neural network model
  • m l-1 is the number of output channels of the l-1th layer of the neural network, and t is the sum of the weighted importance functions s i (x) over the input channels.
  • t is a very important parameter in the transformation of the kernel set construction algorithm from theory to practical operation, and t affects the lower bound of the kernel set dimension.
  • the kernel set construction algorithm OSCC proves that, when a subset S is randomly sampled from the set P according to the probabilities, if the sample size satisfies |S| ≥ (ct/ε²)(d·log t + log(1/δ)), where c ≥ 1 is a constant, g i (x) is a non-negative function, the error parameter ε ∈ (0,1), and d represents the VC dimension of the set P, then the kernel set S becomes an ε-coreset of the set P queries with probability 1-δ.
  • the definition of ε-coreset can be found in the original text of the OSCC algorithm. This theorem shows that the value of t has a guiding role in the setting of the dimension of the kernel set S.
  • the dimension of the input data of the l-1th layer activation function is mh², where m is the number of output channels and h is the size of the output feature map. If the different input channels of the lth convolutional layer do not share the convolution kernel, that is, the convolution kernel is a four-dimensional tensor n×m×k×k and, for different channels of the input data, the parameter values of the same output channel of the convolution kernel differ, then the dimension of the convolution kernel parameters equals nmk².
  • the complexity of the core set construction algorithm is roughly equal to the amount of parameters required to define a query
  • the amount of parameters required to perform a query for a set P with a VC dimension of d is at least d+1
  • reducing the VC dimension of the set P can reduce the complexity of the kernel set construction algorithm
  • when the method proposed in this application is used to compress the input channels of the lth convolutional layer, the VC dimension d l-1 of the activation function of the l-1th layer and the VC dimension d l of the convolution kernel parameters of the lth layer are affected at the same time, which requires the kernel set dimension to satisfy the constraints of both layers.
  • the sum of importance functions t constructed in this application not only has adjustable upper and lower bounds, but also enables the target number of compressed channels a to satisfy the constraints of the two convolutional layers on the dimension of the target kernel set at the same time.
  • the approximate error ε between the currently constructed target kernel set S and the set P can also be estimated, which can serve as a side reference index for evaluating the compression effect.
  • each round samples the input channel set P of the lth convolutional layer a times according to the probabilities p i to obtain a kernel set S, calculates and accumulates the feature map reconstruction error corresponding to the channel kernel set S, and solves new weights for the full convolution kernel according to the following optimization function.
  • This optimization function aims to minimize the sum of the weight estimation errors of each convolution kernel, and the pre-stored weight update relationship can be called to obtain the update value of the convolution kernel weight of the current layer.
  • the weight update relationship can be expressed as:

  W' = argmin Σ_{k=1..K} Σ_{r=1..R} || Y_k − Σ_{x_i ∈ S_r} W'_{ik} * x_i ||_F²

  • Y k is the output feature map of the uncompressed convolution kernel in the kth output channel
  • K is the total number of output channels of the current layer convolution kernel
  • the summation over the K output channels means the feature map reconstruction errors are separately calculated and summed
  • the summation over the R rounds means the input channel combination of the input sample image data set is independently sampled R times and the feature map reconstruction error of each sampling result is accumulated
  • ||·||_F represents the Frobenius norm; W'_{ik} is the update value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel, and is the solution target of the kernel-set-based convolution kernel weight estimation operation
  • S is the kernel set consisting of a input channels sampled from the C input channels of the input sample image data set
  • Σ_{x_i ∈ S} W'_{ik} * x_i is the sum, over the feature map x i of each input channel in the kernel set S, of the output feature maps of the kth output channel of the corresponding convolution kernel channel
  • * is the convolution operation.
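A minimal sketch of the multi-round, coreset-based weight estimation described by this relation, simplified to a single output channel and to channel contributions flattened into vectors, so that the convolution becomes a linear combination. The least-squares solver, the function names, and the averaging of per-round solutions are assumptions for illustration, not the patent's fixed implementation.

```python
import numpy as np

def estimate_weights(X_cols, Y, p, a, rounds=5, rng=None):
    """Multi-round coreset sampling plus least-squares weight estimation.

    X_cols: (C, M) matrix, channel i's flattened contribution basis
    (e.g. the convolution of channel i's feature map with a unit kernel).
    Y: (M,) flattened target output feature map of one output channel.
    p: sampling probability of each of the C input channels.
    a: coreset size, i.e. channels sampled per round.
    """
    rng = np.random.default_rng(rng)
    C, _ = X_cols.shape
    W = np.zeros(C)
    counts = np.zeros(C)
    for _ in range(rounds):
        # sample one channel coreset S according to the probabilities p
        S = rng.choice(C, size=a, replace=False, p=p)
        # minimise ||Y - sum_{i in S} w_i x_i||_F^2 for this round
        w_S, *_ = np.linalg.lstsq(X_cols[S].T, Y, rcond=None)
        W[S] += w_S
        counts[S] += 1
    nz = counts > 0
    # averaging over rounds makes the estimate adaptable to the
    # random channel selection results
    W[nz] /= counts[nz]
    return W
```

With `a` equal to the full channel count the sampled coreset covers every channel, so the estimate reduces to an exact least-squares fit.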
  • in the convolution kernel weight estimation operation of this embodiment, the processing object is the uncompressed convolution kernel, and multiple rounds of sampling are performed on the input channels of the convolution kernel when the channel kernel set is constructed; minimizing the average feature map reconstruction error of the sampled channels makes the parameter estimation result adaptable to the random channel selection results. In addition, the process adopts a unique calculation formula for the importance of each input channel: the importance sum function is transformed into a function directly related to the target number of compressed channels and is thereby constrained to a controllable value range, which makes the kernel set theory's constraint on the lower bound of the kernel set dimension practical to apply.
  • This embodiment does not limit how to perform the channel cropping operation in step S301.
  • This embodiment also provides a channel cropping method, that is, using the kernel set construction method to calculate the input channel combination with the smallest reconstruction error and crop the redundant input channels. An implementation could be:
  • B1 Determine the probability that each input channel of the current layer is selected according to the updated weights of the convolution kernel of the current layer, and form the selection probabilities of the input channels into a hit probability set.
  • B2 Calculate the input channel combination with the smallest output feature map reconstruction error based on the hit probability set, and remove the unselected input channels according to the input channel combination.
  • B3 Calculate the output feature map of the compressed convolution kernel, and use the convolution result of the compressed convolution kernel of the current layer and the sample image data set as the input data of the next convolutional layer of the current layer.
  • the pre-stored selection probability relationship can be called to calculate the probability that each input channel of the current layer is selected.
  • the selection probability relationship is:
  • w i (x) is the importance weighting coefficient of the i-th input channel
  • m l-1 is the total number of output channels of the l-1th convolutional layer of the original neural network model
  • X is the sample image dataset
  • x i is the feature map of each sample of the sample image data set in the ith input channel
  • K is the total number of output channels of the current layer convolution kernel
  • n l is the total number of input channels of the original neural network model in layer l, that is, the current layer.
  • q = {q i | 1 ≤ i ≤ C} is the hit probability set. Then, the pre-stored optimization function relationship is called to calculate the input channel combination γ with the smallest output feature map reconstruction error, and the unselected input channels are removed according to the solved γ, that is, the input channels corresponding to γ i = 0 are removed.
  • the optimization function relation can be expressed as:

  argmin_γ Σ_{k=1..K} || Y_k − Σ_{i=1..C} γ_i X_i * W'_{ik} ||_F² + (1-q)γ

  • Y is the output feature map of the original neural network model in the current layer
  • K is the total number of output channels of the current layer convolution kernel
  • γ i indicates whether the ith channel is selected, and γ i takes the value 0 or 1
  • γ = {γ i | 1 ≤ i ≤ C} is the best sampling result of sampling a input channels from the C input channels, satisfying the condition ||γ||_0 = a
  • X i is the feature map of the ith input channel of the current layer for the sample image data set, and x i is the feature map of the ith input channel of the current layer for a single sample of the sample image data set; W'_{ik} is the update value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel; ||·||_F represents the Frobenius norm; (1-q)γ is the penalty term added to the optimization objective function, a vector constructed by using the sampling probability of each input channel as a penalty factor; and q is the hit probability set.
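To make the selection step concrete, here is an exhaustive-search sketch of the penalised objective: it scores every a-channel combination γ by its reconstruction error plus the vector penalty (1-q)·γ. The patent does not prescribe a particular solver (exhaustive search is only feasible for small C), and the trade-off weight `lam` is an added assumption.

```python
import numpy as np
from itertools import combinations

def select_channels(contribs, Y, q, a, lam=1.0):
    """Pick the a-channel combination gamma minimising the penalised
    feature-map reconstruction error.

    contribs: (C, M) matrix, channel i's flattened contribution X_i * W'_i.
    Y: (M,) flattened target output feature map.
    q: (C,) hit probability of each channel; the penalty (1 - q) . gamma
    favours channels with a high hit probability.
    """
    C = contribs.shape[0]
    best, best_cost = None, np.inf
    for idx in combinations(range(C), a):
        gamma = np.zeros(C)
        gamma[list(idx)] = 1.0
        # reconstruction error of the selected combination
        err = np.sum((Y - gamma @ contribs) ** 2)
        # vector-valued penalty gives each channel its own constraint
        cost = err + lam * np.dot(1.0 - q, gamma)
        if cost < best_cost:
            best, best_cost = gamma, cost
    return best
```

Channels with `gamma[i] == 0` are the redundant ones to be cropped.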
  • in the channel cropping operation, a unique calculation function is designed for the importance of each channel, an optimization objective function is designed for solving the input channel combination that minimizes the output feature map reconstruction error, and the vector composed of the sampling probabilities of the channels is further added to the optimization objective function as a penalty factor.
  • the penalty factor is usually a scalar, but this application uses a vector so as to provide different constraints for different channels.
  • the neural network compression method can realize channel pruning in the forward reasoning process of the neural network, cancel the step of fine-tuning the network after compression, simplify the neural network compression process, and reduce the computational load and delay of the neural network compression process.
  • a theoretical analysis tool is provided for the trade-off between channel compression ratio and performance loss of convolutional layers.
  • the embodiment of the present invention also provides a corresponding device for the image recognition method, which further makes the method more practical.
  • the device can be described from the perspective of functional modules and the perspective of hardware.
  • the following describes the image recognition apparatus provided by the embodiments of the present invention, and the image recognition apparatus described below and the image recognition method described above may refer to each other correspondingly.
  • FIG. 5 is a structural diagram of an image recognition apparatus provided by an embodiment of the present invention in a specific implementation manner, and the apparatus may include:
  • the neural network model compression module 501 is used to input the sample image data set into the original neural network model; for each convolutional layer of the original neural network model, with the feature map of the sample image data set in the current layer as the reconstruction target, the kernel set construction method is used to first obtain the updated weights of the convolution kernel, then calculate the input channel combination with the smallest reconstruction error and crop the redundant input channels as the compression result of the current layer; and the compression results of the convolutional layers are spliced to generate an image recognition model.
  • the image recognition module 502 is configured to input the acquired image to be recognized into the image recognition model to obtain an image recognition result of the image to be recognized.
  • the neural network model compression module 501 may include a weight update sub-module, and the weight update sub-module includes:
  • an importance calculation unit configured to determine the importance of each input channel of the current layer according to the feature map of each input channel of each sample data in the current layer in the sample image data set;
  • a weighting coefficient calculation unit used for setting an importance weighting coefficient for each input channel according to the importance of each input channel
  • a sampling probability calculation unit used to determine the sampling probability of each input channel by calculating the weighted importance function of each input channel and its sum function
  • the weight update unit is used to perform multiple rounds of sampling on the corresponding input channels according to the sampling probability of each input channel, where each round samples the input channel set of the current layer multiple times according to the sampling probabilities to obtain a kernel set; it calculates and accumulates the feature map reconstruction error corresponding to the channel kernel set, and obtains the updated values of the convolution kernel weights of the current layer by solving the optimization function that minimizes the feature map reconstruction error.
  • the sampling probability calculation unit may be a unit that calculates the weighted importance function of each input channel by invoking a pre-stored importance function relationship, and the importance function relationship may be expressed as:
  • s i (x) is the weighted importance function of the ith input channel
  • wi (x) is the importance weighting coefficient of the ith input channel
  • gi (x) is the initial value of the ith input channel importance function
  • m l-1 is the total number of output channels of the l-1th convolutional layer of the original neural network model
  • t is the sum function
  • the weight update unit may be a unit that invokes a pre-stored weight update relationship to obtain the update value of the convolution kernel weight of the current layer, and the weight update relationship is:
  • Y k is the output feature map of the uncompressed convolution kernel in the kth output channel
  • K is the total number of output channels of the current layer convolution kernel
  • the summation over the K output channels means the feature map reconstruction errors are separately calculated and summed
  • ||·||_F represents the Frobenius norm; W'_{ik} is the update value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel, and is the solution target of the kernel-set-based convolution kernel weight estimation operation
  • S is the kernel set consisting of a input channels sampled from the C input channels of the input sample image data set
  • Σ_{x_i ∈ S} W'_{ik} * x i is the sum of the output feature maps, in the corresponding output channel of the convolution kernel, of the feature map x i of each input channel in the kernel set S
  • * is the convolution operation.
  • the neural network model compression module 501 may include a channel clipping submodule, and the channel clipping submodule may include, for example:
  • the probability calculation unit is used for determining the probability of each input channel of the current layer being selected according to the updated weight of the convolution kernel of the current layer, and forming the probability of each input channel being selected into a hit probability set;
  • the channel selection unit is used to calculate the input channel combination with the smallest reconstruction error of the output feature map based on the hit probability set, and remove the unselected input channels according to the input channel combination;
  • the feature map calculation unit is used to calculate the output feature map of the compressed convolution kernel, so as to use the convolution result of the compressed convolution kernel of the current layer and the sample image dataset as the input data of the next convolution layer of the current layer.
  • the channel selection unit may be a unit that invokes a pre-stored optimization function relationship to calculate the input channel combination ⁇ with the smallest reconstruction error of the output feature map, and the optimization function relationship may be expressed as:
  • Y is the output feature map of the original neural network model in the current layer
  • K is the total number of output channels of the convolution kernel of the current layer
  • ⁇ i is whether the i-th channel is selected
  • ⁇ i is 0 or 1
  • ⁇ i
  • 1 ⁇ i ⁇ C ⁇ is the best sampling result of sampling a input channel from C input channels, satisfying the condition
  • 0 a
  • X i is the feature map of the ith input channel of the current layer for the sample image data set; W'_{ik} is the update value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel
  • (1-q) ⁇ is a vector constructed by using the sampling probability of each input channel as a penalty factor and adds the penalty term in the optimization objective function
  • q is the hit probability set.
  • the probability calculation unit may be a unit that calculates the probability that each input channel of the current layer is selected by invoking a pre-stored selection probability relational expression, and the selection probability relational expression may be expressed as:
  • w i (x) is the importance weighting coefficient of the i-th input channel
  • m l-1 is the total number of output channels of the l-1th convolutional layer of the original neural network model
  • X is the sample image dataset
  • x i is the feature map of each sample of the sample image data set in the ith input channel
  • K is the total number of output channels of the current layer convolution kernel
  • n l is the total number of input channels in the l layer of the original neural network model.
  • the embodiment of the present invention effectively improves the image recognition efficiency and reduces the computing resources consumed in the image recognition process.
  • FIG. 6 is a structural diagram of another image recognition apparatus provided by an embodiment of the present application. As shown in FIG. 6 , the apparatus includes a memory 60 for storing a computer program; a processor 61 for implementing the steps of the image recognition method mentioned in any of the above embodiments when executing the computer program.
  • the processor 61 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 61 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • the processor 61 may also include a main processor and a coprocessor.
  • the main processor, also called the CPU (Central Processing Unit), is used to process data in the wake-up state; the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 61 may be integrated with a GPU (Graphics Processing Unit, image processor), and the GPU is used for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 61 may further include an AI (Artificial Intelligence, artificial intelligence) processor, where the AI processor is used to process computing operations related to machine learning.
  • Memory 60 may include one or more computer-readable storage media, which may be non-transitory. Memory 60 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash storage devices. In this embodiment, the memory 60 is at least used to store the following computer program 601, where, after the computer program is loaded and executed by the processor 61, the relevant steps of the image recognition method disclosed in any of the foregoing embodiments can be implemented. In addition, the resources stored in the memory 60 may also include an operating system 602, data 603, etc., and the storage mode may be short-term storage or permanent storage. The operating system 602 may include Windows, Unix, Linux, and the like. The data 603 may include, but is not limited to, data corresponding to the test results, and the like.
  • the image recognition device may further include a display screen 62 , an input/output interface 63 , a communication interface 64 , a power supply 65 and a communication bus 66 .
  • the structure shown in FIG. 6 does not constitute a limitation on the image recognition device, which may include more or fewer components than shown; for example, a sensor 67 may also be included.
  • the embodiment of the present invention effectively improves the image recognition efficiency and reduces the computing resources consumed in the image recognition process.
  • if the image recognition method in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • on this understanding, the technical solutions of the present application, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product; the computer software product is stored in a storage medium and executes all or part of the steps of the methods in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, magnetic disks, optical disks, and other media that can store program code.
  • an embodiment of the present invention further provides a computer-readable storage medium storing an image recognition program, and when the image recognition program is executed by a processor, the steps of the image recognition method described in any one of the above embodiments are performed.
  • the embodiment of the present invention effectively improves the image recognition efficiency and reduces the computing resources consumed in the image recognition process.
  • the image recognition device of the present application can be deployed in an FPGA-based neural network acceleration application or in the software platform of an AI acceleration chip, and realizes structured compression and pruning during the neural network forward inference process without additional fine-tuning.
  • the simplified compression process can reduce the computation and latency of the compression process itself, which is conducive to the deployment of neural network compression technology, and further promotes the application and popularization of FPGA-based deep learning in resource-constrained scenarios such as edge computing.


Abstract

An image recognition method and apparatus, and a computer-readable storage medium. The method includes: inputting a sample image data set into an original neural network model in advance; for each convolutional layer of the original neural network model, taking the feature map of the sample image data set in the current layer as the reconstruction target, using a kernel set construction method to first obtain the updated weights of the convolution kernel, then calculating the input channel combination with the smallest reconstruction error and cropping the redundant input channels, thereby obtaining the compression result of the current convolutional layer; and finally splicing the compression results of the convolutional layers to generate an image recognition model. An image to be recognized is acquired and input into the image recognition model, and the output of the image recognition model is taken as the image recognition result of the image to be recognized, which can effectively improve image recognition efficiency and reduce the computing resources consumed in the image recognition process.

Description

Image recognition method and apparatus, and computer-readable storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on September 17, 2020, with application number 202010980176.3 and invention title "Image recognition method and apparatus, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image processing, and in particular to an image recognition method and apparatus, and a computer-readable storage medium.
Background Art
With the rapid development of artificial intelligence technology, machine vision, as a branch of artificial intelligence, has developed correspondingly. Simply put, machine vision uses machines instead of human eyes to measure and judge: a machine vision product, i.e. an image capture device such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-coupled Device) sensor, converts the captured target into an image signal and transmits it to a dedicated image processing system to obtain the morphological information of the captured target, converting it into a digital signal according to pixel distribution, brightness, color and other information; the image system performs various operations on these signals to extract the features of the target, and then controls on-site device actions according to the discrimination results. It can be seen that a large part of the work in realizing machine vision is image processing, and the recognition accuracy and recognition efficiency for images collected by the image capture device have a great influence on machine vision performance.
Related technologies usually use artificial neural network models to perform image classification and recognition tasks, but the parameters of deep neural networks applied to image classification and recognition tasks are highly redundant; performing these tasks not only consumes a large amount of computing resources, but also yields low image recognition efficiency.
In view of this, how to improve image recognition efficiency and reduce the computing resources consumed in the image recognition process is a technical problem to be solved by those skilled in the art.
Summary of the Invention
The present application provides an image recognition method and apparatus, and a computer-readable storage medium, which effectively improve image recognition efficiency and reduce the computing resources consumed in the image recognition process.
To solve the above technical problems, embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides an image recognition method, including:
inputting a sample image data set into an original neural network model in advance; for each convolutional layer of the original neural network model, taking the feature map of the sample image data set in the current layer as the reconstruction target, using a kernel set construction method to first obtain the updated weights of the convolution kernel, then calculating the input channel combination with the smallest reconstruction error and cropping the redundant input channels, as the compression result of the current layer; and splicing the compression results of the convolutional layers to generate an image recognition model;
inputting an acquired image to be recognized into the image recognition model to obtain an image recognition result of the image to be recognized.
Optionally, taking the feature map of the sample image data set in the current layer as the reconstruction target and using the kernel set construction method to first obtain the updated weights of the convolution kernel includes:
determining the importance of each input channel of the current layer according to the feature map of each input channel, in the current layer, of each sample datum in the sample image data set;
setting an importance weighting coefficient for each input channel according to the importance of each input channel;
determining the sampling probability of each input channel by calculating the weighted importance function of each input channel and its sum function;
performing multiple rounds of sampling on the corresponding input channels according to the sampling probability of each input channel, where each round samples the input channel set of the current layer multiple times according to the sampling probabilities to obtain a kernel set; calculating and accumulating the feature map reconstruction error corresponding to the channel kernel set; and obtaining the updated values of the convolution kernel weights of the current layer by solving the optimization function that minimizes the feature map reconstruction error.
Optionally, determining the sampling probability of each input channel by calculating the weighted importance function of each input channel and its sum function includes:
calling a pre-stored importance function relation to calculate the weighted importance function of each input channel, the importance function relation being:
s i(x) = w i(x)·g i(x);
the sampling probability p i of the ith input channel is p i = s i(x)/t;
where s i(x) is the weighted importance function of the ith input channel, w i(x) is the importance weighting coefficient of the ith input channel, g i(x) is the initial importance function of the ith input channel,
Figure PCTCN2021089861-appb-000001
Figure PCTCN2021089861-appb-000002
is the maximum value of the Frobenius norms of the feature maps x i of the sample data of the sample image data set X in the ith input channel, m l-1 is the total number of output channels of the l-1th convolutional layer of the original neural network model,
Figure PCTCN2021089861-appb-000003
is the mean of the maximum feature map Frobenius norms of all input channels, and a l is the number of compressed input channels to be achieved for the lth convolutional layer of the original neural network model; t is the sum function,
Figure PCTCN2021089861-appb-000004
Optionally, obtaining the updated values of the convolution kernel weights of the current layer by solving the optimization function that minimizes the feature map reconstruction error includes:
calling a pre-stored weight update relation to obtain the updated values of the convolution kernel weights of the current layer, the weight update relation being:
Figure PCTCN2021089861-appb-000005
where Y k is the output feature map of the uncompressed convolution kernel in the kth output channel, K is the total number of output channels of the current layer convolution kernel,
Figure PCTCN2021089861-appb-000006
denotes separately calculating and summing the feature map reconstruction errors of the K output channels of the convolution kernel,
Figure PCTCN2021089861-appb-000007
denotes performing R rounds of independent sampling of the input channel combination of the input sample image data set and accumulating the feature map reconstruction error of each sampling result,
Figure PCTCN2021089861-appb-000008
represents the Frobenius norm,
Figure PCTCN2021089861-appb-000009
is the updated value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel, and is the solution target of the kernel-set-based convolution kernel weight estimation operation; S is the kernel set consisting of a input channels sampled from the C input channels of the input sample image data set,
Figure PCTCN2021089861-appb-000010
is the sum of the output feature maps, in the kth output channel of the corresponding convolution kernel channel, of the feature map x i of each input channel in the kernel set S, and * is the convolution operation.
Optionally, calculating the input channel combination with the smallest reconstruction error and cropping the redundant input channels includes:
determining the probability that each input channel of the current layer is selected according to the updated weights of the current layer convolution kernel, and forming the selection probabilities of the input channels into a hit probability set;
calculating the input channel combination with the smallest output feature map reconstruction error based on the hit probability set, and removing the unselected input channels according to the input channel combination;
calculating the output feature map of the compressed convolution kernel, so as to use the convolution result of the compressed convolution kernel of the current layer and the sample image data set as the input data of the next convolutional layer of the current layer.
Optionally, calculating the input channel combination with the smallest output feature map reconstruction error based on the hit probability set includes:
calling a pre-stored optimization function relation to calculate the input channel combination γ with the smallest output feature map reconstruction error, the optimization function relation being:
Figure PCTCN2021089861-appb-000011
where Y is the output feature map of the original neural network model in the current layer, K is the total number of output channels of the current layer convolution kernel, γ i indicates whether the ith channel is selected, γ i takes the value 0 or 1, γ = {γ i | 1 ≤ i ≤ C} is the best sampling result of sampling a input channels from the C input channels, satisfying the condition ||γ|| 0 = a, and X i is the feature map of the ith input channel of the current layer for the sample image data set,
Figure PCTCN2021089861-appb-000012
is the updated value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel,
Figure PCTCN2021089861-appb-000013
represents the Frobenius norm, (1-q)γ is the penalty term added to the optimization objective function, a vector constructed by using the sampling probability of each input channel as a penalty factor, and q is the hit probability set.
Optionally, determining the probability that each input channel of the current layer is selected according to the updated weights of the current layer convolution kernel includes:
calling a pre-stored selection probability relation to calculate the probability that each input channel of the current layer is selected, the selection probability relation being:
Figure PCTCN2021089861-appb-000014
where
Figure PCTCN2021089861-appb-000015
w i(x) is the importance weighting coefficient of the ith input channel, m l-1 is the total number of output channels of the l-1th convolutional layer of the original neural network model, X is the sample image data set, x i is the feature map of each sample datum of the sample image data set in the ith input channel, K is the total number of output channels of the current layer convolution kernel,
Figure PCTCN2021089861-appb-000016
is the updated value of the weight tensor of the current layer convolution kernel in the ith input channel and the kth output channel,
Figure PCTCN2021089861-appb-000017
represents the Frobenius norm, and n l is the total number of input channels of the original neural network model in layer l.
In another aspect, an embodiment of the present invention provides an image recognition apparatus, including:
a neural network model compression module, configured to input a sample image data set into an original neural network model; for each convolutional layer of the original neural network model, with the feature map of the sample image data set in the current layer as the reconstruction target, use a kernel set construction method to first obtain the updated weights of the convolution kernel, then calculate the input channel combination with the smallest reconstruction error and crop the redundant input channels, as the compression result of the current layer; and splice the compression results of the convolutional layers to generate an image recognition model;
an image recognition module, configured to input an acquired image to be recognized into the image recognition model to obtain an image recognition result of the image to be recognized.
An embodiment of the present invention further provides an image recognition apparatus, including a processor, where the processor is configured to implement the steps of the image recognition method according to any one of the preceding items when executing a computer program stored in a memory.
Finally, an embodiment of the present invention further provides a computer-readable storage medium storing an image recognition program, where the image recognition program, when executed by a processor, implements the steps of the image recognition method according to any one of the preceding items.
The advantage of the technical solution provided by the present application is that network compression processing, including a convolution kernel weight estimation process and a channel cropping process, is performed on each convolutional layer of the original neural network model in turn to obtain an image recognition model for performing image recognition tasks. Since the image recognition model is obtained by compressing the original neural network model, the redundancy of the parameters of the original neural network model can be effectively reduced, the amount of data processed when the model performs tasks is reduced, the computing resources consumed by image classification and recognition are effectively reduced, and the speed of image classification and recognition is improved. The compressed network model does not need to be retrained, which makes the operation more convenient; moreover, the entire compression procedure runs in the process of forward inference on the classification images input into the neural network. To ensure classification and recognition accuracy, the output feature maps of the input classification images in each layer of the original neural network are taken as reconstruction targets, and the new weights of each layer's convolution kernels are obtained and the redundant convolution kernel channels are cropped based on kernel set construction, so that the kernel set construction result is not correlated with the distribution of different classification images, effectively improving the generalization ability of the model.
In addition, embodiments of the present invention also provide a corresponding implementation apparatus and computer-readable storage medium for the image recognition method, further making the method more practical, and the apparatus and computer-readable storage medium have corresponding advantages.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention or of the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of the compression procedure of a neural network model in the prior art, provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for performing an image recognition task using a prior-art network compression approach, provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of an image recognition method provided by an embodiment of the present invention;
FIG. 4 is a schematic flowchart of another image recognition method provided by an embodiment of the present invention;
FIG. 5 is a structural diagram of a specific implementation of an image recognition apparatus provided by an embodiment of the present invention;
FIG. 6 is a structural diagram of another specific implementation of an image recognition apparatus provided by an embodiment of the present invention.
Detailed Description of the Embodiments
To enable those skilled in the art to better understand the solutions of the present invention, the present invention is further described in detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", etc. in the specification, claims and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device including a series of steps or units is not limited to the listed steps or units, but may include steps or units not listed.
To reduce the computing resources consumed by image classification and recognition and improve its speed, a deep neural network can be compressed and the compressed neural network model used to perform the image classification and recognition task, which can effectively improve the efficiency of producing model outputs. Neural network compression solves the bottleneck of applying deep learning on devices with limited computing and storage resources. Neural network compression and pruning can reduce the parameter count and computation of a neural network model, thereby reducing the storage occupied by the neural network and increasing the computation speed of neural network inference. Neural network pruning methods are divided, according to the processing object, into coarse-grained pruning and fine-grained pruning. Coarse-grained pruning, also called structured pruning, performs compression operations on filter-level, channel-level, or row- or column-level structures of the neural network. Fine-grained pruning, also called unstructured pruning, can filter and remove individual weights; its advantage is that it preserves the accuracy of the neural network to the greatest extent, while its disadvantage is that the sparse matrix computation of unstructured pruning depends on dedicated runtime libraries and hardware devices. Structured pruning methods, by virtue of their hardware friendliness, have received more attention and application.
The traditional algorithmic flow of neural network compression and pruning involves iterations of pruning and fine-tuning, as shown in FIG. 1, and running such an algorithm is very time-consuming. To address this problem, some studies have proposed neural network compression and pruning methods that do not rely on a fine-tuning process to compensate for the accuracy loss caused by compression. For example, the existing pruning method CPA (Channel Pruning for Accelerating Very Deep Neural Networks) completes channel compression and weight parameter reconstruction during the forward inference process of the neural network, and the compressed network can achieve a small accuracy loss without fine-tuning. The method filters the input channels of the neural network's convolutional layers layer by layer based on LASSO (Least absolute shrinkage and selection operator) regression, and then obtains new convolution kernel weight parameters by optimizing the feature map reconstruction error with the least squares method; these are used directly as the convolution kernel weight parameters of the compressed network, without fine-tuning to obtain new convolution kernel weight parameters. However, for compression methods based on optimizing the feature map reconstruction error, the compression effect depends on the input data: a different input data set changes the filtering results of magnitude-based importance measures. Existing research has developed the data-independent network compression method DINP (Data-Independent Neural Pruning via Coresets), which, on the basis of the neural network VC dimension (Vapnik-Chervonenkis dimension) and coreset theory, provides a theoretical basis for the trade-off between compression ratio and estimation error. The DINP method uses the OSCC algorithm (offline and streaming coreset constructions) to construct coresets for the hidden-layer neurons of fully connected layers; the OSCC algorithm provides the derivation of a coreset size calculation formula based on the VC dimension of the data, and on this basis the DINP method gives neuron importance measures and coreset sampling probabilities based on the upper bound of the activation function value. Since the calculation and assignment of the weighting coefficient of each sample in the coreset construction process are independent of the input data, the coreset construction result of DINP has the data-independent property. However, the DINP method only targets fully connected layers and is not applicable to convolutional layers, so its compression ratio for deep convolutional neural networks is limited, and the range of applicable network models is also limited: it is not applicable to convolutional neural networks that contain no fully connected layers, such as the yolo (You Only Look Once) series of neural networks and the fully convolutional neural network R-FCN (Region-based Fully Convolutional Networks). Apart from the DINP method, early applications of coreset theory in neural network compression are all data-dependent; although there are also methods applying coreset theory to low-rank decomposition to compress convolutional and fully connected layers, constructing a coreset by itself does not compress the neural network.
If the neural network model processed by the above neural network compression methods is used to perform the image classification and recognition task, the method flow is as shown in FIG. 2: the original neural network model must first be trained with sample data, then compressed with the above neural network compression methods to generate a compressed model, then the compressed model must be trained once again with the training sample data to restore image recognition accuracy, and finally the trained compressed model is used to perform the image classification task. The training process of neural network compression itself requires repeatedly executing the layer-by-layer network compression and network retraining processes; the two model training processes undoubtedly add a large amount of data processing, and model training consumes a large amount of computing resources with cumbersome operation. To simplify the training process of neural network compression and reduce the cost of obtaining a compressed model for image classification and recognition tasks, the present application proposes a structured-pruning neural network compression procedure that requires no retraining and does not depend on the input data: during neural network forward inference, coreset-theory-based convolution kernel parameter estimation and compression pruning are executed sequentially, layer by layer. This procedure runs in the process of forward inference on the classification images input into the neural network; to ensure classification accuracy, the output feature maps of the input classification images in each layer of the original neural network are taken as reconstruction targets, and the new weights of each layer's convolution kernels are obtained and redundant convolution kernel channels are cropped based on kernel set construction. The procedure designs a new way of constructing kernel sets, so as to prevent the kernel set construction result from being correlated with the distribution of different classification images and to improve generalization ability.
在介绍了本发明实施例的技术方案后,下面详细的说明本申请的各种非限制性实施方式。
首先参见图3及图4,图3为本发明实施例提供的一种图像识别方法的流程示意图,本发明实施例可包括以下内容:
S301:预先将样本图像数据集输入至原始神经网络模型,对原始神经网络模型的每一个卷积层,以样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值,再计算重构误差最小的输入通道组合并裁剪冗余的输入通道,作为当前层的压缩结果;将各卷积层的压缩结果拼接生成图像识别模型。
在本步骤中,原始神经网络模型为任何一种类型且训练好的神经网络模型,例如可为监督式学习网络类型的神经网络模型,还可为非监督学习网络类型的神经网络模型,还可为联想式学习网络类型的神经网络模型,或者是最适化应用网络类型的神经网络模型,原始神经网络模型例如可为霍普菲尔网络HN(Hopfield network)、卷积神经网络CNN(Convolutional Neural Networks)、逆图形网络DN(Deconvolutional networks)、生成对抗网络GAN(Generative Adversarial Networks)、周期神经网络RNN(Recurrent Neural Networks)等等,这均不影响本申请的实现。样本图像数据集可为原始神经网络模型训练过程中使用的训练数据,也可不为训练过程中使用的数据,这均不影响本申请的实现。图像识别模型为原始神经网络模型被压缩后的网络模型,在原始神经网络模型的前向推理过程中逐层压缩神经网络层,在每层的压缩过程中首先执行卷积核权值估计操作,直接获取当前层完整卷积核的权重参数,然后执行通道裁剪操作,筛选输入通道并裁剪与其对应的上一层的卷积核输出通道,并将当前层的压缩后卷积核与输入数据的卷积结果作为下一层的输入数据,对原始神经网络模型所有卷积层均执行完压缩操作后,将产生的各个压缩后网络层相互拼接在一起,组成完整的压缩后神经网络模型。
需要说明的是,原始神经网络模型的压缩过程具有先后执行顺序的限定,即首先执行卷积核权值估计操作再执行通道裁剪操作的先后次序,且卷积核权值估计操作和通道裁剪操作均采用了核集构建理论,核集构建理论可参阅相关文献记载,此处,便不再赘述。
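上述逐层压缩的先后次序可以用如下极简代码示意。需要强调的是,这是一个假设性的示意实现,并非专利原文代码:以1×1卷积为例将卷积简化为通道维上的线性组合,并以权值幅值排序与最小二乘重构近似代替基于核集的权值估计与通道筛选,仅用于展示“先估计权值、再裁剪通道、压缩结果作为下一层输入”的流程,compress_layer等名称均为示意。

```python
import numpy as np

def compress_layer(W, X, a):
    """W: (K, C) 当前层权值; X: (C, M) 输入特征图; a: 压缩后输入通道数。"""
    Y = W @ X                                      # 以原始层输出特征图为重构目标
    keep = np.argsort(-np.abs(W).sum(axis=0))[:a]  # 以权值幅值近似通道重要性(示意)
    sol, *_ = np.linalg.lstsq(X[keep].T, Y.T, rcond=None)  # 最小化特征图重构误差
    W_new = sol.T                                  # (K, a) 压缩后卷积核新权值
    return W_new, keep, W_new @ X[keep]            # 压缩层、保留通道、下一层输入

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))                       # 样本在第一层的输入特征图
layers = [rng.normal(size=(8, 8)), rng.normal(size=(4, 8))]
for W in layers:                                   # 逐层按顺序压缩
    W_c, keep, X_next = compress_layer(W, X, a=6)
    X = X_next                                     # 压缩后卷积结果作为下一层输入数据
```

在实际实现中,裁剪第l层的某个输入通道时还需同步裁剪第l-1层对应的卷积核输出通道,各压缩后网络层最终再拼接为完整模型。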
S302:将获取的待识别图像输入至图像识别模型中,得到待识别图像的图像识别结果。
在本步骤中,待识别图像可为一张图像,也可为多张图像,图像类型不限定,图像识别模型用于执行图像分类识别操作,图像识别模型会为输入数据即待识别图像匹配合适的标签,然后将标签类型输出作为图像识别结果。
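为待识别图像匹配标签并输出的过程可示意如下(假设性示例:以一个线性模型代替压缩后的图像识别模型,recognize、labels等名称均为示意,实际模型与标签集合由具体任务决定):

```python
import numpy as np

def recognize(model, image, labels):
    """将待识别图像输入图像识别模型,返回匹配到的标签类型。"""
    logits = model(image.ravel())
    probs = np.exp(logits - logits.max())   # softmax 归一化为各标签的匹配程度
    probs /= probs.sum()
    return labels[int(np.argmax(probs))]    # 取匹配程度最高的标签作为识别结果

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 16))
model = lambda x: W @ x                     # 以线性模型代替压缩后的神经网络(示意)
img = rng.normal(size=(4, 4))               # 一张待识别图像
label = recognize(model, img, ["猫", "狗", "鸟"])
```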
在本发明实施例提供的技术方案中,依次对原始神经网络模型的各个卷积层进行包括卷积核权值估计过程和通道裁剪过程在内的网络压缩处理,得到用于执行图像识别任务的图像识别模型,由于该图像识别模型为原始神经网络模型经过压缩后所得模型,可有效降低原始神经网络模型参数的冗余性,降低模型执行任务处理的数据量,有效减少图像分类识别消耗的计算资源,提高图像分类识别速度;压缩后的网络模型不需要进行重新训练,操作更加便捷,而且整个压缩流程运行在分类图像输入神经网络进行前向推理的过程中,为了保证分类识别精度以输入分类图像在原始神经网络各个层的输出特征图为重构目标,并通过基于核集构建的方式获取每层卷积核的新权值以及裁剪冗余的卷积核通道,避免核集构建结果与不同分类图像的分布相关,有效提高模型泛化能力。
需要说明的是,上述实施例在图像识别过程中通过对图像识别模型进行压缩操作降低数据处理量,减少图像分类识别消耗的计算资源,提升图像识别效率。本申请技术方案例如还可用于但并不限制于在执行目标分割任务或目标检测任务过程中的图像特征提取阶段,从而可以提升目标分割效率,提升目标检测效率,降低执行任务过程中消耗的计算机资源。
此外,可以理解的是,图像为二维数据,上述实施例中的神经网络模型的卷积核是以二维卷积核为例进行阐述,本申请还同样适用于一维卷积的卷积核压缩,相应的,将扩展压缩后深度神经网络模型应用于一维序列数据处理领域,例如医学心电信号等一维数据的异常分类任务等。也就是说,在执行一维数据的分类识别任务过程中,例如医学心电心音信号等,可利用一维生理信号数据作为样本数据集训练原始神经网络模型,对原始神经网络模型的每一个卷积层,以样本数据集在当前层的特征图为重构目标,利用核集构建方法先获取一维卷积核的更新权值,再计算重构误差最小的输入通道组合并裁剪冗余的输入通道,作为当前层的压缩结果;将各卷积层的压缩结果拼接生成生理信号识别模型。将获取的一维生理信号数据输入至生理信号识别模型中,得到待识别生理信号的识别分类结果。同样的,本申请还可进一步拓展至三维卷积的卷积核压缩,相应的,将扩展压缩后深度神经网络模型应用于三维序列数据处理领域,例如可应用于对医学三维CT影像等三维数据的分类、识别、目标检测等,例如还可应用于视频处理领域的动作识别等应用场景。
为了减小输入数据分布变化对核集构建结果的影响,本申请在卷积核权值估计操作中还可引入了多轮采样的处理机制,依概率对输入通道进行采样来构建通道核集,在多轮采样的结果上以最小化特征图重构误差为目标来为压缩后卷积核生成权值,由于采样具有一定随机性,在多次通道不重复采样的情况下生成的卷积核权值对不同的通道选择结果均具有适应力,也即S301的一种实施方式可包括下述内容:
首先,需要解释的是,本申请在采用核集构建方法进行卷积核的更新过程中,相比传统核集理论,有如下特点:
1)核集构建的对象是卷积层的输入通道,而不是单个神经元;2)所构建的核集VC维可以减小至1,而不需要等于第l-1层全连接层的神经元个数d或者其他高维度值,具体地讲,本申请在为第l卷积层构建核集S时,构造VC维等于1的待采样集合P,直接将第l层的四维卷积核张量沿着某一输入通道的参数张量或者第l-1层输出的三维特征图张量沿着某一输出通道的参数张量作为集合P的一个样本,则集合P的样本数等于第l层卷积核的输入通道数或者第l-1层卷积层的特征图输出通道数;3)集合P中每个样本的重要性计算式将目标压缩通道数a指定为目标核集S的维度,进而将重要性之和函数t与目标压缩通道数a即目标核集S的维度联系在一起,使得重要性之和函数t的上界被约束在可控的取值范围内。
其次,“以样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值”的一种实施方式如下所述:
A1:根据样本图像数据集中各样本数据在当前层的各输入通道的特征图确定当前层各输入通道的重要性。
A2:根据各输入通道的重要性为各输入通道设置重要性加权系数;
A3:通过计算各输入通道加权后的重要性函数及其和函数确定每个输入通道的采样概率;
A4:按照每个输入通道的采样概率对相应输入通道进行多轮采样,每轮依照采样概率对当前层的输入通道集合进行多次采样得到一个核集,计算并累加通道核集对应的特征图重构误差,通过计算最小化特征图重构误差的优化函数以获取当前层的卷积核权重的更新值。
首先,依据第l层的输入数据即样本图像数据集来计算输入通道的初始重要性函数g i(x)。然后依据第l层卷积核为第l层各个输入通道的重要性分配重要性加权系数w i(x),w i(x)是为每个输入通道的非均匀采样构造的加权系数,具体分配加权系数的操作是计算第l层卷积核在不同输入通道的参数张量W i的L 1范数||W i|| 1,再按照||W i|| 1的值降序排序,为排序前a l个输入通道分配较大的权值w i(x)=1/(a l+1),为其他输入通道分配较小的权值w i(x)=1/((a l+1)(m l-1-a l))。在这样加权系数的计算和分配均与输入数据无关的处理方式下,核集S的构建结果会具有不依赖于输入数据的优势,因为当前构造的核集S并不是在特定数据分布下选择的结果。最后计算所有输入通道加权后的重要性函数s i(x)及其和函数t,也即可调用预先存储的重要性函数关系式计算各输入通道加权后的重要性函数,第i个输入通道的采样概率p i可表示为p i=s i(x)/t。重要性函数关系式可表示为:
s i(x)=w i(x)·g i(x);
其中,s i(x)为第i个输入通道加权后的重要性函数,w i(x)为第i个输入通道的重要性加权系数,g i(x)为第i个输入通道的初始重要性函数,
Figure PCTCN2021089861-appb-000018
Figure PCTCN2021089861-appb-000019
并且其中a l为原始神经网络模型的第l卷积层目标要达到的压缩后输入通道数,m l-1是神经网络第l-1层的输出通道数。
Figure PCTCN2021089861-appb-000020
为样本图像数据集X中各样本数据在第i通道的特征图x i的Frobenius范数的最大值,m l-1为原始神经网络模型的第l-1卷积层的输出通道总数,
Figure PCTCN2021089861-appb-000021
为所有通道的特征图Frobenius范数的最大值的均值,a l为原始神经网络模型的第l卷积层目标要达到的压缩后输入通道数;t为和函数,
t=∑ i s i(x),即t为对当前层全部输入通道i加权后的重要性函数s i(x)求和。
本申请在此处构造的带有权重的重要性函数s i(x)能够为重要性之和t的取值范围提供约束,即保证
Figure PCTCN2021089861-appb-000023
当a l=m l-1-1时等号成立。并且由于计算公式中包含体现压缩比的计算因子a l,t的上界与下界可以通过修改目标压缩通道数来灵活地调控。
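上述采样概率的计算可用如下代码示意。此为假设性实现:其中初始重要性g i(x)按“各通道特征图Frobenius范数的最大值除以全部通道该最大值的均值”这一种可能的具体化来编写,函数名与参数均为示意,并非专利给出的唯一形式。

```python
import numpy as np

def sampling_probabilities(feat_maps, kernel, a_l):
    """feat_maps: (N, C, H, W) 各样本在当前层各输入通道的特征图;
    kernel: (K, C, k, k) 当前层卷积核; a_l: 目标压缩后输入通道数。"""
    C = feat_maps.shape[1]
    # 初始重要性 g_i:各样本第 i 通道特征图 Frobenius 范数的最大值/全通道均值(假设性具体化)
    fro_max = np.sqrt((feat_maps ** 2).sum(axis=(2, 3))).max(axis=0)   # (C,)
    g = fro_max / fro_max.mean()
    # 按卷积核各输入通道参数张量的 L1 范数降序,前 a_l 个通道分配较大权值
    l1 = np.abs(kernel).sum(axis=(0, 2, 3))                            # (C,)
    order = np.argsort(-l1)
    w = np.empty(C)
    w[order[:a_l]] = 1.0 / (a_l + 1)
    w[order[a_l:]] = 1.0 / ((a_l + 1) * (C - a_l))                     # 权值之和为 1
    s = w * g                   # 加权后的重要性函数 s_i(x)
    t = s.sum()                 # 和函数 t
    return s / t, t             # 采样概率 p_i = s_i(x)/t

rng = np.random.default_rng(0)
p, t = sampling_probabilities(rng.normal(size=(4, 8, 6, 6)),
                              rng.normal(size=(5, 8, 3, 3)), a_l=3)
```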
在其他现有基于核集的神经网络压缩技术中,没有关于t取值上界的讨论。然而t又是核集构造算法从理论到实际操作的转化中非常重要的一个参数,t影响着需要构造的目标核集的维度下界|S|。核集构造算法OSCC证明了由集合P中依概率随机采样一个子集S时,若满足
Figure PCTCN2021089861-appb-000024
c≥1,c是一个常数,g i(x)是非负函数,误差参数ε∈(0,1),其中d代表集合P的VC维,那么核集S以概率1-δ成为集合P查询空间的ε-核集,即ε-coreset。ε-coreset的定义可以参见OSCC算法原文。此定理表明,t值对核集的维度|S|的设定有指导作用。
第l-1层激活函数的输入数据的维度mh 2,m是输出通道数,h是输出特征图的尺寸。若第l卷积层的不同输入通道之间不共享卷积核,即卷积核为四维张量n×m×k×k,针对输入数据的不同通道,卷积核同一输出通道的参数值不同,则卷积核参数维度等于nmk 2。在本申请所提方法的场景中,以卷积核在某个输入通道的参数为压缩操作的处理单元,或者以卷积层输出特征图在某个输出通道的参数为处理单元,即以m×k×k个卷积核参数或者h×h个特征图参数为处理单元,所以第l卷积层卷积核的数据维度可以进一步简化为n l,而第l-1层激活函数的数据维度也简化为m l-1,第l卷积层和第l-1层相应的VC维都简化为1,即d l=1且d l-1=1。考虑到核集构建算法的复杂度大致等于定义一次查询所需要的参数量,对于VC维为d的集合P进行一次查询所需要的参数量至少为d+1,减少集合P的VC维可以减少核集构建算法的复杂度。采用本申请所提方法在对第l卷积层进行输入通道压缩时,会同时影响到第l-1层激活函数的VC维d l-1和第l层的卷积核参数的VC维d l,这要求为第l卷积层构造的输入通道的核集维度|S|同时满足d l-1和d l确定的限制。当第l卷积层的目标压缩通道数为a时,第l-1层激活函数对应的目标核集维度|S l-1|和第l层的卷积核参数对应的目标核集维度|S l| 均等于a。由于本申请构造的重要性函数之和t的上界sup(t)在第l-1层取值
Figure PCTCN2021089861-appb-000025
和在第l层取值
Figure PCTCN2021089861-appb-000026
相等,即sup(t l-1)=sup(t l),所以a可以同时满足以下两个不等式的要求:
Figure PCTCN2021089861-appb-000027
Figure PCTCN2021089861-appb-000028
综上所述,本申请所构造的重要性函数之和t不但具有可调控的上下界,而且可以使得目标压缩通道数a同时满足两个卷积层对目标核集维度的约束。此外,当目标压缩通道数a以及重要性函数之和t已知时,当前构建的目标核集S与集合P之间的近似误差ε也可以得到估计值,这可以作为侧面评价压缩效果的参考指标。
独立地对输入数据的输入通道进行R轮采样,每轮依照概率p i对第l卷积层的输入通道集合P进行a次采样得到一个核集S,计算并累加通道核集S对应的特征图重构误差,并按照以下优化函数来求解完整卷积核的新权值
min_{W~} Σ_{r=1}^{R} Σ_{k=1}^{K} ‖Y_k−Σ_{i∈S} x_i*W~_{ik}‖_F^2(其中S为第r轮采样得到的通道核集)
此优化函数以最小化各个卷积核的权值估计误差之和为目标,即可调用预先存储的权值更新关系式得到当前层的卷积核权重的更新值,权值更新关系式可表示为:
min_{W~} Σ_{r=1}^{R} Σ_{k=1}^{K} ‖Y_k−Σ_{i∈S} x_i*W~_{ik}‖_F^2
式中,Y k为未压缩卷积核在第k输出通道的输出特征图,K为当前层卷积核输出通道总数,
Figure PCTCN2021089861-appb-000031
为对卷积核的K个输出通道分别计算特征图重构误差并汇总,
Figure PCTCN2021089861-appb-000032
为对输入样本图像数据集的输入通道组合进行R轮独立采样并累加每次采样结果的特征图重构误差,
Figure PCTCN2021089861-appb-000033
代表Frobenius范数,
Figure PCTCN2021089861-appb-000034
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,作为基于核集的卷积核权值估计操作的求解目标,S为输入样本图像数据集的C个输入通道中采样到的a个输入通道组成的核集,
Figure PCTCN2021089861-appb-000035
为核集S中每个输入通道的特征图x i在卷积核对应通道即第k个输出通道的输出特征图之和,*为卷积操作。
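基于多轮采样与最小二乘的权值估计过程可示意如下。此为假设性简化:以1×1卷积为例,卷积退化为通道维上的矩阵乘法;对R轮依概率采样得到的各个核集分别求最小二乘解,再按各通道被采样的次数取平均,estimate_weights等名称均为示意。

```python
import numpy as np

def estimate_weights(X, Y, p, a, R=5, seed=0):
    """X: (C, M) 各输入通道展平后的特征图; Y: (K, M) 原始层输出特征图;
    p: 各输入通道的采样概率; a: 每轮采样的通道数(目标核集维度); R: 采样轮数。"""
    rng = np.random.default_rng(seed)
    C, _ = X.shape
    K = Y.shape[0]
    W = np.zeros((K, C))
    counts = np.zeros(C)
    for _ in range(R):                                  # R 轮独立的不重复采样
        S = rng.choice(C, size=a, replace=False, p=p)   # 依概率 p 采样一个通道核集
        sol, *_ = np.linalg.lstsq(X[S].T, Y.T, rcond=None)  # min ||Y_k - Σ_i W_ik x_i||^2
        W[:, S] += sol.T
        counts[S] += 1
    W[:, counts > 0] /= counts[counts > 0]  # 对多轮结果取平均,增强对通道选择的适应力
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 30))
W_true = rng.normal(size=(4, 8))
Y = W_true @ X
W_est = estimate_weights(X, Y, p=np.full(8, 1 / 8), a=6, R=8)
err = np.linalg.norm(Y - W_est @ X) / np.linalg.norm(Y)  # 相对特征图重构误差
```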
由上可知,本实施例基于核集的卷积核参数估计过程,处理对象是未压缩的卷积核,并且在构建通道核集时对卷积核输入通道进行多轮采样,通过优化多轮采样通道的特征图重构误差的平均值,使得参数估计结果对通道的随机选择结果均具有适应力;此外,该过程在计算每个输入通道的重要性时采用了独特的计算式,将通道重要性之和函数转变为与目标压缩通道数直接相关的函数,进而将通道重要性之和函数约束到可调控的取值范围内,使得核集理论对核集维度下界的约束具有了实用意义。
上述实施例并未对如何执行S301步骤中的通道裁剪操作进行限定,本实施例还提供了一种通道裁剪方式,也即利用核集构建方法计算重构误差最小的输入通道组合并裁剪冗余的输入通道的一种实施过程可为:
B1:根据当前层卷积核更新后的权值确定当前层各输入通道被选中的概率,并将各输入通道被选中的概率组成命中概率集。
B2:基于命中概率集计算输出特征图重构误差最小的输入通道组合,并根据输入通道组合移除未被选择的输入通道。
B3:计算压缩后卷积核的输出特征图,以将当前层压缩后的卷积核与样本图像数据集的卷积结果作为当前层的下一个卷积层的输入数据。
可调用预先存储的选择概率关系式计算当前层各输入通道被选中的概率,选择概率关系式为:
Figure PCTCN2021089861-appb-000036
式中,
Figure PCTCN2021089861-appb-000037
w i(x)为第i个输入通道的重要性加权系数,m l-1为原始神经网络模型的第l-1卷积层的输出通道总数,X为样本图像数据集,x i为样本图像数据集中各样本数据 在第i通道的特征图,K为当前层卷积核输出通道总数,
Figure PCTCN2021089861-appb-000038
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
Figure PCTCN2021089861-appb-000039
代表Frobenius范数,n l为原始神经网络模型在l层即当前层的输入通道总数。每个输入通道被选中的概率组成的命中概率集可表示为q={q i|1≤i≤C}。然后调用预先存储的优化函数关系式计算输出特征图重构误差最小的输入通道组合γ,依据求解出的γ来移除未被选择的输入通道,即移除γ i=0对应的输入通道。优化函数关系式可表示为:
min_γ Σ_{k=1}^{K} ‖Y_k−Σ_{i=1}^{C} γ_i X_i*W~_{ik}‖_F^2+(1−q)·γ,s.t. ‖γ‖_0=a
式中,Y为原始神经网络模型在当前层的输出特征图,K为当前层卷积核输出通道总数,γ i为第i通道是否被选择,γ i取值为0或1,γ={γ i|1≤i≤C}为由C个输入通道采样出a个输入通道的最佳采样结果,满足条件||γ|| 0=a,X i为样本图像数据集在当前层的第i输入通道的特征图,x i为样本图像数据集的单个样本数据在当前层的第i输入通道的特征图,
Figure PCTCN2021089861-appb-000041
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
Figure PCTCN2021089861-appb-000042
代表Frobenius范数,(1-q)γ为将各个输入通道的采样概率构造的向量作为惩罚因子加入了优化目标函数中的惩罚项,q为命中概率集。
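上述带惩罚项的输入通道组合求解可用贪心策略近似示意如下。此为假设性替代:专利中是带向量惩罚项(1-q)γ和0范数约束的优化问题,这里以“每步加入使目标函数下降最多的通道”的贪心法近似求解,select_channels等名称均为示意。

```python
import numpy as np

def select_channels(X, Y, W, q, a):
    """X: (C, M) 输入通道特征图; Y: (K, M) 原始层输出; W: (K, C) 更新后的权值;
    q: 命中概率集(各输入通道被选中的概率); 返回 0/1 向量 gamma,||gamma||_0 = a。"""
    C = X.shape[0]
    gamma = np.zeros(C, dtype=int)
    for _ in range(a):                          # 贪心:逐步选出 a 个通道
        best, best_obj = None, np.inf
        for i in range(C):
            if gamma[i]:
                continue
            gamma[i] = 1                        # 试探性加入第 i 通道
            recon = (W * gamma) @ X             # 仅保留被选通道的重构输出
            obj = np.linalg.norm(Y - recon) ** 2 + ((1 - q) * gamma).sum()
            if obj < best_obj:
                best, best_obj = i, obj
            gamma[i] = 0
        gamma[best] = 1                         # 保留使目标函数最小的通道
    return gamma

rng = np.random.default_rng(2)
X = rng.normal(size=(6, 20))
W = 2.0 * rng.normal(size=(3, 6))
W[:, [1, 4]] = 0.0                   # 通道 1、4 对输出无贡献,属冗余通道
Y = W @ X
gamma = select_channels(X, Y, W, q=np.full(6, 0.5), a=4)
```

求解出γ后,移除γ_i=0对应的输入通道即可完成当前层的通道裁剪。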
本实施例在核集理论的基础上为每个通道的重要性设计了独特的计算函数,并设计了求解最小化输出特征图重构误差的输入通道组合的优化目标函数,进一步将每个通道的采样概率组成的向量作为惩罚因子加入到优化目标函数中,在其他现有压缩方法中惩罚因子通常是标量,而本申请则采用了向量为不同的通道提供不同的约束。神经网络压缩方法可以在神经网络的前向推理过程中实现通道剪枝,取消了压缩后微调网络的步骤,简化了神经网络压缩流程,减少了神经网络压缩过程的计算量和时延,并且在核集构建算法的基础上为卷积层的通道压缩比和性能损失之间的权衡提供了理论分析工具。
需要说明的是,本申请中各步骤之间没有严格的先后执行顺序,只要符合逻辑上的顺序,则这些步骤可以同时执行,也可按照某种预设顺序执行,图3和图4只是一种示意方式,并不代表只能是这样的执行顺序。
本发明实施例还针对图像识别方法提供了相应的装置,进一步使得所述方法更具有实用性。其中,装置可从功能模块的角度和硬件的角度分别说明。下面对本发明实施例提供的图像识别装置进行介绍,下文描述的图像识别装置与上文描述的图像识别方法可相互对应参照。
基于功能模块的角度,参见图5,图5为本发明实施例提供的图像识别装置在一种具体实施方式下的结构图,该装置可包括:
神经网络模型压缩模块501,用于将样本图像数据集输入至原始神经网络模型;对原始神经网络模型的每一个卷积层,以样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值,再计算重构误差最小的输入通道组合并裁剪冗余的输入通道,作为当前层的压缩结果;将各卷积层的压缩结果拼接生成图像识别模型。
图像识别模块502,用于将获取的待识别图像输入至图像识别模型中,得到待识别图像的图像识别结果。
可选的,在本实施例的一些实施方式中,所述神经网络模型压缩模块501可以包括权值更新子模块,所述权值更新子模块包括:
重要性计算单元,用于根据样本图像数据集中各样本数据在当前层的各输入通道的特征图确定当前层各输入通道的重要性;
加权系数计算单元,用于根据各输入通道的重要性为各输入通道设置重要性加权系数;
采样概率计算单元,用于通过计算各输入通道加权后的重要性函数及其和函数确定每个输入通道的采样概率;
权值更新单元,用于按照每个输入通道的采样概率对相应输入通道进行多轮采样,每轮依照采样概率对当前层的输入通道集合进行多次采样得到一个核集,计算并累加通道核集对应的特征图重构误差,通过计算最小化特征图重构误差的优化函数以获取当前层的卷积核权重的更新值。
在本发明实施例的一些实施方式中,所述采样概率计算单元可为调用预先存储的重要性函数关系式计算各输入通道加权后的重要性函数的单元,重要性函数关系式可表示为:
s i(x)=w i(x)·g i(x);
第i个输入通道的采样概率p i为p i=s i(x)/t;
其中,s i(x)为第i个输入通道加权后的重要性函数,w i(x)为第i个输入通道的重要性加权系数,g i(x)为第i个输入通道的初始重要性函数,
Figure PCTCN2021089861-appb-000043
Figure PCTCN2021089861-appb-000044
为样本图像数据集X中各样本数据在第i通道的特征图x i的Frobenius范数的最大值,m l-1为原始神经网络模型的第l-1卷积层的输出通道总数,
Figure PCTCN2021089861-appb-000045
为所有通道的特征图Frobenius范数的最大值的均值,a l为原始神经网络模型的第l卷积层目标要达到的压缩后输入通道数;t为和函数,
Figure PCTCN2021089861-appb-000046
在本发明实施例的另一些实施方式中,所述权值更新单元可为调用预先存储的权值更新关系式得到当前层的卷积核权重的更新值的单元,权值更新关系式为:
Figure PCTCN2021089861-appb-000047
式中,Y k为未压缩卷积核在第k输出通道的输出特征图,K为当前层卷积核输出通道总数,
Figure PCTCN2021089861-appb-000048
为对卷积核的K个输出通道分别计算特征图重构误差并汇总,
Figure PCTCN2021089861-appb-000049
为对输入样本图像数据集的输入通道组合进行R轮独立采样并累加每次采样结果的特征图重构误差,
Figure PCTCN2021089861-appb-000050
代表Frobenius范数,
Figure PCTCN2021089861-appb-000051
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新 值、作为基于核集的卷积核权值估计操作的求解目标,S为输入样本图像数据集的C个输入通道中采样到的a个输入通道组成的核集,
Figure PCTCN2021089861-appb-000052
为核集S中每个输入通道的特征图x i在卷积核对应输出通道的输出特征图之和,*为卷积操作。
可选的,在本实施例的一些实施方式中,所述神经网络模型压缩模块501可以包括通道裁剪子模块,所述通道裁剪子模块例如可包括:
概率计算单元,用于根据当前层卷积核更新后的权值确定当前层各输入通道被选中的概率,并将各输入通道被选中的概率组成命中概率集;
通道选择单元,用于基于命中概率集计算输出特征图重构误差最小的输入通道组合,并根据输入通道组合移除未被选择的输入通道;
特征图计算单元,用于计算压缩后卷积核的输出特征图,以将当前层压缩后的卷积核与样本图像数据集的卷积结果作为当前层的下一个卷积层的输入数据。
在本实施例的一些实施方式中,所述通道选择单元可为调用预先存储的优化函数关系式计算输出特征图重构误差最小的输入通道组合γ的单元,优化函数关系式可表示为:
Figure PCTCN2021089861-appb-000053
式中,Y为原始神经网络模型在所述当前层的输出特征图,K为当前层卷积核输出通道总数,γ i为第i通道是否被选择,γ i取值为0或1,γ={γ i|1≤i≤C}为由C个输入通道采样出a个输入通道的最佳采样结果,满足条件||γ|| 0=a,X i为样本图像数据集在当前层的第i输入通道的特征图,
Figure PCTCN2021089861-appb-000054
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
Figure PCTCN2021089861-appb-000055
代表Frobenius范数,(1-q)γ为将各个输入通道的采样概率构造的向量作为惩罚因子加入了优化目标函数中的惩罚项,q为命中概率集。
在本发明实施例的一些其他实施方式中,所述概率计算单元可为调用预先存储的选择概率关系式计算当前层各输入通道被选中的概率的单元,选择概率关系式可表示为:
Figure PCTCN2021089861-appb-000056
式中,
Figure PCTCN2021089861-appb-000057
w i(x)为第i个输入通道的重要性加权系数,m l-1为原始神经网络模型的第l-1卷积层的输出通道总数,X为样本图像数据集,x i为样本图像数据集中各样本数据在第i输入通道的特征图,K为当前层卷积核输出通道总数,
Figure PCTCN2021089861-appb-000058
为当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
Figure PCTCN2021089861-appb-000059
代表Frobenius范数,n l为原始神经网络模型在l层的输入通道总数。
本发明实施例所述图像识别装置的各功能模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
由上可知,本发明实施例有效提高图像识别效率,降低图像识别过程中消耗的计算资源。
上文中提到的图像识别装置是从功能模块的角度描述,进一步的,本申请还提供一种图像识别装置,是从硬件角度描述。图6为本申请实施例提供的另一种图像识别装置的结构图。如图6所示,该装置包括存储器60,用于存储计算机程序;处理器61,用于执行计算机程序时实现如上述任一实施例提到的图像识别方法的步骤。
其中,处理器61可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器61可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器61也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器61可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器61还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器60可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器60还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。本实施例中,存储器60至少用于存储以下计算机程序601,其中,该计算机程序被处理器61加载并执行之后,能够实现前述任一实施例公开的图像识别方法的相关步骤。另外,存储器60所存储的资源还可以包括操作系统602和数据603等,存储方式可以是短暂存储或者永久存储。其中,操作系统602可以包括Windows、Unix、Linux等。数据603可以包括但不限于测试结果对应的数据等。
在一些实施例中,图像识别装置还可包括有显示屏62、输入输出接口63、通信接口64、电源65以及通信总线66。
本领域技术人员可以理解,图6中示出的结构并不构成对图像识别装置的限定,可以包括比图示更多或更少的组件,例如还可包括传感器67。
本发明实施例所述图像识别装置的各功能模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
由上可知,本发明实施例有效提高图像识别效率,降低图像识别过程中消耗的计算资源。
可以理解的是,如果上述实施例中的图像识别方法以软件功能单元的形式实现并作为独立的产品销售或使用,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,执行本申请各个实施例方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、磁碟或者光盘等各种可以存储程序代码的介质。
基于此,本发明实施例还提供了一种计算机可读存储介质,存储有图像识别程序,所述图像识别程序被处理器执行时实现如上任意一实施例所述图像识别方法的步骤。
本发明实施例所述计算机可读存储介质的各功能模块的功能可根据上述方法实施例中的方法具体实现,其具体实现过程可以参照上述方法实施例的相关描述,此处不再赘述。
由上可知,本发明实施例有效提高图像识别效率,降低图像识别过程中消耗的计算资源。
此外,还需要说明的是,本申请的图像识别装置可部署于基于FPGA的神经网络加速应用或者AI加速芯片的软件平台中,在神经网络前向推理过程中实现结构化压缩剪枝,无需额外的微调步骤,其简化的压缩流程可以减少压缩过程本身的计算量和时延,有利于神经网络压缩技术的部署,进而促进基于FPGA的深度学习在边缘计算等资源受限场景中应用落实与推广。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
以上对本申请所提供的一种图像识别方法、装置及计算机可读存储介质进行了详细介绍。本文中应用了具体个例对本发明的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本发明的方法及其核心思想。应当指出,对于本技术领域的普通技术人员来说,在不脱离本发明原理的前提下,还可以对本申请进行若干改进和修饰,这些改进和修饰也落入本申请权利要求的保护范围内。

Claims (10)

  1. 一种图像识别方法,其特征在于,包括:
    预先将样本图像数据集输入至原始神经网络模型;对所述原始神经网络模型的每一个卷积层,以所述样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值,再计算重构误差最小的输入通道组合并裁剪冗余的输入通道,作为所述当前层的压缩结果;将各卷积层的压缩结果拼接生成图像识别模型;
    将获取的待识别图像输入至所述图像识别模型中,得到所述待识别图像的图像识别结果。
  2. 根据权利要求1所述的图像识别方法,其特征在于,所述以所述样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值包括:
    根据所述样本图像数据集中各样本数据在所述当前层的各输入通道的特征图确定所述当前层各输入通道的重要性;
    根据各输入通道的重要性为各输入通道设置重要性加权系数;
    通过计算各输入通道加权后的重要性函数及其和函数确定每个输入通道的采样概率;
    按照每个输入通道的采样概率对相应输入通道进行多轮采样,每轮依照所述采样概率对所述当前层的输入通道集合进行多次采样得到一个核集,计算并累加通道核集对应的特征图重构误差,通过计算最小化特征图重构误差的优化函数以获取所述当前层的卷积核权重的更新值。
  3. 根据权利要求2所述的图像识别方法,其特征在于,所述通过计算各输入通道加权后的重要性函数及其和函数确定每个输入通道的采样概率包括:
    调用预先存储的重要性函数关系式计算各输入通道加权后的重要性函数,所述重要性函数关系式为:
    s i(x)=w i(x)·g i(x);
    第i个输入通道的采样概率p i为p i=s i(x)/t;
    其中,s i(x)为第i个输入通道加权后的重要性函数,w i(x)为第i个输入通道的重要性加权系数,g i(x)为第i个输入通道的初始重要性函数,
    Figure PCTCN2021089861-appb-100001
    Figure PCTCN2021089861-appb-100002
    为所述样本图像数据集X中各样本数据在第i输入通道的特征图x i的Frobenius范数的最大值,m l-1为所述原始神经网络模型的第l-1卷积层的输出通道总数,
    Figure PCTCN2021089861-appb-100003
    为所有输入通道的特征图Frobenius范数的最大值的均值,a l为所述原始神经网络模型的第l卷积层目标要达到的压缩后输入通道数;t为所述和函数,
    Figure PCTCN2021089861-appb-100004
  4. 根据权利要求3所述的图像识别方法,其特征在于,所述通过计算最小化特征图重构误差的优化函数以获取所述当前层的卷积核权重的更新值包括:
    调用预先存储的权值更新关系式得到所述当前层的卷积核权重的更新值,所述权值更新关系式为:
    Figure PCTCN2021089861-appb-100005
    式中,Y k为未压缩卷积核在第k输出通道的输出特征图,K为所述当前层卷积核输出通道总数,
    Figure PCTCN2021089861-appb-100006
    为对卷积核的K个输出通道分别计算特征图重构误差并汇总,
    Figure PCTCN2021089861-appb-100007
    为对输入所述样本图像数据集的输入通道组合进行R轮独立采样并累加每次采样结果的特征图重构误差,
    Figure PCTCN2021089861-appb-100008
    代表Frobenius范数,
    Figure PCTCN2021089861-appb-100009
    为所述当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值、作为基于核集的卷积核权值估计操作的求解目标,S为输入所述样本图像数据集的C个输入通道中采样到的a个输入通道组成的核集,
    Figure PCTCN2021089861-appb-100010
    为核集S中每个输入通道的特征图x i在卷积核对应输出通道的输出特征图之和,*为卷积操作。
  5. 根据权利要求1至4任意一项所述的图像识别方法,其特征在于,所述计算重构误差最小的输入通道组合并裁剪冗余的输入通道包括:
    根据所述当前层卷积核更新后的权值确定所述当前层各输入通道被选中的概率,并将各输入通道被选中的概率组成命中概率集;
    基于所述命中概率集计算输出特征图重构误差最小的输入通道组合,并根据所述输入通道组合移除未被选择的输入通道;
    计算压缩后卷积核的输出特征图,以将所述当前层压缩后的卷积核与所述样本图像数据集的卷积结果作为所述当前层的下一个卷积层的输入数据。
  6. 根据权利要求5所述的图像识别方法,其特征在于,所述基于所述命中概率集计算输出特征图重构误差最小的输入通道组合包括:
    调用预先存储的优化函数关系式计算输出特征图重构误差最小的输入通道组合γ,所述优化函数关系式为:
    Figure PCTCN2021089861-appb-100011
    式中,Y为所述原始神经网络模型在所述当前层的输出特征图,K为所述当前层卷积核输出通道总数,γ i为第i通道是否被选择,γ i取值为0或1,γ={γ i|1≤i≤C}为由C个输入通道采样出a个输入通道的最佳采样结果,满足条件||γ|| 0=a,X i为所述样本图像数据集在所述当前层的第i输入通道的特征图,
    Figure PCTCN2021089861-appb-100012
    为所述当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
    Figure PCTCN2021089861-appb-100013
    代表Frobenius范数,(1-q)γ为将各个输入通道的采样概率构造的向量作为惩罚因子加入了优化目标函数中的惩罚项,q为所述命中概率集。
  7. 根据权利要求5所述的图像识别方法,其特征在于,所述根据所述当前层卷积核更新后的权值确定所述当前层各输入通道被选中的概率包括:
    调用预先存储的选择概率关系式计算所述当前层各输入通道被选中的概率,所述选择概率关系式为:
    Figure PCTCN2021089861-appb-100014
    式中,
    Figure PCTCN2021089861-appb-100015
    w i(x)为第i个输入通道的重要性加权系数,m l-1为所述原始神经网络模型的第l-1卷积层的输出通道总数,X为所述样本图像数据集,x i为所述样本图像数据集中各样本数据在第i输入通道的特征图,K为所述当前层卷积核输出通道总数,
    Figure PCTCN2021089861-appb-100016
    为所述当前层卷积核在第i输入通道和第k输出通道的权重张量的更新值,
    Figure PCTCN2021089861-appb-100017
    代表Frobenius范数,n l为所述原始神经网络模型在l层的输入通道总数。
  8. 一种图像识别装置,其特征在于,包括:
    神经网络模型压缩模块,用于将样本图像数据集输入至原始神经网络模型;对所述原始神经网络模型的每一个卷积层,以所述样本图像数据集在当前层的特征图为重构目标,利用核集构建方法先获取卷积核的更新权值,再计算重构误差最小的输入通道组合并裁剪冗余的输入通道,作为所述当前层的压缩结果;将各卷积层的压缩结果拼接生成图像识别模型;
    图像识别模块,用于将获取的待识别图像输入至所述图像识别模型中,得到所述待识别图像的图像识别结果。
  9. 一种图像识别装置,其特征在于,包括处理器,所述处理器用于执行存储器中存储的计算机程序时实现如权利要求1至7任一项所述图像识别方法的步骤。
  10. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质上存储有图像识别程序,所述图像识别程序被处理器执行时实现如权利要求1至7任一项所述图像识别方法的步骤。
PCT/CN2021/089861 2020-09-17 2021-04-26 图像识别方法、装置及计算机可读存储介质 WO2022057262A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/011,512 US20230334632A1 (en) 2020-09-17 2021-04-26 Image recognition method and device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010980176.3 2020-09-17
CN202010980176.3A CN112116001B (zh) 2020-09-17 2020-09-17 图像识别方法、装置及计算机可读存储介质

Publications (1)

Publication Number Publication Date
WO2022057262A1 true WO2022057262A1 (zh) 2022-03-24

Family

ID=73799926

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/089861 WO2022057262A1 (zh) 2020-09-17 2021-04-26 图像识别方法、装置及计算机可读存储介质

Country Status (3)

Country Link
US (1) US20230334632A1 (zh)
CN (1) CN112116001B (zh)
WO (1) WO2022057262A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206188A (zh) * 2023-05-04 2023-06-02 浪潮电子信息产业股份有限公司 一种图像识别方法、系统、设备及存储介质
WO2023217263A1 (zh) * 2022-05-13 2023-11-16 北京字跳网络技术有限公司 数据处理方法、装置、设备及介质

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN112116001B (zh) * 2020-09-17 2022-06-07 苏州浪潮智能科技有限公司 图像识别方法、装置及计算机可读存储介质
CN113197582B (zh) * 2021-04-27 2022-03-25 浙江大学 一种高通用性的心电数据压缩感知系统、终端和存储介质
CN113328755B (zh) * 2021-05-11 2022-09-16 内蒙古工业大学 一种面向边缘计算的压缩数据传输方法
CN113255907B (zh) * 2021-05-20 2024-05-14 广州广电运通金融电子股份有限公司 一种网络模型经裁剪以进行图像识别的方法
CN113705775A (zh) * 2021-07-29 2021-11-26 浪潮电子信息产业股份有限公司 一种神经网络的剪枝方法、装置、设备及存储介质
CN114154545B (zh) * 2021-12-07 2022-08-05 中国人民解放军32802部队 强互干扰条件下无人机测控信号智能识别方法
WO2023193169A1 (en) * 2022-04-07 2023-10-12 Huawei Technologies Co.,Ltd. Method and apparatus for distributed inference

Citations (4)

Publication number Priority date Publication date Assignee Title
US20150100529A1 (en) * 2013-10-08 2015-04-09 Qualcomm Incorporated Compiling network descriptions to multiple platforms
CN107680044A (zh) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 一种图像超分辨率卷积神经网络加速计算方法
CN109978142A (zh) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 神经网络模型的压缩方法和装置
CN112116001A (zh) * 2020-09-17 2020-12-22 苏州浪潮智能科技有限公司 图像识别方法、装置及计算机可读存储介质

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN108230296B (zh) * 2017-11-30 2023-04-07 腾讯科技(深圳)有限公司 图像特征的识别方法和装置、存储介质、电子装置
CN110008961B (zh) * 2019-04-01 2023-05-12 深圳华付技术股份有限公司 文字实时识别方法、装置、计算机设备及存储介质
CN110363086A (zh) * 2019-06-11 2019-10-22 中国科学院自动化研究所南京人工智能芯片创新研究院 图数据识别方法、装置、计算机设备和存储介质
CN110298394B (zh) * 2019-06-18 2024-04-05 中国平安财产保险股份有限公司 一种图像识别方法和相关装置

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US20150100529A1 (en) * 2013-10-08 2015-04-09 Qualcomm Incorporated Compiling network descriptions to multiple platforms
CN107680044A (zh) * 2017-09-30 2018-02-09 福建帝视信息科技有限公司 一种图像超分辨率卷积神经网络加速计算方法
CN109978142A (zh) * 2019-03-29 2019-07-05 腾讯科技(深圳)有限公司 神经网络模型的压缩方法和装置
CN112116001A (zh) * 2020-09-17 2020-12-22 苏州浪潮智能科技有限公司 图像识别方法、装置及计算机可读存储介质


Also Published As

Publication number Publication date
US20230334632A1 (en) 2023-10-19
CN112116001A (zh) 2020-12-22
CN112116001B (zh) 2022-06-07


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21868100

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21868100

Country of ref document: EP

Kind code of ref document: A1