CN117911794A - Model obtaining method and device for image classification, electronic equipment and storage medium - Google Patents
- Publication number
- CN117911794A (application number CN202410302535.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- model
- image classification
- pixel values
- trained
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application provides a model obtaining method and apparatus for image classification, an electronic device, and a storage medium. Features of the images in an image dataset are extracted to obtain a feature map of each image; the mean and variance of the feature map of each channel are normalized using a normalized preset weight range and an initial step size to obtain normalized image data; the continuous pixel values of the normalized image data are quantized and converted into discrete pixel values; the original image is compared with the quantized image to determine the image signal-to-noise ratio; when the image signal-to-noise ratio does not satisfy the image quality condition, the initial step size is adjusted until the image signal-to-noise ratio satisfies the image quality condition and the discrete values of the image are output; the discrete values of the image are then input into the image classification model to be trained, yielding a trained image classification model. The trained model takes an image to be classified as input and outputs an image classification result for that image.
Description
Technical Field
The present application relates to the field of image classification technologies, and in particular, to a method and apparatus for obtaining a model for image classification, an electronic device, and a storage medium.
Background
As deep neural network technology matures, deep neural networks are widely used as image classification models. In existing model obtaining methods for image classification, the input image is first discretized, and the resulting discrete values are fed to the image classification model to obtain a classification result. Because the discrete values produced by current methods are inaccurate, the accuracy of the subsequent image classification result suffers.
Disclosure of Invention
The application provides an improved model obtaining method, device, electronic equipment and storage medium for image classification.
The application provides a model obtaining method for image classification, which comprises the following steps:
extracting the features of the images in the image dataset to obtain a feature map of each image;
Normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
quantizing continuous pixel values of the normalized image data by using an initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
comparing the original image with the quantized image to determine an image signal-to-noise ratio;
when the image signal-to-noise ratio does not satisfy the image quality condition, adjusting the initial step size and returning to the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio satisfies the image quality condition and the discrete values of the image are output;
Inputting the discrete value of the image into an image classification model to be trained for training, and obtaining a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
Further, adjusting the initial step size when the image signal-to-noise ratio does not meet the image quality condition, until the image signal-to-noise ratio meets the image quality condition and the discrete values of the image are output, includes:
gradually reducing the value of the step length;
and updating the initial step size with the reduced step size, and re-executing the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio satisfies the image quality condition.
Further, the normalizing the mean and the variance of the feature map of each channel by using a preset weight range to obtain normalized image data includes:
Determining the mean value and variance of the feature map through the pixel value of each channel of the feature map of each image;
Normalizing the feature map by using a normalized preset weight range, an initial step length, a mean value and a variance to obtain normalized image data;
determining whether the normalized image data is in a different range;
when the normalized image data contains image data in different ranges, the image data in different ranges are mapped into a preset weight range by linear transformation, and the data range of the image data is adjusted.
Further, the extracting the features of the images in the image dataset to obtain feature graphs of the images includes:
Extracting features of images in the image dataset by using MobileNetV2 to obtain an image feature map; wherein the MobileNetV2 depth-separable convolutions include a first-layer depth-separable convolution, a second-layer depth-separable convolution, and a third-layer depth-separable convolution; the first-layer depth-separable convolution is a 1×1 convolution with a nonlinear function, the second-layer depth-separable convolution is a depthwise layer for filtering, and the third-layer depth-separable convolution is a point-wise layer for combining operations.
Further, the quantizing the continuous pixel values of the normalized image data using the initial step size and the preset weight range, converting the quantized continuous pixel values into discrete pixel values, and obtaining a quantized image, including:
Quantizing each pixel value in the normalized image data by using a quantization step length and a quantization level to obtain quantized pixel values; the quantization step size is obtained by dividing the range of consecutive pixel values by the quantization level; the quantization level is used for mapping the pixel value of the image to a preset discrete pixel value range;
determining whether the quantized pixel values are in different ranges; and
When the quantized pixel values contain pixel values in different ranges, the quantized pixel values in different ranges are subjected to linear transformation, the data range of the quantized pixel values in different ranges is adjusted, and the quantized pixel values in different ranges are mapped into a preset discrete pixel value range.
Further, the step of inputting the discrete value of the image into the image classification model to be trained to perform training, to obtain a trained image classification model, includes:
inputting the discrete value of the image into an image classification model to be trained so as to output an image classification result;
verifying and testing the image classification model to be trained by using the image classification result to obtain the inference precision of the image classification model to be trained;
When the inference precision is smaller than the preset inference precision, the model parameters of the image classification model to be trained are adjusted, and the discrete values of the image are input again into the adjusted image classification model to be trained so as to output an image classification result, until the inference precision is greater than the preset inference precision, thereby obtaining the trained image classification model.
The present application provides a model obtaining apparatus for image classification, comprising:
the feature extraction module is used for extracting the features of the images in the image dataset and obtaining feature graphs of the images;
The normalization module is used for normalizing the mean and the variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
The quantization module is used for quantizing the continuous pixel values of the normalized image data by using the initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
the comparison module is used for comparing the original image with the quantized image to determine the signal-to-noise ratio of the image;
The adjustment module is used for adjusting the initial step size when the image signal-to-noise ratio does not satisfy the image quality condition and returning to the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio satisfies the image quality condition; and
The training module is used for inputting the discrete value of the image into the image classification model to be trained to train so as to obtain a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
Further, the adjusting module is specifically configured to: gradually reduce the value of the step size; and update the initial step size with the reduced step size, re-execute the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio satisfies the image quality condition.
The present application provides an electronic device comprising one or more processors configured to implement any of the methods described above.
The present application provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements any of the methods described above.
The present application provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement any of the methods described above.
In some embodiments, the model obtaining method for image classification of the present application extracts features of images in an image dataset to obtain feature maps of the images; normalizing the mean value and the variance of the feature map of each channel by using a normalized preset weight range and an initial step length to obtain normalized image data; quantizing the continuous pixel values of the normalized image data, and converting the quantized continuous pixel values into discrete pixel values to obtain quantized images; comparing the original image with the quantized image to determine the signal-to-noise ratio of the image; under the condition that the signal-to-noise ratio of the image does not meet the image quality condition, the initial step length is adjusted until the signal-to-noise ratio of the image meets the image quality condition, and a discrete value of the image is output; inputting the discrete value of the image into an image classification model to be trained for training, and obtaining a trained image classification model; the image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
In the embodiment of the application, when the image signal-to-noise ratio does not meet the image quality condition, the initial step size is adjusted until the discrete values of the image are output once the image signal-to-noise ratio meets the image quality condition. Gradually adjusting the initial step size in this way yields more accurate discrete values of the image and hence a more accurate model for image classification, so the discretization accuracy of the image is higher and the accuracy of the image classification result is higher.
Drawings
FIG. 1 is a flow chart of a model acquisition method for image classification according to an embodiment of the present application;
FIG. 2 is a schematic diagram showing a structure of a model obtaining apparatus for image classification according to an embodiment of the present application;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present specification; rather, they are merely examples of apparatus and methods consistent with aspects of the present specification as detailed in the appended claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
In order to solve the technical problem that the dispersion accuracy of the related technology on images is low, the embodiment of the application provides a model obtaining method for image classification. Extracting the characteristics of images in the image dataset to obtain a characteristic diagram of each image; normalizing the mean value and the variance of the feature map of each channel by using a normalized preset weight range and an initial step length to obtain normalized image data; quantizing the continuous pixel values of the normalized image data, and converting the quantized continuous pixel values into discrete pixel values to obtain quantized images; comparing the original image with the quantized image to determine the signal-to-noise ratio of the image; under the condition that the signal-to-noise ratio of the image does not meet the image quality condition, the initial step length is adjusted until the signal-to-noise ratio of the image meets the image quality condition, and a discrete value of the image is output; inputting the discrete value of the image into an image classification model to be trained for training, and obtaining a trained image classification model; the image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
In the embodiment of the application, when the image signal-to-noise ratio does not meet the image quality condition, the initial step size is adjusted until the image signal-to-noise ratio meets the image quality condition and the discrete values of the image are output. Gradually adjusting the initial step size in this way yields more accurate discrete values of the image, hence a more accurate model for image classification and a more accurate image classification result.
Fig. 1 is a flowchart of a model obtaining method for image classification according to an embodiment of the present application.
As shown in fig. 1, the model obtaining method for image classification may include, but is not limited to, the following steps 110 to 160:
Step 110: extracting the features of the images in the image dataset and obtaining a feature map of each image.
In some embodiments of this step 110, features of the images in the image dataset may be extracted directly to obtain a feature map of each image. These images in the image dataset may also be referred to as original images. Further, step 110 may use MobileNetV2 to extract the features of the images in the image dataset to obtain an image feature map; the MobileNetV2 depth-separable convolutions include a first-layer depth-separable convolution, a second-layer depth-separable convolution, and a third-layer depth-separable convolution, where the first-layer depth-separable convolution is a 1×1 convolution with a nonlinear function, the second-layer depth-separable convolution is a depthwise layer for filtering, and the third-layer depth-separable convolution is a point-wise layer for combining operations. In this way, computation and model size are significantly reduced while the feature representation is preserved.
The above-described depth convolution consists of the same number of filters as the number of channels in the provided input, with a separate filter applied for each channel, the resulting output image preserving its depth.
The point-wise convolution of the above point-wise separable layer is typically used to increase or decrease the depth of the image. The kernel of a general convolution filter is defined by [p × q × N], where p is the height and q the width of the filter kernel and N is the number of channels. A point-wise convolution filter has kernel size [1 × 1 × N], so each application of the point-wise filter mixes the N channels and produces a single output value per spatial position.
The depth separable convolution of MobileNetV2 is implemented as follows. 1. Depth convolution: a convolution kernel, known as a depthwise or depth-separable convolution, is first applied to each channel of the input image, performing a convolution operation on each channel independently; this helps reduce the number of parameters and improve efficiency. 2. Point-wise convolution: following the depth convolution, MobileNet uses a point-wise convolution with a 1×1 kernel to merge the feature maps of the different channels and generate the final output feature map; the 1×1 kernel helps learn the relationships between channels. Through these components, the network learns the features and patterns of the image; a sketch of such a block is given below.
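The following is a minimal sketch of one depthwise-separable block in TensorFlow/Keras (the document's TensorFlow context); the kernel size, stride, and channel count are illustrative assumptions rather than the patent's exact configuration, and plain ReLU is used per the outlier handling described later.

```python
import tensorflow as tf

def depthwise_separable_block(x, out_channels, stride=1):
    # Depthwise convolution: one 3x3 filter per input channel,
    # filtering each channel independently.
    x = tf.keras.layers.DepthwiseConv2D(
        kernel_size=3, strides=stride, padding="same", use_bias=False)(x)
    x = tf.keras.layers.ReLU()(x)  # plain ReLU instead of ReLU6 (see below)
    # Point-wise 1x1 convolution: merges information across channels.
    x = tf.keras.layers.Conv2D(out_channels, kernel_size=1, use_bias=False)(x)
    x = tf.keras.layers.ReLU()(x)
    return x

inputs = tf.keras.Input(shape=(224, 224, 3))
features = depthwise_separable_block(inputs, out_channels=64, stride=2)
```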
The core layer of MobileNetV2 herein consists of a depth convolution and a point-wise convolution, each convolution operation being followed by a batch normalization layer and a nonlinear activation: ReLU6 in MobileNetV2 and H-swish in MobileNetV3, respectively.
Considering an input original image x within a mini-batch, with channel d and element m in each channel, the batch normalization transform in the depth convolution layer is applied to each channel independently and can be expressed as:

$$\hat{x} = \frac{x - \mu_k}{\sqrt{\sigma_k^2 + \epsilon}}, \qquad y = a\,\hat{x} + b$$

where $\sigma_k^2$ and $\mu_k$ are respectively the variance and mean over mini-batch k after k batch iterations, $\epsilon$ is a small error value that avoids division-by-zero anomalies, $\hat{x}$ is the normalized value of x, y is the independent per-channel output of the batch normalization transform, a is the scale after the iterated layers, and b is the shift (displacement) after the iterated layers.
The batch normalization equation can then be restated as follows:

$$y = \hat{w}\,x + \hat{b}, \qquad \hat{w} = \frac{a}{\sqrt{\sigma_k^2 + \epsilon}}, \qquad \hat{b} = b - \frac{a\,\mu_k}{\sqrt{\sigma_k^2 + \epsilon}}$$

where $\hat{w}$ represents the folded convolution weights and $\hat{b}$ the normalized deviation (folded bias).
To reduce the computational cost, for each channel k, $\hat{w}$ is shared with the convolution weights and folded into a single convolution operation.
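As an illustration, the following is a minimal numpy sketch of this per-channel folding, assuming batch-norm parameters per output channel and weights laid out as (kh, kw, in_ch, out_ch); the function name and layout are assumptions, not the patent's implementation.

```python
import numpy as np

def fold_batch_norm(w, a, b, mu, var, eps=1e-5):
    """Fold per-channel batch norm (scale a, shift b, batch statistics mu, var)
    into conv weights w of shape (kh, kw, in_ch, out_ch), so that
    conv(x, w_hat) + b_hat == a * (conv(x, w) - mu) / sqrt(var + eps) + b."""
    scale = a / np.sqrt(var + eps)   # shared per channel k
    w_hat = w * scale                # broadcast over the output-channel axis
    b_hat = b - mu * scale
    return w_hat, b_hat
```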
In the TensorFlow implementation, the minimum and maximum weight values used to compute the scale are obtained jointly from all channels. Without cross-channel correlation, depth convolutions in the mobile network are more likely to produce all-zero values in one channel, which results in zero variance for that particular channel. From the above equation, zero variance significantly increases the $\hat{w}$ value, which in turn increases the output y. This provides a set of discrete points within the quantization range for rarely occurring outliers and results in large quantization errors.
ReLU6 in the MobileNetV2 architecture is used as the nonlinear activation function, given by the following equation:

$$\mathrm{ReLU6}(x) = \min(\max(0, x),\, 6)$$
In ReLU6, the clipping value 6 is a fairly arbitrary choice for introducing nonlinearity. While ReLU6 encourages the model to learn sparse features early, clipping the signals from early layers may produce a signal distribution that is not quantization friendly. Therefore, the plain ReLU activation function is used instead of ReLU6 in the depth and point-wise convolution blocks to handle outliers.
Of course, in some embodiments of the above step 110, the image in the image dataset may be preprocessed, such as denoising, and features of the preprocessed image may be extracted, so as to obtain a feature map of each image.
Step 120: normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data.
The preset weight range is used to adjust errors in the feature extraction process. The preset weight range may be, for example, [-1, 1].
Step 130: quantizing the continuous pixel values of the normalized image data by using the initial step size and the preset weight range, and converting them into discrete pixel values to obtain a quantized image.
The initial step size and the preset weight range can be set according to user requirements. The initial step size may be a relatively large value, for example 0.1.
Step 140: comparing the original image with the quantized image to determine the image signal-to-noise ratio.
The above image signal-to-noise ratio is used to represent the relationship of the image signal and noise. A higher signal-to-noise ratio indicates a relatively stronger signal component in the image and a relatively weaker noise component, generally corresponding to better image quality.
The original image refers to the original real image: the image captured by the image capturing apparatus without preprocessing such as denoising, so that its original information is retained.
The method further comprises the steps of: and judging whether the signal-to-noise ratio of the image meets the image quality condition. If not, that is, if the image signal-to-noise ratio does not meet the image quality condition, step 150 is performed; if yes, namely, if the signal-to-noise ratio of the image meets the image quality condition, outputting the discrete value of the image.
Step 150: when the image signal-to-noise ratio does not meet the image quality condition, the initial step size is adjusted and the method returns to step 120, until the discrete values of the image are output once the image signal-to-noise ratio meets the image quality condition.
The step 150 may further include: reducing the value of the step size; and updating the initial step size with the reduced step size, re-executing the normalization of the mean and variance of the feature map of each channel using the normalized preset weight range and the updated initial step size, until the discrete values of the image are output when the image signal-to-noise ratio meets the image quality condition. A minimal sketch of this feedback loop follows.
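This sketch uses the `quantize` and `compute_sqnr_db` helpers sketched later in this description; the SNR threshold, shrink factor, and iteration cap are illustrative assumptions, not values from the patent.

```python
def discretize_with_quality_check(image, step=0.1, snr_threshold_db=30.0,
                                  shrink=0.5, max_iters=20):
    # Quantize, measure the image signal-to-noise ratio, and keep
    # shrinking the step size until the quality condition is met.
    quantized = quantize(image, step)
    for _ in range(max_iters):
        if compute_sqnr_db(image, quantized) >= snr_threshold_db:
            break
        step *= shrink                    # gradually reduce the step size
        quantized = quantize(image, step)
    return quantized                      # discrete values of the image
```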
Step 160, inputting the discrete value of the image into the image classification model to be trained for training, and obtaining a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
The method further comprises the following steps: and continuing to acquire the input image to be classified, processing the discrete value of the image to be classified by using the method, and inputting the discrete value of the image to be classified into a trained image classification model to output an image classification result of the image to be classified. The images to be classified are images input according to the requirements of users.
In the embodiment of the application, based on the feedback that the image signal-to-noise ratio does not meet the image quality condition, the normalized data are adjusted to obtain more accurate discrete values of the image, and hence a more accurate model for image classification and a more accurate image classification result. Thus, the discretization accuracy of the image is higher, and the accuracy of the image classification result is higher.
As shown in connection with fig. 1, the above step 120 further includes the following steps ① to ④:
①. Determining the mean and variance of the feature map from the pixel values of each channel of the feature map of each image. Further, for each channel (e.g., the red, green, and blue channels of an RGB image), the mean and variance of the feature map are calculated; these values may be obtained by computing the mean and standard deviation of the pixel values of each channel in the feature map.
②. Normalizing the feature map by using the normalized preset weight range, the initial step size, the mean, and the variance to obtain normalized image data.
In step ② above, the feature map is normalized using the mean and variance; for each pixel, existing formulas may be used for the normalization.
③. Determining whether the normalized image data fall in different ranges.
④. When the normalized image data contain image data in different ranges, mapping the image data in different ranges into the preset weight range by linear transformation, thereby adjusting the data range of the image data. Through step ④, normalized image data that still fall outside the preset weight range can be adjusted by linear transformation, mapping the normalized values into the preset weight range.
In this way, through the per-channel mean and variance calculation of step ①, the normalization of step ②, and the data-range adjustment of steps ③ and ④, normalized image data are obtained and adjusted to within the preset weight range, as sketched below.
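The following is a minimal numpy sketch of steps ① to ④, assuming an (H, W, C) feature map and the example [-1, 1] weight range; the epsilon and the min/max-based linear mapping are illustrative assumptions.

```python
import numpy as np

def normalize_feature_map(fmap, weight_range=(-1.0, 1.0), eps=1e-5):
    # Steps 1-2: per-channel mean/variance normalization.
    mean = fmap.mean(axis=(0, 1), keepdims=True)
    std = fmap.std(axis=(0, 1), keepdims=True)
    normed = (fmap - mean) / (std + eps)
    # Steps 3-4: if values still fall outside the preset weight range,
    # linearly map them into it.
    lo, hi = weight_range
    fmin, fmax = normed.min(), normed.max()
    return lo + (normed - fmin) * (hi - lo) / (fmax - fmin + eps)
```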
The batch normalization layer is removed in the depth convolution block, preventing the large quantization loss caused by the zero-variance problem in the depth convolution. The above step 120 may further include handling outliers by enabling L2 regularization during training, rather than applying clipping after training to handle anomalies. Regularization penalizes any large-magnitude outlier weight values during training, enforcing a more uniform weight distribution and thereby preventing outliers from occurring. The training process here refers to the normalization process.
Wherein, the L2 regularization formula is as follows:

$$\text{cost} = \sum_{i}\left(y_i - w_i x_i\right)^2 + \lambda \sum_{i} w_i^2$$

where cost is the loss with outliers handled by L2 regularization, $y_i$ is the nonlinear function output value, $x_i$ is a feature value of the input original image, $w_i$ is the regularization weight for $x_i$, $\lambda$ is the regularization coefficient, and i is the accumulation index.
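In a Keras model, this penalty can be enabled during training through a kernel regularizer, as in the following sketch; the regularization coefficient 1e-4 is an illustrative assumption.

```python
import tensorflow as tf

# L2 regularization penalizes large-magnitude outlier weights during
# training, enforcing a more uniform weight distribution.
pointwise = tf.keras.layers.Conv2D(
    64, kernel_size=1, use_bias=False,
    kernel_regularizer=tf.keras.regularizers.l2(1e-4))
```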
In addition, the ReLU activation function is used in place of ReLU6 in the depth and point-wise convolution blocks to handle outliers. Due to the nature of the depth convolution and its combined effect with batch normalization on quantization loss, the quantized model would otherwise suffer an accuracy error; regularizing during training penalizes large-magnitude outlier weights and enforces a more uniform weight distribution, preventing outliers from occurring.
In the embodiment of the application, the original image characteristics are mapped into the preset weight range by normalizing and adjusting the data range, so that the stability and the convergence rate of the model are improved. Normalization can make the distribution of the feature map more similar to the standard normal distribution, and helps the model learn the features better. The data range can be adjusted to map the values of the feature map to a proper range, so that the model is easier to converge in the training process, and the performance and training effect of the model are improved.
For image quality, the more quantization levels, the better the image quality. Coarser quantization (fewer levels) generally results in lower image quality, because it merges or discards detail information in the original image, blurring or distorting the image. Finer quantization (more levels) generally results in higher image quality, because it preserves the detail information of the original image more accurately, making the image visually clearer and closer to the original.
As shown in fig. 1, the quantization process of step 130 further includes the following steps (1) to (3):
(1) Quantizing each pixel value in the normalized image data by using a quantization step size and a quantization level to obtain quantized pixel values; the quantization step size is obtained by dividing the range of continuous pixel values by the quantization level; the quantization level is used for mapping the pixel values of the image to a preset discrete pixel value range.
The application determines the initial step size and quantization level specifically as follows. For an input original image with a uniform probability density function (Probability Density Function, PDF), the initial step size δ for the quantizer to achieve the minimum distortion error is:

$$\delta = \frac{w_{\max} - w_{\min}}{2^k - 1}, \qquad q(x) = \mathrm{round}\!\left(\frac{x}{\delta}\right) + z$$

where δ is the step size depending on the weight range, $w_{\max}$ and $w_{\min}$ are respectively the maximum and minimum weights in the weight matrix, k is the number of bits required for the fixed-point representation of the weights, q(x) is the integer representation of the corresponding floating-point value x of the input original image, and z is the zero-point offset that maps the floating-point value 0 (a weight close to zero) into the quantization range, also referred to as the zero point.
When uniform symmetric quantization is used, the ranges on both sides of 0 must be identical; the step size is therefore calculated from half of the floating-point range.
The present quantization model aims at quantizing to 8 bits (k = 8). In uniform symmetric quantization, the integer range is [-127, 127] and the zero point z is 0. Typical quantization levels are 8 bits (256 discrete values), 4 bits (16 discrete values), and so on. Optionally, the application selects an 8-bit quantization level, which provides high-precision images with low noise and distortion while requiring little storage and transmission overhead.
(2) It is determined whether the quantized pixel values are in different ranges.
(3) And when the quantized pixel values contain pixel values in different ranges, the quantized pixel values in different ranges are subjected to linear transformation, the data range of the quantized pixel values in different ranges is adjusted, and the quantized pixel values in different ranges are mapped into a preset discrete pixel value range.
The quantized image of x is obtained using the following formula:

$$\hat{x} = \delta \cdot \mathrm{clip}\!\left(\mathrm{round}\!\left(\frac{x}{\delta}\right) + z,\; a,\; b\right)$$

where $\hat{x}$ is the quantized image of x, clip is the uniform symmetric quantization (clipping) function applied elementwise over the output tensor, a is the minimum value of the quantization range, and b is the maximum value of the quantization range.
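The following is a minimal numpy sketch of this uniform symmetric quantize/dequantize with zero point z = 0, as stated above; the default bit width and function name are illustrative.

```python
import numpy as np

def quantize(x, step=None, k=8):
    # Minimum-distortion step for a uniform source: delta = (max - min) / (2^k - 1).
    if step is None:
        step = (x.max() - x.min()) / (2 ** k - 1)
    qmax = 2 ** (k - 1) - 1                        # 127 for k = 8, zero point z = 0
    q = np.clip(np.round(x / step), -qmax, qmax)   # integer representation q(x)
    return q * step                                # back to the continuous pixel domain
```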
Thus, the scale factor is directly proportional to the weight distribution range ($w_{\max} - w_{\min}$). The quantized value of x is represented by q(x), and outliers that differ significantly from the weight distribution average affect the scale factor. The effect of such a scale factor is that a set of discrete points within the quantization range is reserved for rarely occurring outliers.
In the embodiment of the application, the high-precision image has low noise and distortion and low storage and transmission overhead requirements.
As shown in fig. 1, the step 140 may further include the following. The difference between the input original image x and its quantized image $\hat{x}$ is called the quantization error. In an ideal analog-to-digital converter, the input original image has a uniform probability density function (PDF) and the quantization error is uniformly distributed, so the image signal-to-quantization-noise ratio (Signal-to-Quantization-Noise Ratio, SQNR) can be obtained according to the following formula:

$$\mathrm{SQNR} \approx 6.02\,Q\ \mathrm{dB}$$

where Q is the number of quantization bits. The signal source in the SQNR derivation can be modeled as a symmetric source distributed over $[-w_{\max}, w_{\max}]$, where $w_{\max}$ is the largest weight in the weight matrix.
When the obtained initial step size is $\delta = 2\,w_{\max}/M$, the image signal-to-noise ratio is obtained through the following reasoning:

$$\mathrm{SQNR} = \frac{\sigma_x^2}{\sigma_e^2} = \frac{w_{\max}^2/3}{\delta^2/12} = M^2 = 2^{2K}$$

where $\sigma_x^2$ is the signal power, $\sigma_e^2 = \delta^2/12$ is the quantization noise power, and M is the number of levels of the fixed-length code; for a fixed-length code using K bits, $M = 2^K$.
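The following is a minimal sketch of measuring the SQNR between the original and quantized images, which serves as the image quality check in step 140; for an ideal uniform quantizer the result approaches the roughly 6.02 dB-per-bit rule above.

```python
import numpy as np

def compute_sqnr_db(original, quantized):
    # Quantization error is the difference between the original image
    # and its quantized version.
    noise = original.astype(np.float64) - quantized.astype(np.float64)
    signal_power = np.mean(original.astype(np.float64) ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return np.inf                     # perfect reconstruction
    return 10.0 * np.log10(signal_power / noise_power)
```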
As shown in connection with fig. 1, the above step 160 may further include the following steps <1> to <3>:
<1>. Inputting the discrete values of the image into the image classification model to be trained so as to output an image classification result.
<2>. Verifying and testing the image classification model to be trained by using the image classification result to obtain the inference precision of the image classification model to be trained.
<3>. When the inference precision is smaller than the preset inference precision, adjusting the model parameters of the image classification model to be trained and returning to the step of inputting the discrete values of the image into the adjusted image classification model to be trained so as to output an image classification result, until the inference precision is greater than the preset inference precision, thereby obtaining the trained image classification model. In this way, by adjusting the parameters of the image classification model to be trained, an image classification model with more accurate inference precision is obtained; the model obtained after training may also be referred to as the trained image classification model. A sketch of this training loop is given below.
The preset inference precision can be set according to user requirements.
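The following is a minimal train-evaluate-adjust sketch of steps <1> to <3>, assuming a compiled Keras classifier with an accuracy metric; the target accuracy (standing in for the preset inference precision) and the round cap are illustrative assumptions.

```python
def train_until_accurate(model, discrete_images, labels, val_images,
                         val_labels, target_accuracy=0.95, max_rounds=50):
    # Train on the discrete image values and keep adjusting the model
    # until the inference precision exceeds the preset threshold.
    for _ in range(max_rounds):
        model.fit(discrete_images, labels, epochs=1, verbose=0)
        _, accuracy = model.evaluate(val_images, val_labels, verbose=0)
        if accuracy > target_accuracy:
            break
    return model  # the trained image classification model
```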
The model parameters of the image classification model may include, but are not limited to, the initial step size δ with minimum distortion error, the convolution weights $\hat{w}$, and the normalized deviation $\hat{b}$. The image classification model here may be a VGG (Visual Geometry Group) deep convolutional neural network; further, it may be a VGG-19 network used to implement the model acquisition method for image classification. The VGG-19 network is 599 MB in size. AlexNet (Alex et al., 2012) is a very powerful model that achieves high accuracy, but deleting any of its convolution layers greatly reduces performance. ResNet aims to overcome the vanishing-gradient problem in large DNNs (Deep Neural Networks) with skip connections, and SqueezeNet introduces squeeze-and-expand modules with 1×1 convolutions to reduce the number of parameters. In contrast, the MobileNet architecture combines skip connections (ResNet) and 1×1 convolutions (SqueezeNet) to build a DNN model that is much smaller than the other models.
The above image classification model ultimately aims to reduce image size and loss. Owing to processing and memory constraints, the number of MACs (multiply-accumulate operations) needed for inference on a single image plays a critical role in running a DNN on an embedded platform, and the number of feature parameters determines the size of the model. The fully connected layer is the conventional neural network approach, typically used in the final stage of all DNNs to introduce the classification task into the network. The convolution operation involves shifting a filter over the input image in specific steps to extract high-level features; the number of MACs depends on the size of the filter used and the corresponding image input. The result on the final model prediction test set is taken as the inference precision of the model obtained by the method.
In embodiments of the present application, various DNN architectures and their suitability for deployment on embedded platforms are discussed. The model acquisition task for image classification runs on the MobileNet architecture and shows better results, namely lower power consumption and memory footprint, while maintaining good inference accuracy. A QF-MobileNet architecture is proposed over the baseline MobileNetV2 and MobileNetV3 architectures to overcome quantization loss and improve inference accuracy. The proposed QF-MobileNet architecture yields promising improvements in feature parameters, MAC operations, and quantization loss.
Based on the same application concept as the above method, as shown in fig. 2, an embodiment of the present application provides a model obtaining apparatus for image classification, which may include the following modules:
A feature extraction module 31, configured to extract features of images in the image dataset, and obtain feature graphs of the images;
The normalization module 32 is configured to normalize the mean and the variance of the feature map of each channel by using a preset weight range, so as to obtain normalized image data;
A quantization module 33, configured to quantize the continuous pixel values of the normalized image data using an initial step size and the preset weight range, and convert the quantized pixel values into discrete pixel values, so as to obtain a quantized image;
a comparison module 34, configured to compare the original image with the quantized image, and determine an image signal-to-noise ratio;
An adjustment module 35, configured to adjust the initial step size if the image signal-to-noise ratio does not meet an image quality condition, until a discrete value of the image is output if the image signal-to-noise ratio meets the image quality condition;
The training module 36 is configured to input the discrete value of the image to the image classification model to be trained to obtain a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
In some embodiments, the adjusting module is specifically configured to: gradually reduce the value of the step size; and update the initial step size with the reduced step size, re-execute the normalization of the mean and variance of the feature map of each channel using the normalized preset weight range and the updated initial step size, until the discrete values of the image are output when the image signal-to-noise ratio meets the image quality condition.
The implementation process of the functions and actions of each module in the device is specifically detailed in the implementation process of the corresponding steps in the method, so that the same technical effects can be achieved, and the detailed description is omitted here.
Fig. 3 is a block diagram of an electronic device 50 according to an embodiment of the present application.
As shown in fig. 3, the electronic device 50 comprises one or more processors 51 for implementing the model acquisition method for image classification as described above.
In some embodiments, electronic device 50 may include storage medium 59. For example, the computer-readable storage medium may store a program that can be called by the processor 51, and may include a nonvolatile storage medium. In some embodiments, electronic device 50 may include memory 58 and interface 57. In some embodiments, electronic device 50 may also include other hardware depending on the actual application.
The computer-readable storage medium of the embodiment of the present application has stored thereon a program for implementing the model obtaining method for image classification as described above when executed by the processor 51.
The present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-readable storage media include volatile and non-volatile, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by the computing device.
The model obtaining method for image classification is applied to electronic equipment. The electronic device may be a PC (Personal Computer ) terminal device. The PC-side device may include, but is not limited to, a desktop computer, a tablet computer, or a notebook computer.
The foregoing description of the preferred embodiments is provided for the purpose of illustration only, and is not intended to limit the scope of the disclosure, since any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the phrase "comprising a …" does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the recited element.
Claims (10)
1. A model acquisition method for image classification, comprising:
extracting the features of images in the image dataset to obtain a feature map of each image;
Normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
quantizing continuous pixel values of the normalized image data by using an initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
comparing the original image with the quantized image to determine an image signal-to-noise ratio;
when the image signal-to-noise ratio does not satisfy an image quality condition, adjusting the initial step size and returning to the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio satisfies the image quality condition and the discrete values of the image are output;
Inputting the discrete value of the image into an image classification model to be trained for training, and obtaining a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
2. The model obtaining method for image classification according to claim 1, wherein said adjusting the initial step size in a case where the image signal-to-noise ratio does not satisfy an image quality condition until the image signal-to-noise ratio satisfies an image quality condition, outputting a discrete value of the image, comprises:
gradually reducing the value of the step length;
and updating the initial step size with the reduced step size, and re-executing the step of normalizing the mean and variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio satisfies the image quality condition.
3. The model obtaining method for image classification according to claim 1 or 2, wherein the normalizing the feature map of each channel with a preset weight range to obtain normalized image data includes:
Determining the mean value and variance of the feature map through the pixel value of each channel of the feature map of each image;
Normalizing the feature map by using a normalized preset weight range, an initial step length, a mean value and a variance to obtain normalized image data;
determining whether the normalized image data is in a different range;
when the normalized image data contains image data in different ranges, the image data in different ranges are mapped into a preset weight range by linear transformation, and the data range of the image data is adjusted.
4. The model obtaining method for image classification according to claim 1, wherein the extracting features of images in the image dataset to obtain feature maps of the respective images includes:
Extracting features of images in the image dataset by using MobileNetV2 to obtain an image feature map; wherein the MobileNetV2 depth-separable convolutions include a first-layer depth-separable convolution, a second-layer depth-separable convolution, and a third-layer depth-separable convolution; the first-layer depth-separable convolution is a 1×1 convolution with a nonlinear function, the second-layer depth-separable convolution is a depthwise layer for filtering, and the third-layer depth-separable convolution is a point-wise layer for combining operations.
5. The method for obtaining the model for image classification according to claim 1, wherein said quantizing the continuous pixel values of the normalized image data into discrete pixel values using the initial step size and the preset weight range, and obtaining the quantized image, comprises:
Quantizing each pixel value in the normalized image data by using a quantization step length and a quantization level to obtain quantized pixel values; the quantization step size is obtained by dividing the range of consecutive pixel values by the quantization level; the quantization level is used for mapping the pixel value of the image to a preset discrete pixel value range;
determining whether the quantized pixel values are in different ranges; and
When the quantized pixel values contain pixel values in different ranges, the quantized pixel values in different ranges are subjected to linear transformation, the data range of the quantized pixel values in different ranges is adjusted, and the quantized pixel values in different ranges are mapped into a preset discrete pixel value range.
6. The method for obtaining a model for image classification according to claim 1, wherein the inputting the discrete values of the image into the image classification model to be trained for training, to obtain a trained image classification model, comprises:
inputting the discrete value of the image into an image classification model to be trained so as to output an image classification result;
verifying and testing the image classification model to be trained by using the image classification result to obtain the inference precision of the image classification model to be trained;
when the inference precision is smaller than the preset inference precision, adjusting the model parameters of the image classification model to be trained and returning to input the discrete values of the image into the adjusted image classification model to be trained so as to output an image classification result, until the inference precision is greater than the preset inference precision, thereby obtaining the trained image classification model.
7. A model obtaining apparatus for image classification, comprising:
the feature extraction module is used for extracting the features of the images in the image dataset to obtain feature maps of the images;
the normalization module is used for normalizing the mean and the variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
the quantization module is used for quantizing the continuous pixel values of the normalized image data by using the initial step size and the preset weight range, converting the continuous pixel values into discrete pixel values to obtain a quantized image;
the comparison module is used for comparing the original image with the quantized image to determine the image signal-to-noise ratio;
the adjustment module is used for adjusting the initial step size when the image signal-to-noise ratio does not meet the image quality condition, and returning to the step of normalizing the mean and the variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio meets the image quality condition; and
the training module is used for inputting the discrete values of the image into the image classification model to be trained for training, to obtain a trained image classification model; wherein the trained image classification model is used for taking an image to be classified as input and outputting an image classification result of the image to be classified.
8. The model obtaining apparatus for image classification according to claim 7, wherein the adjustment module is specifically configured to: gradually reduce the value of the step size; update the initial step size with the reduced step size; and re-execute the step of normalizing the mean and the variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the discrete values of the image are output when the image signal-to-noise ratio meets the image quality condition.
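Claim 8's adjustment can be sketched as a simple search: shrink the step size, re-quantize, and re-check the image signal-to-noise ratio. The PSNR metric, the 30 dB threshold, and the 0.5 shrink factor below are assumptions; the patent only requires gradually reducing the step size until the quality condition holds:

```python
import numpy as np

def quantize_with_step(image: np.ndarray, step: float) -> np.ndarray:
    """Uniform quantizer: round to the nearest multiple of `step`."""
    return np.clip(np.round(image / step) * step, 0, 255)

def psnr(original: np.ndarray, quantized: np.ndarray) -> float:
    """Peak signal-to-noise ratio between the original and quantized image."""
    mse = np.mean((original.astype(np.float64) - quantized) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

def search_step_size(image: np.ndarray, initial_step: float,
                     quality_db: float = 30.0, shrink: float = 0.5) -> float:
    """Adjustment loop per claim 8: re-quantize with a smaller step until
    the image SNR meets the quality condition. Threshold and shrink factor
    are illustrative assumptions."""
    step = initial_step
    while psnr(image, quantize_with_step(image, step)) < quality_db:
        step *= shrink                             # gradually reduce the step size
    return step
```

A smaller step size always lowers quantization error, so the loop terminates: as the step shrinks, the quantized image approaches the original and the PSNR rises above any finite threshold.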
9. An electronic device comprising one or more processors configured to implement the model obtaining method for image classification according to any one of claims 1 to 6.
10. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the model obtaining method for image classification according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410302535.8A CN117911794B (en) | 2024-03-15 | 2024-03-15 | Model obtaining method and device for image classification, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117911794A (en) | 2024-04-19
CN117911794B CN117911794B (en) | 2024-06-25 |
Family
ID=90687458
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410302535.8A (granted as CN117911794B, Active) | Model obtaining method and device for image classification, electronic equipment and storage medium | 2024-03-15 | 2024-03-15
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117911794B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106780482A (en) * | 2017-01-08 | 2017-05-31 | 广东工业大学 | A kind of classification method of medical image |
US20190012559A1 (en) * | 2017-07-06 | 2019-01-10 | Texas Instruments Incorporated | Dynamic quantization for deep neural network inference system and method |
WO2022141258A1 (en) * | 2020-12-30 | 2022-07-07 | 深圳市优必选科技股份有限公司 | Image classification method, computer device, and storage medium |
CN115880516A (en) * | 2021-09-27 | 2023-03-31 | 马上消费金融股份有限公司 | Image classification method, image classification model training method and related equipment |
CN114528924A (en) * | 2022-01-27 | 2022-05-24 | 山东浪潮科学研究院有限公司 | Inference method, device, equipment and medium of image classification model |
CN116580251A (en) * | 2023-06-09 | 2023-08-11 | 山东云海国创云计算装备产业创新中心有限公司 | Image classification method, device, equipment and computer readable storage medium |
CN117292191A (en) * | 2023-09-28 | 2023-12-26 | 中科云谷科技有限公司 | Model optimization method, electronic equipment and computer readable storage medium |
CN117456256A (en) * | 2023-11-07 | 2024-01-26 | 江苏大学 | Image classification model training and image classification method and system based on enhanced data reconstruction |
CN117670709A (en) * | 2023-11-17 | 2024-03-08 | 上海睿触科技有限公司 | Denoising method, denoising device, denoising equipment and denoising readable storage medium for medical image |
Non-Patent Citations (1)
Title |
---|
LU Minjun; WANG Ci: "Correlation-based blind estimation of the peak signal-to-noise ratio of highly compressed JPEG images", Computer Engineering (计算机工程), no. 08, 31 December 2017 (2017-12-31) *
Also Published As
Publication number | Publication date |
---|---|
CN117911794B (en) | 2024-06-25 |
Similar Documents
Publication | Title
---|---
CN109949255B (en) | Image reconstruction method and device
WO2021062029A1 (en) | Joint pruning and quantization scheme for deep neural networks
CN109902745A (en) | A CNN-based low-precision training and 8-bit integer quantization inference method
CN112508125A (en) | Efficient full-integer quantization method of image detection model
WO2016182671A1 (en) | Fixed point neural network based on floating point neural network quantization
CN110175641B (en) | Image recognition method, device, equipment and storage medium
CN109242092B (en) | Image processing method and device, electronic equipment and storage medium
TW202141358A (en) | Method and apparatus for image restoration, storage medium and terminal
CN110276451A (en) | A deep neural network compression method based on weight normalization
WO2022111002A1 (en) | Method and apparatus for training neural network, and computer readable storage medium
US20200380360A1 (en) | Method and apparatus with neural network parameter quantization
CN114677548A (en) | Neural network image classification system and method based on resistive random access memory
CN112085175B (en) | Data processing method and device based on neural network computation
CN112561050B (en) | Neural network model training method and device
CN116188878A (en) | Image classification method, device and storage medium based on neural network structure fine-tuning
CN117893455A (en) | Image brightness and contrast adjusting method
CN113424200A (en) | Methods, apparatuses and computer program products for video encoding and video decoding
CN117911794B (en) | Model obtaining method and device for image classification, electronic equipment and storage medium
US20230394312A1 (en) | Pruning activations and weights of neural networks with programmable thresholds
CN115221932A (en) | Spectrum recovery method and device based on neural network and electronic equipment
CN111382854B (en) | Convolutional neural network processing method, device, equipment and storage medium
CN112418388A (en) | Method and device for realizing deep convolutional neural network processing
CN115147283A (en) | Image reconstruction method, device, equipment and medium
CN112508049B (en) | Clustering method based on group sparse optimization
CN115115845B (en) | Image semantic content understanding method and device, electronic equipment and storage medium
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |