CN117911794A - Model obtaining method and device for image classification, electronic equipment and storage medium

Info

Publication number
CN117911794A
Authority
CN
China
Prior art keywords
image
model
image classification
pixel values
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410302535.8A
Other languages
Chinese (zh)
Other versions
CN117911794B (en)
Inventor
范亮
汤坚
张磊
王秋媚
柯燕萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhongke Zhi Tour Technology Co ltd
Original Assignee
Guangzhou Zhongke Zhi Tour Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhongke Zhi Tour Technology Co ltd
Priority to CN202410302535.8A
Publication of CN117911794A
Application granted
Publication of CN117911794B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a model obtaining method and device for image classification, an electronic device and a storage medium. The method comprises: extracting features of the images in an image dataset to obtain a feature map of each image; normalizing the mean and variance of the feature map of each channel by using a preset weight range and an initial step size to obtain normalized image data; quantizing the continuous pixel values of the normalized image data into discrete pixel values to obtain a quantized image; comparing the original image with the quantized image to determine an image signal-to-noise ratio; when the image signal-to-noise ratio does not meet the image quality condition, adjusting the initial step size until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image; and inputting the discrete values of the image into an image classification model to be trained to obtain a trained image classification model. The trained model takes an image to be classified as input and outputs the image classification result of that image.

Description

Model obtaining method and device for image classification, electronic equipment and storage medium
Technical Field
The present application relates to the field of image classification technologies, and in particular, to a method and apparatus for obtaining a model for image classification, an electronic device, and a storage medium.
Background
As deep neural network technology matures in the field of image classification, some deep neural networks are now used as image classification models. In a model obtaining method for image classification, the input image is discretized and the resulting discrete values are fed to the image classification model to obtain the image classification result. Because the discrete values of the image obtained by current methods are inaccurate, the accuracy of the subsequent image classification result is affected.
Disclosure of Invention
The application provides an improved model obtaining method, device, electronic equipment and storage medium for image classification.
The application provides a model obtaining method for image classification, which comprises the following steps:
extracting features of the images in the image dataset to obtain a feature map of each image;
normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
quantizing the continuous pixel values of the normalized image data by using an initial step size and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain a quantized image;
comparing the original image with the quantized image to determine an image signal-to-noise ratio;
when the image signal-to-noise ratio does not meet the image quality condition, adjusting the initial step size and returning to the step of normalizing the mean and variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image; and
inputting the discrete values of the image into an image classification model to be trained for training to obtain a trained image classification model, wherein the trained image classification model takes an image to be classified as input and outputs an image classification result of the image to be classified.
Further, adjusting the initial step size when the image signal-to-noise ratio does not meet the image quality condition, until the image signal-to-noise ratio meets the image quality condition, and outputting the discrete values of the image includes:
gradually reducing the value of the step size; and
updating the initial step size with the reduced step size, and re-executing the step of normalizing the mean and variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image.
Further, normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data includes:
determining the mean and variance of the feature map from the pixel values of each channel of the feature map of each image;
normalizing the feature map by using the preset weight range, the initial step size, the mean and the variance to obtain normalized image data;
determining whether the normalized image data lies in different ranges; and
when the normalized image data contains image data in different ranges, adjusting the data range of that image data by a linear transformation so that it is mapped into the preset weight range.
Further, extracting the features of the images in the image dataset to obtain a feature map of each image includes:
extracting features of the images in the image dataset by using MobileNetV2 to obtain an image feature map, wherein the depth-separable convolution of MobileNetV2 includes a first layer, a second layer and a third layer; the first layer is a 1×1 convolution with a nonlinear function, the second layer is a depthwise layer used for filtering, and the third layer is a point-wise layer used for the combining operation.
Further, quantizing the continuous pixel values of the normalized image data by using the initial step size and the preset weight range, and converting them into discrete pixel values to obtain a quantized image, includes:
quantizing each pixel value in the normalized image data by using a quantization step size and a quantization level to obtain quantized pixel values, wherein the quantization step size is obtained by dividing the range of the continuous pixel values by the quantization level, and the quantization level is used for mapping the pixel values of the image to a preset discrete pixel value range;
determining whether the quantized pixel values lie in different ranges; and
when the quantized pixel values contain pixel values in different ranges, adjusting the data range of those quantized pixel values by a linear transformation so that they are mapped into the preset discrete pixel value range.
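The quantization described above can be illustrated with a minimal NumPy sketch. It assumes uniform quantization over a continuous range [-1, 1] and a preset discrete range of 0 to 255; the function names `quantize` and `to_discrete_range` and all parameter values are hypothetical and are not taken from the application.

```python
import numpy as np

def quantize(x, levels=256, lo=-1.0, hi=1.0):
    """Uniformly quantize values in [lo, hi] into `levels` bins (assumed scheme)."""
    step = (hi - lo) / levels             # quantization step = value range / quantization level
    x = np.clip(x, lo, hi)                # keep values inside the continuous range
    idx = np.floor((x - lo) / step)       # bin index of each value
    idx = np.clip(idx, 0, levels - 1).astype(np.int64)  # the value `hi` falls into the last bin
    return idx, step

def to_discrete_range(idx, levels, out_lo=0, out_hi=255):
    """Linearly map bin indices 0..levels-1 onto the preset discrete pixel range."""
    scale = (out_hi - out_lo) / (levels - 1)
    return np.rint(out_lo + idx * scale).astype(np.int64)

pixels = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
idx, step = quantize(pixels, levels=16)
print(step)                         # 0.125
print(idx)                          # [ 0  4  8 12 15]
print(to_discrete_range(idx, 16))   # [  0  68 136 204 255]
```

A smaller step (more levels) gives finer discrete values at the cost of a larger discrete range.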
Further, inputting the discrete values of the image into the image classification model to be trained for training to obtain a trained image classification model includes:
inputting the discrete values of the image into the image classification model to be trained so as to output an image classification result;
verifying and testing the image classification model to be trained by using the image classification result to obtain the inference accuracy of the image classification model to be trained; and
when the inference accuracy is lower than a preset inference accuracy, adjusting the model parameters of the image classification model to be trained and continuing to input the discrete values of the image into the adjusted model so as to output an image classification result, until the inference accuracy is higher than the preset inference accuracy, whereupon the trained image classification model is obtained.
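The train, evaluate and adjust loop above can be sketched as follows. This is only an illustration under strong assumptions: a one-feature logistic regression stands in for the image classification model, the "discrete values" are the integers 0 to 255, and `train_until_accurate`, `target_acc` and `lr` are hypothetical names and values.

```python
import numpy as np

def train_until_accurate(x, y, target_acc=0.95, lr=1.0, max_epochs=1000):
    """Adjust parameters until the accuracy exceeds the preset accuracy."""
    z = x / 255.0 - 0.5                   # center the discrete values
    w, b, acc = 0.0, 0.0, 0.0
    for epoch in range(1, max_epochs + 1):
        p = 1.0 / (1.0 + np.exp(-(w * z + b)))      # model output
        acc = float(np.mean((p > 0.5) == (y > 0.5)))  # accuracy of the current model
        if acc > target_acc:              # accuracy exceeds the preset value: stop
            break
        w -= lr * np.mean((p - y) * z)    # otherwise adjust the model parameters
        b -= lr * np.mean(p - y)
    return w, b, acc, epoch

x = np.arange(256, dtype=np.float64)      # toy discrete pixel values
y = (x >= 128).astype(np.float64)         # toy labels: bright vs. dark
w, b, acc, epochs = train_until_accurate(x, y)
print(acc, epochs)
```

In practice the stopping check would use a separate validation or test set rather than the training data; the sketch reuses one set only to stay short.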
The present application provides a model obtaining apparatus for image classification, comprising:
the feature extraction module is used for extracting the features of the images in the image dataset and obtaining feature graphs of the images;
The normalization module is used for normalizing the mean and the variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
The quantization module is used for quantizing the continuous pixel values of the normalized image data by using the initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
the comparison module is used for comparing the original image with the quantized image to determine the signal-to-noise ratio of the image;
The adjustment module is used for adjusting the initial step size when the image signal-to-noise ratio does not meet the image quality condition, and returning to the step of normalizing the mean and variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image; and
the training module is used for inputting the discrete values of the image into the image classification model to be trained for training to obtain a trained image classification model, wherein the trained image classification model takes an image to be classified as input and outputs an image classification result of the image to be classified.
Further, the adjustment module is specifically configured to: gradually reduce the value of the step size; and update the initial step size with the reduced step size and re-execute the step of normalizing the mean and variance of the feature map of each channel by using the preset weight range to obtain normalized image data, until the image signal-to-noise ratio meets the image quality condition, and then output the discrete values of the image.
The present application provides a model acquisition system for image classification comprising one or more processors for implementing a method as claimed in any one of the preceding claims.
The present application provides a computer readable storage medium having stored thereon a program which, when executed by a processor, implements a method as claimed in any one of the preceding claims.
The present application provides a computer program product comprising a computer program/instruction which, when executed by a processor, implements a method as claimed in any one of the preceding claims.
In some embodiments, the model obtaining method for image classification of the present application extracts features of the images in an image dataset to obtain a feature map of each image; normalizes the mean and variance of the feature map of each channel by using a preset weight range and an initial step size to obtain normalized image data; quantizes the continuous pixel values of the normalized image data into discrete pixel values to obtain a quantized image; compares the original image with the quantized image to determine the image signal-to-noise ratio; when the image signal-to-noise ratio does not meet the image quality condition, adjusts the initial step size until the image signal-to-noise ratio meets the image quality condition, and then outputs the discrete values of the image; and inputs the discrete values of the image into an image classification model to be trained for training to obtain a trained image classification model. The trained image classification model takes an image to be classified as input and outputs an image classification result of the image to be classified.
In the embodiment of the application, when the image signal-to-noise ratio does not meet the image quality condition, the initial step size is adjusted until the image signal-to-noise ratio meets the image quality condition, and the discrete values of the image are then output. Because the initial step size is adjusted gradually, more accurate discrete values of the image are obtained, which yields a more accurate model for image classification and therefore a more accurate image classification result. The discretization accuracy of the image is thus higher, and so is the accuracy of the image classification result.
Drawings
FIG. 1 is a flow chart of a model acquisition method for image classification according to an embodiment of the present application;
FIG. 2 is a schematic diagram showing a structure of a model obtaining apparatus for image classification according to an embodiment of the present application;
Fig. 3 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The embodiments described in the following exemplary embodiments are not intended to represent all embodiments consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.
It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.
In order to solve the technical problem that the related technology discretizes images with low accuracy, the embodiment of the application provides a model obtaining method for image classification. The method extracts features of the images in the image dataset to obtain a feature map of each image; normalizes the mean and variance of the feature map of each channel by using a preset weight range and an initial step size to obtain normalized image data; quantizes the continuous pixel values of the normalized image data into discrete pixel values to obtain a quantized image; compares the original image with the quantized image to determine the image signal-to-noise ratio; when the image signal-to-noise ratio does not meet the image quality condition, adjusts the initial step size until the image signal-to-noise ratio meets the image quality condition, and then outputs the discrete values of the image; and inputs the discrete values of the image into an image classification model to be trained for training to obtain a trained image classification model. The trained image classification model takes an image to be classified as input and outputs an image classification result of the image to be classified.
In the embodiment of the application, when the image signal-to-noise ratio does not meet the image quality condition, the initial step size is adjusted until the image signal-to-noise ratio meets the image quality condition, and the discrete values of the image are then output. Because the initial step size is adjusted gradually, more accurate discrete values of the image are obtained, which yields a more accurate model for image classification and therefore a more accurate image classification result. The discretization accuracy of the image is thus higher, and so is the accuracy of the image classification result.
Fig. 1 is a flowchart of a model obtaining method for image classification according to an embodiment of the present application.
As shown in fig. 1, the model obtaining method for image classification may include, but is not limited to, the following steps 110 to 160:
Step 110, extracting features of the images in the image dataset to obtain a feature map of each image.
In some embodiments of step 110, features of the images in the image dataset may be extracted directly to obtain the feature map of each image. The images in the image dataset may also be referred to as original images. Further, step 110 may use MobileNetV2 to extract the features of the images in the image dataset to obtain an image feature map, wherein the depth-separable convolution of MobileNetV2 includes a first layer, a second layer and a third layer; the first layer is a 1×1 convolution with a nonlinear function, the second layer is a depthwise layer used for filtering, and the third layer is a point-wise layer used for the combining operation. This structure significantly reduces computation and model size while preserving the feature representation.
The above-described depth convolution consists of the same number of filters as the number of channels in the provided input, with a separate filter applied for each channel, the resulting output image preserving its depth.
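As an illustration of the depthwise convolution just described (one filter per input channel, output depth equal to input depth), the following is a minimal NumPy sketch. It assumes stride 1 and no padding, and computes a cross-correlation as deep-learning frameworks commonly do; the function name `depthwise_conv2d` is hypothetical.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """x: (H, W, C) input; kernels: (p, q, C), one p-by-q filter per channel."""
    H, W, C = x.shape
    p, q, Ck = kernels.shape
    assert C == Ck, "one filter is needed for each input channel"
    out = np.zeros((H - p + 1, W - q + 1, C))
    for i in range(H - p + 1):
        for j in range(W - q + 1):
            patch = x[i:i + p, j:j + q, :]                       # (p, q, C) window
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))  # channels never mix
    return out

x = np.ones((5, 5, 3))
k = np.ones((3, 3, 3))
y = depthwise_conv2d(x, k)
print(y.shape)     # (3, 3, 3): the depth of the input is preserved
print(y[0, 0])     # [9. 9. 9.]
```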
The point-wise convolution of the above-described point-wise layer is typically used to increase or decrease the depth of the image. The kernel of any convolution filter is defined by [p × q × N], where p is the height of the filter kernel, q is its width, and N is the number of channels. A point-wise convolution filter has a kernel of size [1 × 1 × N], and each convolution performed with a point-wise filter produces one output channel.
The depth-separable convolution of MobileNetV2 is implemented as follows: 1. Depthwise convolution: a convolution kernel is first applied to each channel of the input image separately, so that the convolution operation is performed channel by channel; the information between the channels is integrated afterwards. This helps to reduce the number of parameters and improve efficiency. 2. Point-wise convolution: following the depthwise convolution, MobileNetV2 uses a point-wise convolution with a 1×1 kernel to merge the feature maps of the different channels and generate the final output feature map, which helps to learn the relationships between channels. Together, these components help the network learn the features and patterns of the image.
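The point-wise stage and the parameter saving mentioned above can be sketched as follows. The sketch assumes bias-free layers and a channels-last layout; `pointwise_conv2d`, `conv_params` and `separable_params` are hypothetical helper names.

```python
import numpy as np

def pointwise_conv2d(x, weights):
    """1x1 convolution. x: (H, W, C_in); weights: (C_in, C_out).
    Each output pixel is a linear combination of the input channels at that pixel."""
    return x @ weights

def conv_params(p, q, c_in, c_out):
    """Weights in a standard p-by-q convolution."""
    return p * q * c_in * c_out

def separable_params(p, q, c_in, c_out):
    """Weights in a depthwise p-by-q convolution followed by a 1x1 point-wise convolution."""
    return p * q * c_in + c_in * c_out

x = np.ones((4, 4, 32))
w = np.ones((32, 64))
print(pointwise_conv2d(x, w).shape)      # (4, 4, 64): the depth changes from 32 to 64
print(conv_params(3, 3, 32, 64))         # 18432
print(separable_params(3, 3, 32, 64))    # 2336
```

For this 3×3, 32-to-64-channel example the separable form needs roughly one eighth of the weights of a standard convolution.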
The core layer of MobileNetV2 herein consists of a depthwise convolution and a point-wise convolution. Each convolution operation is followed by a batch normalization layer and a nonlinear activation, which is ReLU6 in MobileNetV2 and H-swish in MobileNetV3.
Consider an input original image x within a mini-batch, with d channels and m elements in each channel. The batch normalization transform in the depthwise convolution layer is applied to each channel independently and is expressed as:
$$\hat{x}_i^k = \frac{x_i^k - \mu^k}{\sqrt{(\sigma^k)^2 + \epsilon}}, \qquad y_i^k = \gamma^k \hat{x}_i^k + \beta^k$$
where $\mu^k$ and $(\sigma^k)^2$ are the mean and variance over the mini-batch for channel k, $\gamma^k$ is the scale and $\beta^k$ is the displacement obtained over the iterated layers, $\epsilon$ is a small constant that avoids division by zero, $\hat{x}_i^k$ is the normalized value of x, and $y_i^k$ is the output of the batch normalization transform, computed independently for each channel.
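A per-channel batch normalization of this kind can be sketched in NumPy as below. The layout (batch, height, width, channel), the value of `eps` and the function name `batch_norm` are assumptions made for the illustration.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each channel of x (batch, H, W, C) over the batch and spatial axes."""
    mu = x.mean(axis=(0, 1, 2))               # per-channel mean over the mini-batch
    var = x.var(axis=(0, 1, 2))               # per-channel variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)     # eps guards against division by zero
    return gamma * x_hat + beta, mu, var      # scale and displacement per channel

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 4, 3))
y, mu, var = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
print(y.mean(axis=(0, 1, 2)))   # each entry is close to 0
print(y.var(axis=(0, 1, 2)))    # each entry is close to 1
```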
The batch normalization equation can be restated as follows:
$$y_i^k = \alpha^k x_i^k - \left(\alpha^k \mu^k - \beta^k\right), \qquad \alpha^k = \frac{\gamma^k}{\sqrt{(\sigma^k)^2 + \epsilon}}$$
where $\alpha^k$ is the factor applied to the convolution weights and $\alpha^k \mu^k - \beta^k$ is the normalized deviation; $\gamma^k$, $\beta^k$, $\mu^k$, $(\sigma^k)^2$ and $\epsilon$ are the scale, displacement, mean, variance and small constant of the batch normalization transform for channel k.
To reduce the computational cost, for each channel k, $\alpha^k$ is combined with the weights and folded into a single convolution operation.
In the TensorFlow implementation, the minimum and maximum weight values used to compute the scale are obtained jointly from all channels. Because a depthwise convolution has no cross-channel correlation, the depthwise convolutions in the mobile network are more likely to produce all-zero values in one channel, which results in zero variance for that particular channel. From the above equation, zero variance significantly increases the value of $\alpha^k$, which in turn increases the output y. As a result, discrete points within the quantization range are spent on rarely occurring outliers, which leads to large quantization errors.
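The effect of a zero-variance channel on the folded scale factor can be checked numerically. The sketch below folds the batch normalization parameters into a per-channel factor and bias and compares the result with the unfolded transform; the function name `fold_batch_norm` and all numeric values are illustrative assumptions.

```python
import numpy as np

def fold_batch_norm(gamma, beta, mu, var, eps=1e-5):
    """Fold batch normalization into a per-channel factor `alpha` and a bias."""
    alpha = gamma / np.sqrt(var + eps)
    bias = beta - alpha * mu               # equals -(alpha * mu - beta)
    return alpha, bias

gamma = np.array([1.0, 1.0])
beta = np.array([0.0, 0.0])
mu = np.array([0.5, 0.0])
var = np.array([1.0, 0.0])                 # second channel is all zeros: zero variance

alpha, bias = fold_batch_norm(gamma, beta, mu, var)
x = np.array([2.0, 0.0])
y_bn = gamma * (x - mu) / np.sqrt(var + 1e-5) + beta   # unfolded transform
y_folded = alpha * x + bias                            # folded transform
print(np.allclose(y_bn, y_folded))         # True
print(alpha)                               # the zero-variance channel has a factor above 300
```

When one quantization range is shared by all channels, the channel with the very large factor stretches that range, which is the source of the quantization error described above.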
ReLU6 is used as the nonlinear activation function in the MobileNetV2 architecture and is given by the following equation:
$$\mathrm{ReLU6}(x) = \min(\max(0, x), 6)$$
In ReLU6, the value 6 is a rather arbitrary bound that accompanies the nonlinearity. While ReLU6 can enable the model to learn sparse features earlier, clipping the signals of early layers may result in a signal distribution that is not quantization friendly. Therefore, the ReLU activation function is used instead of ReLU6 in the depthwise and point-wise convolution blocks to handle outliers.
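The difference between the two activations is only the upper clip, as the following small NumPy sketch shows (the sample values are arbitrary):

```python
import numpy as np

def relu(x):
    """Standard ReLU: no upper bound."""
    return np.maximum(x, 0.0)

def relu6(x):
    """ReLU6: the output is clipped to the interval [0, 6]."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

x = np.array([-2.0, 0.5, 6.0, 9.0])
print(relu(x))    # [0.  0.5 6.  9. ]
print(relu6(x))   # [0.  0.5 6.  6. ]
```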
Of course, in some embodiments of the above step 110, the image in the image dataset may be preprocessed, such as denoising, and features of the preprocessed image may be extracted, so as to obtain a feature map of each image.
Step 120, normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data.
The preset weight range is used for adjusting errors in the feature extraction process. The preset weight range may be, for example, [-1, 1].
Step 130, quantizing the continuous pixel values of the normalized image data by using the initial step size and the preset weight range, and converting them into discrete pixel values to obtain a quantized image.
The initial step size and the preset weight range can be set according to user requirements. The initial step size may be a relatively large value, for example 0.1.
Step 140, comparing the original image with the quantized image to determine the image signal-to-noise ratio.
The above image signal-to-noise ratio is used to represent the relationship of the image signal and noise. A higher signal-to-noise ratio indicates a relatively stronger signal component in the image and a relatively weaker noise component, generally corresponding to better image quality.
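The application does not spell out how the signal-to-noise ratio is computed. One common definition, used here purely as an assumption, treats the difference between the original and the quantized image as noise and reports the power ratio in decibels; `snr_db` is a hypothetical helper name.

```python
import numpy as np

def snr_db(original, quantized):
    """SNR in dB, treating (original - quantized) as the noise (assumed definition)."""
    original = np.asarray(original, dtype=np.float64)
    noise = original - np.asarray(quantized, dtype=np.float64)
    signal_power = np.mean(original ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0.0:
        return float("inf")               # identical images: no noise at all
    return float(10.0 * np.log10(signal_power / noise_power))

original = np.array([1.0, 1.0, 1.0, 1.0])
quantized = np.array([0.9, 0.9, 0.9, 0.9])
print(round(snr_db(original, quantized), 3))   # 20.0
print(snr_db(original, original))              # inf
```

An image quality condition could then be a threshold on this value, for example "at least 40 dB"; that threshold is likewise an illustrative choice.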
The original image refers to the real image as captured by the image capturing apparatus, without any preprocessing such as denoising, so that the original information is retained.
The method further comprises: judging whether the image signal-to-noise ratio meets the image quality condition. If not, step 150 is performed; if so, the discrete values of the image are output.
Step 150, when the image signal-to-noise ratio does not meet the image quality condition, adjusting the initial step size and returning to step 120, until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image.
Step 150 may further include: reducing the value of the step size; updating the initial step size with the reduced step size; and re-executing the normalization of the mean and variance of the feature map of each channel by using the preset weight range and the updated step size to obtain normalized image data, until the image signal-to-noise ratio meets the image quality condition, and then outputting the discrete values of the image.
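The feedback loop of steps 130 to 150 can be sketched as follows. The sketch makes several assumptions that are not in the application: the input already lies in [-1, 1], the quality condition is an SNR of at least 40 dB, the step size is halved on each failure, and the normalization of step 120 is not repeated because it does not depend on the step size in this simplified form. All function and parameter names are hypothetical.

```python
import numpy as np

def quantize_with_step(x, step, lo=-1.0, hi=1.0):
    """Round each value to the nearest multiple of `step`, kept inside [lo, hi]."""
    return np.clip(np.round(x / step) * step, lo, hi)

def snr_db(reference, test):
    """SNR in dB with (reference - test) treated as noise."""
    noise_power = np.mean((reference - test) ** 2)
    if noise_power == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.mean(reference ** 2) / noise_power))

def find_step(x, initial_step=0.1, min_snr_db=40.0, shrink=0.5, max_iters=50):
    """Shrink the step size until the quantized data meets the SNR condition."""
    step = initial_step
    for _ in range(max_iters):
        q = quantize_with_step(x, step)       # step 130: quantize
        snr = snr_db(x, q)                    # step 140: compare with the reference
        if snr >= min_snr_db:                 # quality condition met: output values
            return q, step, snr
        step *= shrink                        # step 150: reduce the step and retry
    raise RuntimeError("SNR condition not reached within max_iters")

x = np.linspace(-1.0, 1.0, 1001)              # stand-in for normalized image data
q, step, snr = find_step(x)
print(step, round(snr, 1))
```

Halving is only one possible schedule; any monotonically decreasing sequence of step sizes fits the description of gradually reducing the step size.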
Step 160, inputting the discrete value of the image into the image classification model to be trained for training, and obtaining a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
The method further comprises: continuing to acquire an input image to be classified, obtaining the discrete values of the image to be classified by using the above method, and inputting those discrete values into the trained image classification model to output an image classification result of the image to be classified. The images to be classified are images input according to user requirements.
In the embodiment of the application, the result that the image signal-to-noise ratio does not meet the image quality condition is used as feedback to adjust the normalized data, so that more accurate discrete values of the image are obtained, which yields a more accurate model for image classification and therefore a more accurate image classification result. The discretization accuracy of the image is thus higher, and so is the accuracy of the image classification result.
As shown in connection with fig. 1, the above step 120 further includes the following steps ①-④:
①. Determining the mean value and the variance of the feature map from the pixel values of each channel of the feature map of each image. Specifically, for each channel (e.g., the red, green, and blue channels of an RGB image), the mean and variance of the feature map are calculated. These values may be obtained by calculating the mean and the standard deviation (the square root of the variance) of the pixel values of each channel in the feature map.
②. Normalizing the feature map by using the preset weight range for normalization, the initial step length, the mean value and the variance to obtain normalized image data.
In step ② above, the feature map is normalized using the mean and variance. For each pixel, an existing normalization formula may be used.
③. Determining whether the normalized image data is in different ranges.
④. When the normalized image data contains image data in different ranges, mapping the image data in different ranges into the preset weight range by linear transformation, thereby adjusting the data range of the image data. Through the above step ④, normalized image data that still lies in a different range after normalization is adjusted by linear transformation, and the normalized values are mapped into the preset weight range.
In this way, through step ① of calculating the mean and variance of each channel, step ② of normalization, and steps ③ and ④ of adjusting the data range, normalized image data can be obtained and adjusted to within the preset weight range.
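Steps ①-④ above can be sketched as follows. This is a minimal illustration only: the source says that existing formulas may be used for normalization, so the zero-mean, unit-variance formula, the small constant `eps`, the function names, and the example range [-1, 1] are all assumptions rather than the method of the application.

```python
import math

def channel_stats(channel):
    """Step 1: return (mean, variance) of a flat list of pixel values of one channel."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((v - mean) ** 2 for v in channel) / n
    return mean, var

def normalize_channel(channel, eps=1e-5):
    """Step 2 (assumed formula): subtract the mean and divide by the standard deviation."""
    mean, var = channel_stats(channel)
    std = math.sqrt(var + eps)  # eps avoids division by zero for a constant channel
    return [(v - mean) / std for v in channel]

def map_to_range(values, lo, hi):
    """Steps 3-4: if any value lies outside [lo, hi], map all values linearly into it."""
    v_min, v_max = min(values), max(values)
    if lo <= v_min and v_max <= hi:
        return list(values)  # already inside the preset range
    if v_max == v_min:
        return [(lo + hi) / 2.0] * len(values)  # constant data: place at the midpoint
    scale = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * scale for v in values]

# Hypothetical red-channel pixel values and a hypothetical preset weight range [-1, 1].
red = [10.0, 20.0, 30.0, 200.0]
normalized = normalize_channel(red)
adjusted = map_to_range(normalized, -1.0, 1.0)
```

The range check is performed before the linear transformation so that data already inside the preset weight range is left unchanged, which mirrors the conditional wording of steps ③ and ④.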
The batch normalization layer is removed from the depthwise convolution block, which prevents the large quantization loss caused by the zero-variance problem in depthwise convolution. The above step 120 may further include handling outliers by enabling L2 regularization during training, rather than applying clipping after training to handle anomalies. Regularization penalizes any large-magnitude outlier weight values during training. It enforces a more uniform weight distribution during training, thereby preventing outliers from occurring. The training process here refers to the normalization process.
The L2 regularization formula is as follows:
where cost is the value obtained after outliers are processed by L2 regularization, yi is the output value of the nonlinear function, xi is a feature value of the input original image, wi is the regularization weight for xi, and i is the summation index.
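A commonly used form of an L2-regularized cost that is consistent with the symbols listed above can be sketched as follows. The regularization strength \lambda and the squared-error data term are assumptions; they are not stated in the source.

```latex
\text{cost} = \sum_{i} \bigl( y_i - w_i x_i \bigr)^{2} + \lambda \sum_{i} w_i^{2}
```

In this form the second term grows with the squared magnitude of each weight, which is how large outlier weights are penalized during training.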
In addition, outliers are handled by using the ReLU activation function in place of ReLU6 in the depthwise and point-wise convolution blocks. Due to the nature of depthwise convolution and its combined effect with batch normalization on quantization loss, the quantized model will have an accuracy error. Handling outliers through regularization during training penalizes any large-magnitude outlier weights, which enforces a more uniform weight distribution during training and thereby prevents outliers from occurring.
In the embodiment of the application, the original image characteristics are mapped into the preset weight range by normalizing and adjusting the data range, so that the stability and the convergence rate of the model are improved. Normalization can make the distribution of the feature map more similar to the standard normal distribution, and helps the model learn the features better. The data range can be adjusted to map the values of the feature map to a proper range, so that the model is easier to converge in the training process, and the performance and training effect of the model are improved.
For image quality, the more quantization levels there are, the better the image quality. Coarser quantization (fewer quantization levels) generally results in lower image quality, because it may merge or discard detail information in the original image, thereby blurring or distorting the image. Finer quantization (more quantization levels) generally results in higher image quality, because it preserves the detail information in the original image more accurately, making the image visually clearer and closer to the original image.
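The relationship between the number of quantization levels and image quality can be illustrated with a small sketch. The pixel ramp, the range [0, 1], and the function names are hypothetical and chosen only for illustration; mean squared error is used here as a simple stand-in for image quality.

```python
def quantize_to_levels(values, levels, lo=0.0, hi=1.0):
    """Round each value to the nearest of `levels` equally spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    return [lo + step * round((v - lo) / step) for v in values]

def mean_squared_error(a, b):
    """Average squared difference between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical pixel ramp in [0, 1], quantized with 4, 16, and 256 levels.
pixels = [i / 99 for i in range(100)]
errors = {
    n: mean_squared_error(pixels, quantize_to_levels(pixels, n))
    for n in (4, 16, 256)
}
```

For this ramp the error shrinks as the number of levels grows, which corresponds to the statement that more quantization levels preserve more detail.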
As shown in fig. 1, the quantization process of step 130 further includes the following steps (1)-(3):
(1) Quantizing each pixel value in the normalized image data by using a quantization step length and the target quantization level to obtain quantized pixel values; the quantization step size is obtained by dividing the range of consecutive pixel values by the quantization level; the quantization level is used to map pixel values of the image to a preset discrete range of pixel values.
The application specifically determines the initial step size and quantization level as follows: for an input original image whose probability density function (PDF) is uniform, the initial step size δ with which the quantizer achieves the minimum distortion error is:
where δ is the step size, which depends on the weight range; w_max and w_min are respectively the maximum and minimum weights in the weight matrix; k is the number of bits required for the fixed-point representation of the weights; the input original image x is obtained from q(x), where q(x) is the integer representation of the corresponding floating-point value of x; and z is the offset value that recovers the 0.f weight (a weight close to zero) within the quantization range, also referred to as the zero point.
In uniform symmetric quantization, the ranges on both sides of 0 need to be the same. Thus, the step size is calculated by considering half of the floating-point range.
The present quantization model aims at 8-bit quantization (k = 8). In uniform symmetric quantization, the quantized range is bounded by -127 and 127, respectively, and thus the zero point (z) is 0. Typical quantization levels are 8 bits (256 discrete values), 4 bits (16 discrete values), etc. Optionally, the application selects an 8-bit quantization level, which not only provides high-precision images with lower noise and distortion, but also has lower storage and transmission overhead requirements.
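The step-size calculation described above can be sketched as follows. The two formulas are common conventions for uniform quantizers and are assumptions here, since the formula of the application is not reproduced in this text; the function names and the example weights are hypothetical.

```python
def uniform_step(w_min, w_max, k):
    """Assumed step size of a k-bit uniform quantizer spanning [w_min, w_max]."""
    return (w_max - w_min) / (2 ** k - 1)

def symmetric_step(w_min, w_max, k):
    """Assumed step size when the range is made symmetric about zero (zero point z = 0).

    Only half of the floating-point range is considered, so the largest
    magnitude is divided by the largest positive integer code (127 for k = 8).
    """
    half_range = max(abs(w_min), abs(w_max))
    q_max = 2 ** (k - 1) - 1
    return half_range / q_max

# Hypothetical weights of one layer.
weights = [-0.50, -0.10, 0.00, 0.25, 1.27]
delta = symmetric_step(min(weights), max(weights), k=8)
```

With symmetric quantization the value 0.0 maps to integer code 0, which is why no offset is needed in that case.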
(2) It is determined whether the quantized pixel values are in different ranges.
(3) And when the quantized pixel values contain pixel values in different ranges, the quantized pixel values in different ranges are subjected to linear transformation, the data range of the quantized pixel values in different ranges is adjusted, and the quantized pixel values in different ranges are mapped into a preset discrete pixel value range.
The quantized image of x is obtained using the following formula:
where x_q is the quantized image of x, clip is the uniform symmetric quantization function, N is the height of the output tensor, a is the minimum value of the quantization range, and b is the maximum value of the quantization range.
Thus, the scale factor is directly proportional to the weight distribution range (between the minimum and maximum weights). The quantized value of x is represented by x_q or q(x). Outliers that differ significantly from the mean of the weight distribution affect the scale factor. Their effect on the scale factor causes a set of discrete points within the quantization range to be allocated to rarely occurring outliers.
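The effect of an outlier on the scale factor can be sketched as follows. The form clip(round(x / δ) + z, a, b) is a widely used quantization convention and is an assumption here, as are the function names and the example weights; at least one weight is assumed to be nonzero.

```python
def quantize(x, delta, z=0, a=-127, b=127):
    """Assumed mapping of a float to an integer code: clip(round(x / delta) + z, a, b)."""
    q = round(x / delta) + z
    return max(a, min(b, q))

def dequantize(q, delta, z=0):
    """Recover the approximate float value from an integer code."""
    return delta * (q - z)

def distinct_codes(values, k=8):
    """Count how many different integer codes a symmetric k-bit quantizer assigns."""
    q_max = 2 ** (k - 1) - 1
    delta = max(abs(v) for v in values) / q_max  # scale grows with the weight range
    return len({quantize(v, delta, a=-q_max, b=q_max) for v in values})

# Hypothetical weights clustered around zero, with and without one large outlier.
clustered = [i / 100 for i in range(-10, 11)]
with_outlier = clustered + [10.0]
codes_clustered = distinct_codes(clustered)
codes_with_outlier = distinct_codes(with_outlier)
```

Without the outlier every clustered weight receives its own code, whereas the single outlier stretches the scale so that the clustered weights collapse onto a few codes and most of the quantization range is reserved for a rarely occurring value.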
In the embodiment of the application, the high-precision image has low noise and distortion and low storage and transmission overhead requirements.
As shown in fig. 1, the step 140 may further include the following: the difference between the input original image x and its quantized image is called the quantization error. In an ideal analog-to-digital converter, where the input original image has a uniform probability density function (PDF), the quantization error is uniformly distributed, so the image signal-to-noise ratio (signal-to-quantization-noise ratio, SQNR) can be obtained according to the following formula:
where Q is the number of quantization bits. The signal source in the derivation of the image signal-to-noise ratio can be a symmetric source. The symmetric source may be modeled by a piecewise modeling function that is nonzero only within a range bounded by the largest weight in the weight matrix and is 0 otherwise.
When the initial step size δ has been obtained, the formula for the image signal-to-noise ratio is derived as follows:
where the derivation involves a scaling factor and a scale factor, M is the size of the fixed-length code, and for a fixed-length code using K bits, M = 2^K.
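The signal-to-noise ratio computation can be sketched as follows. The measured ratio follows the standard definition of signal power over quantization-noise power; the closed form 20·log10(2^Q), about 6.02·Q dB, is the textbook result for a full-scale uniformly distributed input and is given here as an assumption about the missing formula. The function names are hypothetical.

```python
import math

def sqnr_db(original, quantized):
    """Measured SQNR in dB: signal power divided by quantization-error power."""
    signal = sum(v * v for v in original)
    noise = sum((v - q) ** 2 for v, q in zip(original, quantized))
    if noise == 0:
        return math.inf  # no quantization error at all
    return 10.0 * math.log10(signal / noise)

def ideal_sqnr_db(bits):
    """Textbook SQNR of an ideal uniform quantizer with a uniform input, in dB."""
    return 20.0 * math.log10(2 ** bits)

# Each additional bit adds roughly 6.02 dB.
gain_per_bit = ideal_sqnr_db(9) - ideal_sqnr_db(8)
```

A measured value from `sqnr_db` can be compared against a threshold to decide whether the image quality condition is met.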
As shown in connection with fig. 1, the above step 160 may further include the following steps <1> to <3>:
<1> Inputting the discrete value of the image into the image classification model to be trained so as to output an image classification result.
<2> Verifying and testing the image classification model to be trained by using the image classification result to obtain the reasoning precision of the image classification model to be trained.
<3> When the reasoning precision is smaller than the preset reasoning precision, adjusting the model parameters of the image classification model to be trained, and returning to the step of inputting the discrete value of the image into the adjusted image classification model to be trained so as to output an image classification result, until the reasoning precision is greater than the preset reasoning precision, thereby obtaining the trained image classification model. Thus, by adjusting the parameters of the image classification model to be trained, an image classification model with higher reasoning precision is obtained.
The preset reasoning precision can be set according to the user requirements.
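The loop of steps <1> to <3> can be sketched as follows. A single-layer perceptron stands in for the image classification model purely for illustration; the model, the update rule, the function names, and the toy data are assumptions and not the network used in the application.

```python
def predict(weights, bias, x):
    """Classify one sample: 1 if the weighted sum is positive, otherwise 0."""
    s = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def accuracy(weights, bias, samples, labels):
    """Fraction of samples whose predicted class matches the label."""
    hits = sum(predict(weights, bias, x) == y for x, y in zip(samples, labels))
    return hits / len(samples)

def train_until_accurate(samples, labels, target, lr=0.1, max_rounds=1000):
    """Adjust parameters until accuracy exceeds `target` or max_rounds is reached."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(max_rounds):
        if accuracy(weights, bias, samples, labels) > target:
            break  # preset precision reached: stop adjusting
        for x, y in zip(samples, labels):
            err = y - predict(weights, bias, x)
            weights = [w + lr * err * v for w, v in zip(weights, x)]
            bias += lr * err
    return weights, bias, accuracy(weights, bias, samples, labels)

# Hypothetical discrete (quantized) feature values and their class labels.
samples = [[-3], [-1], [2], [4]]
labels = [0, 0, 1, 1]
weights, bias, acc = train_until_accurate(samples, labels, target=0.9)
```

The `max_rounds` cap is a design choice of this sketch so that the loop terminates even if the preset precision is never reached.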
The model parameters of the image classification model may include, but are not limited to, the initial step size δ that achieves the minimum distortion error, the convolution weights, and the normalized deviation. The image classification model herein may be a VGG (Visual Geometry Group) deep convolutional neural network. Further, the VGG deep convolutional neural network herein may be a VGG-19 neural network, which is used to implement the model obtaining method for image classification. The VGG-19 neural network is 599 MB in size. AlexNet (Alex et al., 2012) is a very powerful model that can achieve high accuracy, but deleting any of its convolution layers greatly reduces performance. ResNet aims to overcome the vanishing gradient problem in large deep neural networks (DNNs) by means of skip connections. SqueezeNet introduces squeeze and expand modules with 1 x 1 convolutions to reduce the number of parameters. In contrast, the MobileNet architecture utilizes skip connections (ResNet) and 1 x 1 convolutions (SqueezeNet) to build a DNN model that is much smaller in size than the other models.
The above image classification model is ultimately aimed at reducing image size and loss. The number of multiply-accumulate (MAC) operations needed for inference on a single image plays a critical role in running a DNN on an embedded platform because of processing and memory constraints, and the number of feature parameters determines the size of the model. The fully connected layer is a conventional neural-network component, typically used in the final stage of a DNN to introduce the classification task into the network. A convolution operation shifts a filter over the input image with a specific stride to extract high-level features. Here, the number of MACs depends on the size of the filter used and on the corresponding image input. The result of the final model's predictions on the test set is taken as the model inference precision of the model obtained by the method.
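The dependence of the MAC count on filter size and input size can be sketched with the standard counting formulas for convolution layers. The function names and the example layer dimensions are hypothetical, and bias additions are ignored.

```python
def conv_macs(h_out, w_out, k, c_in, c_out):
    """MACs of a standard k x k convolution producing an h_out x w_out x c_out output."""
    return h_out * w_out * k * k * c_in * c_out

def depthwise_separable_macs(h_out, w_out, k, c_in, c_out):
    """MACs of a k x k depthwise convolution followed by a 1 x 1 point-wise convolution."""
    depthwise = h_out * w_out * k * k * c_in   # one k x k filter per input channel
    pointwise = h_out * w_out * c_in * c_out   # 1 x 1 convolution mixes the channels
    return depthwise + pointwise

def fully_connected_macs(n_in, n_out):
    """MACs of a fully connected layer with n_in inputs and n_out outputs."""
    return n_in * n_out

# Hypothetical layer: 112 x 112 output, 3 x 3 filter, 32 input and 64 output channels.
standard = conv_macs(112, 112, 3, 32, 64)
separable = depthwise_separable_macs(112, 112, 3, 32, 64)
ratio = separable / standard  # equals 1/c_out + 1/(k*k)
```

The ratio shows why architectures built from depthwise separable convolutions need far fewer MACs than ones built from standard convolutions of the same shape.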
In embodiments of the present application, various DNN architectures and their dependencies when ported to embedded platforms are discussed. The model obtaining method for image classification runs on the MobileNet architecture and shows better results, namely lower power consumption and memory occupation, while maintaining good reasoning accuracy. The QF MobileNet architecture is proposed on the basis of the baseline MobileNetV and MobileNetV3 architectures to overcome quantization loss and improve inference accuracy. With the proposed QF MobileNet architecture, promising improvements are obtained in the number of feature parameters, the number of MAC operations, and quantization loss.
Based on the same application concept as the above method, as shown in fig. 2, an embodiment of the present application provides a model obtaining apparatus for image classification, which may include the following modules:
A feature extraction module 31, configured to extract features of images in the image dataset, and obtain feature graphs of the images;
The normalization module 32 is configured to normalize the mean and the variance of the feature map of each channel by using a preset weight range, so as to obtain normalized image data;
A quantization module 33, configured to quantize the continuous pixel values of the normalized image data using an initial step size and the preset weight range, and convert the quantized pixel values into discrete pixel values, so as to obtain a quantized image;
a comparison module 34, configured to compare the original image with the quantized image, and determine an image signal-to-noise ratio;
An adjustment module 35, configured to adjust the initial step size if the image signal-to-noise ratio does not meet an image quality condition, until a discrete value of the image is output if the image signal-to-noise ratio meets the image quality condition;
The training module 36 is configured to input the discrete value of the image to the image classification model to be trained to obtain a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
In some embodiments, the adjusting module is specifically configured to: gradually reduce the value of the step length; update the initial step length with the reduced step length; and re-execute the step of normalizing the mean value and the variance of the feature map of each channel using the preset weight range to obtain normalized image data, until the discrete value of the image is output when the image signal-to-noise ratio meets the image quality condition.
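The behaviour of the adjusting module can be sketched as a loop that shrinks the step length until the signal-to-noise ratio meets a threshold. The halving factor, the threshold in dB, the iteration cap, and the function names are assumptions; re-normalization is omitted from this sketch for brevity.

```python
import math

def quantize_values(values, step):
    """Round each value to the nearest multiple of the step length."""
    return [step * round(v / step) for v in values]

def sqnr_db(original, quantized):
    """Signal power over quantization-error power, in dB."""
    signal = sum(v * v for v in original)
    noise = sum((v - q) ** 2 for v, q in zip(original, quantized))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)

def find_step(values, initial_step, min_sqnr_db, shrink=0.5, max_iters=50):
    """Reduce the step length until the quantized values meet the SQNR threshold."""
    step = initial_step
    for _ in range(max_iters):
        quantized = quantize_values(values, step)
        if sqnr_db(values, quantized) >= min_sqnr_db:
            return step, quantized  # image quality condition met
        step *= shrink              # gradually reduce the step length and retry
    raise ValueError("SQNR threshold not reached within max_iters")

# Hypothetical normalized pixel values and a hypothetical 30 dB quality condition.
values = [0.11, 0.52, 0.87, 0.33]
step, discrete = find_step(values, initial_step=0.5, min_sqnr_db=30.0)
```

A smaller step length gives a smaller rounding error, so the loop trades a finer quantization grid for a higher signal-to-noise ratio.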
The implementation process of the functions and actions of each module in the device is specifically detailed in the implementation process of the corresponding steps in the method, so that the same technical effects can be achieved, and the detailed description is omitted here.
Fig. 3 is a block diagram of an electronic device 50 according to an embodiment of the present application.
As shown in fig. 3, the electronic device 50 comprises one or more processors 51 for implementing the model acquisition method for image classification as described above.
In some embodiments, electronic device 50 may include storage medium 59. For example, the computer-readable storage medium may store a program that can be called by the processor 51, and may include a nonvolatile storage medium. In some embodiments, electronic device 50 may include memory 58 and interface 57. In some embodiments, electronic device 50 may also include other hardware depending on the actual application.
The computer-readable storage medium of the embodiment of the present application has stored thereon a program for implementing the model obtaining method for image classification as described above when executed by the processor 51.
The present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, magnetic disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-readable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer-readable storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which may be used to store information that can be accessed by a computing device.
The model obtaining method for image classification is applied to electronic equipment. The electronic device may be a PC (Personal Computer) terminal device. The PC-side device may include, but is not limited to, a desktop computer, a tablet computer, or a notebook computer.
The foregoing description of the preferred embodiments is provided for the purpose of illustration only, and is not intended to limit the scope of the disclosure, since any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the disclosure are intended to be included within the scope of the disclosure.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A model acquisition method for image classification, comprising:
extracting the characteristics of images in the image dataset to obtain a characteristic diagram of each image;
Normalizing the mean and variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
quantizing continuous pixel values of the normalized image data by using an initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
comparing the original image with the quantized image to determine an image signal-to-noise ratio;
under the condition that the image signal-to-noise ratio does not meet the image quality condition, adjusting the initial step length, returning to continue to execute the use of the preset weight range, normalizing the mean value and the variance of the feature image of each channel, and obtaining normalized image data until the image signal-to-noise ratio meets the image quality condition, and outputting the discrete value of the image;
Inputting the discrete value of the image into an image classification model to be trained for training, and obtaining a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
2. The model obtaining method for image classification according to claim 1, wherein said adjusting the initial step size in a case where the image signal-to-noise ratio does not satisfy an image quality condition until the image signal-to-noise ratio satisfies an image quality condition, outputting a discrete value of the image, comprises:
gradually reducing the value of the step length;
And updating the initial step length by using the reduced step length, re-executing the preset weight range, normalizing the mean value and the variance of the feature map of each channel, and obtaining normalized image data until the discrete value of the image is output under the condition that the signal-to-noise ratio of the image meets the image quality condition.
3. The model obtaining method for image classification according to claim 1 or 2, wherein the normalizing the feature map of each channel with a preset weight range to obtain normalized image data includes:
Determining the mean value and variance of the feature map through the pixel value of each channel of the feature map of each image;
Normalizing the feature map by using a normalized preset weight range, an initial step length, a mean value and a variance to obtain normalized image data;
determining whether the normalized image data is in a different range;
when the normalized image data contains image data in different ranges, the image data in different ranges are mapped into a preset weight range by linear transformation, and the data range of the image data is adjusted.
4. The model obtaining method for image classification according to claim 1, wherein the extracting features of images in the image dataset to obtain feature maps of the respective images includes:
Extracting features of images in the image dataset by using MobileNetV to obtain an image feature map; wherein the MobileNetV depth-separable convolutions include a first layer depth-separable convolution, a second layer depth-separable convolution, and a third depth-separable convolution; the first layer depth separable convolution is a 1 x 1 convolution with a nonlinear function, the second layer depth separable convolution is a depth separable layer for filtering, and the third depth separable convolution is a point-wise separable layer for combining operations.
5. The method for obtaining the model for image classification according to claim 1, wherein said quantizing the continuous pixel values of the normalized image data into discrete pixel values using the initial step size and the preset weight range, and obtaining the quantized image, comprises:
Quantizing each pixel value in the normalized image data by using a quantization step length and a quantization level to obtain quantized pixel values; the quantization step size is obtained by dividing the range of consecutive pixel values by the quantization level; the quantization level is used for mapping the pixel value of the image to a preset discrete pixel value range;
determining whether the quantized pixel values are in different ranges; and
When the quantized pixel values contain pixel values in different ranges, the quantized pixel values in different ranges are subjected to linear transformation, the data range of the quantized pixel values in different ranges is adjusted, and the quantized pixel values in different ranges are mapped into a preset discrete pixel value range.
6. The method for obtaining a model for image classification according to claim 1, wherein the inputting the discrete values of the image into the image classification model to be trained for training, to obtain a trained image classification model, comprises:
inputting the discrete value of the image into an image classification model to be trained so as to output an image classification result;
verifying and testing the image classification model to be trained by using the image classification result to obtain the reasoning precision of the image classification model to be trained;
When the reasoning precision is smaller than the preset reasoning precision, the model parameters of the image classification model to be trained are adjusted, the discrete values of the images are continuously input into the adjusted image classification model to be trained, so that an image classification result is output, and the trained image classification model is obtained until the reasoning precision is larger than the preset reasoning precision.
7. A model obtaining apparatus for image classification, comprising:
the feature extraction module is used for extracting the features of the images in the image dataset and obtaining feature graphs of the images;
The normalization module is used for normalizing the mean and the variance of the feature map of each channel by using a preset weight range to obtain normalized image data;
The quantization module is used for quantizing the continuous pixel values of the normalized image data by using the initial step length and the preset weight range, and converting the continuous pixel values into discrete pixel values to obtain quantized images;
the comparison module is used for comparing the original image with the quantized image to determine the signal-to-noise ratio of the image;
The adjustment module is used for adjusting the initial step length under the condition that the image signal-to-noise ratio does not meet the image quality condition, returning to continuously execute the using preset weight range, normalizing the mean value and the variance of the feature image of each channel, and obtaining normalized image data until the discrete value of the image is output under the condition that the image signal-to-noise ratio meets the image quality condition; and
The training module is used for inputting the discrete value of the image into the image classification model to be trained to train so as to obtain a trained image classification model; the trained image classification model is used for acquiring an image to be classified as input so as to output an image classification result of the image to be classified.
8. The model acquisition device for image classification as claimed in claim 7, wherein the adjustment module is specifically configured to: gradually reducing the value of the step length; and updating the initial step length by using the reduced step length, re-executing the preset weight range, normalizing the mean value and the variance of the feature map of each channel, and obtaining normalized image data until the discrete value of the image is output under the condition that the signal-to-noise ratio of the image meets the image quality condition.
9. An electronic device comprising one or more processors configured to implement the model acquisition method for image classification of any one of claims 1-6.
10. A computer-readable storage medium, having stored thereon a program which, when executed by a processor, implements the model obtaining method for image classification according to any one of claims 1 to 6.
CN202410302535.8A 2024-03-15 2024-03-15 Model obtaining method and device for image classification, electronic equipment and storage medium Active CN117911794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410302535.8A CN117911794B (en) 2024-03-15 2024-03-15 Model obtaining method and device for image classification, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410302535.8A CN117911794B (en) 2024-03-15 2024-03-15 Model obtaining method and device for image classification, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN117911794A true CN117911794A (en) 2024-04-19
CN117911794B CN117911794B (en) 2024-06-25

Family

ID=90687458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410302535.8A Active CN117911794B (en) 2024-03-15 2024-03-15 Model obtaining method and device for image classification, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117911794B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106780482A (en) * 2017-01-08 2017-05-31 广东工业大学 A kind of classification method of medical image
US20190012559A1 (en) * 2017-07-06 2019-01-10 Texas Instruments Incorporated Dynamic quantization for deep neural network inference system and method
CN114528924A (en) * 2022-01-27 2022-05-24 山东浪潮科学研究院有限公司 Inference method, device, equipment and medium of image classification model
WO2022141258A1 (en) * 2020-12-30 2022-07-07 深圳市优必选科技股份有限公司 Image classification method, computer device, and storage medium
CN115880516A (en) * 2021-09-27 2023-03-31 马上消费金融股份有限公司 Image classification method, image classification model training method and related equipment
CN116580251A (en) * 2023-06-09 2023-08-11 山东云海国创云计算装备产业创新中心有限公司 Image classification method, device, equipment and computer readable storage medium
CN117292191A (en) * 2023-09-28 2023-12-26 中科云谷科技有限公司 Model optimization method, electronic equipment and computer readable storage medium
CN117456256A (en) * 2023-11-07 2024-01-26 江苏大学 Image classification model training and image classification method and system based on enhanced data reconstruction
CN117670709A (en) * 2023-11-17 2024-03-08 上海睿触科技有限公司 Denoising method, denoising device, denoising equipment and denoising readable storage medium for medical image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
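陆敏俊; 王慈: "Correlation-based blind estimation of the peak signal-to-noise ratio of highly compressed JPEG images", 计算机工程 (Computer Engineering), no. 08, 31 December 2017 (2017-12-31) *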

Also Published As

Publication number Publication date
CN117911794B (en) 2024-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant