CN110689056A - Classification method and device, equipment and storage medium - Google Patents

Classification method and device, equipment and storage medium

Info

Publication number
CN110689056A
CN110689056A (application CN201910854477.9A)
Authority
CN
China
Prior art keywords
image
classified
processed
characteristic
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910854477.9A
Other languages
Chinese (zh)
Inventor
孙莹莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201910854477.9A priority Critical patent/CN110689056A/en
Publication of CN110689056A publication Critical patent/CN110689056A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The embodiment of the application discloses a classification method, a classification device, classification equipment and a storage medium, wherein the method comprises the following steps: extracting features from an image to be processed to obtain a feature map; determining the weight of each feature channel in the feature map; recalibrating the features of the feature map according to the weights; and classifying the image to be processed using the recalibrated features.

Description

Classification method and device, equipment and storage medium
Technical Field
The embodiments of the application relate to the field of computer vision, and relate to, but are not limited to, a classification method and apparatus, a device, and a storage medium.
Background
Traditional image classification algorithms mainly fall into two types. One extracts features with the Scale-Invariant Feature Transform (SIFT) algorithm and then classifies them. The other uses a fully connected neural network with the back-propagation algorithm: the parameters are initialized from a Gaussian distribution or at random, the score output by the current network is computed iteratively, and the parameters of the preceding layers are then adjusted continuously according to the difference between the scores of the current predicted label and the actual label, until the weights and biases of the whole model converge.
The traditional SIFT algorithm has a large error and requires a large amount of complicated image preprocessing. The other approach, the BP (Error Back Propagation) algorithm, suffers from gradient vanishing and explosion, insufficient image training samples, local optima, and similar problems. Meanwhile, as the number of unlabeled RGB (Red, Green, Blue) images on the Internet grows explosively, the BP algorithm in the neural network cannot meet the future demand for classifying massive unlabeled RGB images. In addition, existing RGB image classification methods are adversely affected by image sharpness, brightness, contrast and other factors, which reduces classification accuracy to a certain extent. Therefore, how to overcome the above problems has become a focus of research for those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present application provide a classification method and apparatus, a device, and a storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a classification method, where the method includes:
extracting features from an image to be processed to obtain a feature map;
determining the weight of each feature channel in the feature map;
recalibrating the features of the feature map according to the weights;
and classifying the image to be processed using the recalibrated features.
In an embodiment of the present application, the extracting features from the image to be processed to obtain a feature map includes:
performing feature extraction on the image to be processed using each convolutional layer in a trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer;
correspondingly, the determining the weight of each feature channel in the feature map comprises: determining the weight of each feature channel in the feature map corresponding to each convolutional layer;
correspondingly, the recalibrating the features of the feature map according to the weights comprises:
normalizing the weight of each feature channel;
and weighting the normalized weight onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer.
In this embodiment of the present application, the determining the weight of each feature channel in the feature map includes:
performing global pooling on the feature map to compress the features along the spatial dimensions;
and modeling the correlation among the feature channels corresponding to the compressed features, and generating a weight for each feature channel according to the correlation among the feature channels.
In an embodiment of the present application, the extracting features from the image to be processed to obtain a feature map includes:
detecting an object to be classified in the image to be processed to obtain a detection result;
cropping the object to be classified from the image to be processed according to the detection result to obtain an image of the object to be classified;
adding a margin of a preset proportion to each image of an object to be classified so that all images of objects to be classified have the same size;
and performing feature extraction on the image of the object to be classified after the margin is added to obtain a feature map.
In an embodiment of the present application, after the object to be classified is cropped from the image to be processed according to the detection result to obtain an image of the object to be classified, the method further includes:
if the image of the object to be classified includes M objects to be classified, performing image segmentation on the image of the object to be classified to obtain M sub-images, each sub-image including one object to be classified, where M is a natural number greater than or equal to 2;
correspondingly, the adding a margin of a preset proportion to the image of the object to be classified comprises: adding a margin of a predetermined proportion to each of the sub-images;
correspondingly, the performing feature extraction on the image of the object to be classified after the margin is added to obtain a feature map comprises: performing feature extraction on each sub-image after the margin is added to obtain a feature map.
In an embodiment of the present application, the method further includes:
preprocessing each sample image in the sample image set to obtain a training sample set;
and training the convolutional neural network model by using the training sample set to obtain the trained convolutional neural network model.
In this embodiment of the present application, the preprocessing each sample image in the sample image set to obtain the training sample set includes:
transforming the illumination intensity of each sample image in the sample image set according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each sample image is distributed within the interval;
transforming the contrast of each sample image according to a preset contrast interval to obtain second data in which the contrast of each sample image is distributed within the interval;
and performing data enhancement processing on the sample images according to the first data and the second data to obtain the training sample set.
In an embodiment of the present application, the extracting features from the image to be processed to obtain a feature map includes:
performing feature extraction on the image to be processed using an Inception V4 convolutional neural network model to obtain a feature map.
In a second aspect, an embodiment of the present application provides a classification apparatus, including:
the feature extraction unit is used for performing feature extraction on the image to be processed to obtain a feature map;
the determining unit is used for determining the weight of each feature channel in the feature map;
the recalibration unit is used for recalibrating the features of the feature map according to the weights;
and the classification unit is used for classifying the image to be processed using the recalibrated features.
In a third aspect, an embodiment of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor executes the computer program to implement the steps in the classification method described above.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps in the classification method as described above.
The embodiments of the application provide a classification method and apparatus, a device, and a storage medium. A feature map is obtained by performing feature extraction on an image to be processed; the weight of each feature channel in the feature map is determined; the features of the feature map are recalibrated according to the weights; and the image to be processed is classified using the recalibrated features. In this way, adaptive calibration of the feature channels can be realized, which ensures classification accuracy, improves recognition and classification efficiency, greatly reduces time cost, and offers high reliability.
Drawings
Fig. 1A is a schematic flow chart illustrating a first implementation of the classification method according to the embodiment of the present application;
FIG. 1B is a schematic diagram illustrating a second implementation flow of the classification method according to the embodiment of the present application;
FIG. 2 is a third schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application;
fig. 3A is a schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application;
FIG. 3B is a schematic diagram of a classification network model according to an embodiment of the present application;
fig. 3C is a schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a structure of a sorting apparatus according to an embodiment of the present application;
fig. 5 is a hardware entity diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the following will describe the specific technical solutions of the present application in further detail with reference to the accompanying drawings in the embodiments of the present application. The following examples are intended to illustrate the present application only and are not intended to limit the scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience of description and have no specific meaning by themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The embodiment of the present application provides a classification method, which is applied to a computer device. The functions implemented by the method can be implemented by a processor in a server calling program code; of course, the program code can be stored in a computer storage medium, so the server at least includes a processor and a storage medium. Fig. 1A is a first schematic flow chart illustrating an implementation of the classification method according to an embodiment of the present application. As shown in Fig. 1A, the method includes:
s101, extracting features of an image to be processed to obtain a feature map;
here, the computer device may be various types of devices having information processing capability, such as a mobile phone, a PDA (Personal Digital Assistant), a navigator, a Digital phone, a video phone, a smart watch, a smart band, a wearable device, a tablet computer, a kiosk, and the like.
In some embodiments, the step S101 of performing feature extraction on the image to be processed to obtain a feature map includes: performing feature extraction on the image to be processed using an Inception V4 convolutional neural network model to obtain a feature map.
Here, GoogLeNet is a 22-layer deep convolutional neural network whose advance lies in applying the Inception structure. The Inception structure increases both the width and the depth of the network without increasing the computational load. Meanwhile, to improve network quality, GoogLeNet adopts the Hebbian principle and multi-scale processing, and achieves better results in both classification and detection.
Inception V4 in the GoogLeNet family evolved from Inception V1, Inception V2 and Inception V3; it improves the structure of Inception V3 and combines the Inception module with ResNet (residual network) through residual connections. The ResNet structure greatly deepens the network, substantially accelerates training, and improves performance at the same time.
That is, Inception V4 has several characteristics. First, it is a deep structure, 22 layers in depth; to prevent the gradient from vanishing, loss functions are added at different positions. Second, in terms of width, kernels of 1 × 1, 3 × 3 and 5 × 5 as well as pooling layers are added, and 1 × 1 convolution kernels are paired with the 3 × 3 kernels, the 5 × 5 kernels and the pooling layers to reduce the thickness of the feature map. Therefore, using Inception V4 to extract features from the image to be processed ensures both the accuracy and the speed of feature extraction, thereby improving the accuracy and speed of image classification.
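As an illustration of the multi-branch structure described above, the following sketch builds a single Inception-style module with TensorFlow/Keras; the filter counts are illustrative assumptions and do not reproduce the exact Inception V4 configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3=96, f5=32, fp=32):
    """Minimal Inception-style block: parallel 1x1, 3x3, 5x5 and pooling branches,
    with 1x1 convolutions used to keep the feature-map thickness under control."""
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)  # 3x3 kernel
    b5 = layers.Conv2D(f5, 1, padding="same", activation="relu")(x)   # 1x1 reduction
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)  # 5x5 kernel
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)         # pooling branch
    bp = layers.Conv2D(fp, 1, padding="same", activation="relu")(bp)  # 1x1 after pooling
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])              # widen, same h x w
```

Placing a 1 × 1 convolution alongside the larger kernels and after the pooling branch is what keeps the added width from inflating the computation, which is the property relied on above.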
Step S102, determining the weight of each feature channel in the feature map;
in the embodiment of the application, each image to be processed is subjected to feature extraction to obtain a plurality of feature maps. On this basis, the weight of each feature channel of each feature map of the plurality of feature maps needs to be determined.
Step S103, recalibrating the features of the feature map according to the weights;
in the embodiment of the present application, the features of the corresponding feature map need to be recalibrated according to the weight of each feature channel of each feature map, so as to improve the final classification accuracy.
Step S104, classifying the image to be processed using the recalibrated features.
In the embodiment of the application, a feature map is obtained by extracting features from the image to be processed; the weight of each feature channel in the feature map is determined; the features of the feature map are recalibrated according to the weights; and the image to be processed is classified using the recalibrated features. In this way, adaptive calibration of the feature channels can be realized, which ensures classification accuracy, improves recognition and classification efficiency, greatly reduces time cost, and offers high reliability.
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method, and fig. 1B is a schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application, as shown in fig. 1B, the method includes:
s111, extracting features of the image to be processed by using each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer;
step S112, determining the weight of each characteristic channel in the characteristic diagram corresponding to each convolution layer;
In the embodiment of the present application, the weight of each feature channel in the feature map corresponding to each convolutional layer needs to be determined, that is, the feature map corresponding to each convolutional layer needs to be recalibrated.
Step S113, normalizing the weight of each feature channel;
Here, the weight of each feature channel is normalized in order to map it into the interval (0, 1).
Step S114, weighting the normalized weight onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer;
Here, weighting the normalized weight onto each feature channel to recalibrate the feature map corresponding to each convolutional layer may be performed as follows: the features of each feature channel are multiplied by the corresponding normalized weight, and the weighted features of all channels are combined, so that the feature map corresponding to each convolutional layer is recalibrated.
Step S115, classifying the image to be processed using the recalibrated features.
In the embodiment of the application, feature extraction is performed on the image to be processed using each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer; the weight of each feature channel in the feature map corresponding to each convolutional layer is determined; the weight of each feature channel is normalized; the normalized weight is weighted onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer; and the image to be processed is classified using the recalibrated features. In this way, adaptive calibration of the feature channels can be realized, which ensures classification accuracy, improves recognition and classification efficiency, greatly reduces time cost, and offers high reliability.
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method, where the method includes:
step S121, extracting the features of the image to be processed to obtain a feature map;
Step S122, performing global pooling on the feature map to compress the features along the spatial dimensions;
Here, the step S122 of performing global pooling on the feature map to compress the features along the spatial dimensions may be implemented as follows: feature compression is performed along the spatial dimensions, and each two-dimensional feature channel is turned into a single real number that, to some extent, has a global receptive field, with the output dimension matching the number of input feature channels. This real number characterizes the global distribution of responses over the feature channels and allows layers close to the input to obtain a global receptive field as well.
Step S123, modeling the correlation among the feature channels corresponding to the compressed features, and generating a weight for each feature channel according to the correlation among the feature channels;
Step S124, recalibrating the features of the feature map according to the weights;
Here, steps S123 and S124 may be implemented as follows: a weight is generated for each feature channel through learned parameters that explicitly model the correlation among the feature channels; the weights output in the previous step are then regarded as the importance of each feature channel after feature selection, and the previous features are weighted channel by channel through multiplication, completing the recalibration of the original features in the channel dimension (a minimal sketch of these steps is given after step S125 below).
Step S125, classifying the image to be processed using the recalibrated features.
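As referenced above, the following is a minimal sketch of steps S122 and S123, assuming a Keras feature map as input; the bottleneck ratio and layer sizes are illustrative assumptions rather than values fixed by this embodiment.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_weights(feature_map, reduction=4):
    """Squeeze each h x w channel to one real number, then let fully connected
    layers model the correlation between channels and emit one weight per channel."""
    c = feature_map.shape[-1]
    squeezed = layers.GlobalAveragePooling2D()(feature_map)            # (batch, c)
    hidden = layers.Dense(c // reduction, activation="relu")(squeezed) # channel correlations
    return layers.Dense(c, activation="sigmoid")(hidden)               # weights in (0, 1)
```

The sigmoid keeps each generated weight in (0, 1), which is what step S124 then uses for the channel-wise recalibration.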
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method, where the method includes:
s131, detecting an object to be classified in the image to be processed to obtain a detection result;
For example, in a fruit classification task, the image to be processed may be an image of a fruit tray full of fruit placed on a table. Detecting the objects to be classified in the image to be processed may then mean detecting all the fruits in the fruit tray in the image.
Step S132, cropping the object to be classified from the image to be processed according to the detection result to obtain an image of the object to be classified;
For example, the objects to be classified may be all the fruits detected in the fruit tray; cropping the objects to be classified from the image to be processed according to the detection result then means that the cropped image containing only all the fruits in the fruit tray is used as the image of the objects to be classified.
Step S133, adding a margin of a preset proportion to each image of an object to be classified so that all images of objects to be classified have the same size;
The images of objects to be classified cropped from different images to be processed usually differ considerably in size. A margin of a preset proportion can therefore be added to every cropped image of an object to be classified to ensure that all of them have the same size, so that input images of any size can be classified; the detection and cropping of the image to be processed thus improves the final classification accuracy.
Step S134, performing feature extraction on the image of the object to be classified after the margin is added to obtain a feature map;
Step S135, determining the weight of each feature channel in the feature map;
Step S136, recalibrating the features of the feature map according to the weights;
Step S137, classifying the image to be processed using the recalibrated features.
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method, where the method includes:
step S141, detecting an object to be classified in an image to be processed to obtain a detection result;
Step S142, cropping the object to be classified from the image to be processed according to the detection result to obtain an image of the object to be classified;
For example, in the fruit classification task, the image to be processed may be an image of a fruit tray full of fruit placed on a table, and the image of the objects to be classified may be a cropped image containing only all the fruits in the fruit tray.
Step S143, if the image of the object to be classified includes M objects to be classified, performing image segmentation on the image of the object to be classified to obtain M sub-images, each sub-image including one object to be classified, where M is a natural number greater than or equal to 2;
For example, if the image of the objects to be classified contains several fruits such as an apple, a pear and a banana, image segmentation is performed on it to obtain three sub-images: the first sub-image contains the apple, the second the pear, and the third the banana. The recognition of the image of the objects to be classified thus becomes the separate recognition of three sub-images, which improves classification accuracy (a sketch of this cropping and padding is given after step S148 below).
Step S144, adding a margin of a preset proportion to each sub-image so that all images of objects to be classified have the same size;
Step S145, performing feature extraction on each sub-image after the margin is added to obtain a feature map;
Step S146, determining the weight of each feature channel in the feature map;
Step S147, recalibrating the features of the feature map according to the weights;
Step S148, classifying the image to be processed using the recalibrated features.
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method, where the method includes:
step S201, preprocessing each sample image in a sample image set to obtain a training sample set;
here, the preprocessing for each sample image in the sample image set may include performing the processing of step S141 to step S144 for each sample image in the sample image set, and of course, may include other image preprocessing.
Step S202, training the convolutional neural network model by utilizing the training sample set to obtain a trained convolutional neural network model;
Step S203, performing feature extraction on the image to be processed using each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer;
Step S204, determining the weight of each feature channel in the feature map corresponding to each convolutional layer;
Step S205, normalizing the weight of each feature channel;
Step S206, weighting the normalized weight onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer;
Step S207, classifying the image to be processed using the recalibrated features.
Based on the foregoing embodiments, an embodiment of the present application further provides a classification method. Fig. 2 is a schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application. As shown in Fig. 2, the method includes:
Step S211, transforming the illumination intensity of each sample image in the sample image set according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each sample image is distributed within the interval;
Step S212, transforming the contrast of each sample image according to a preset contrast interval to obtain second data in which the contrast of each sample image is distributed within the interval;
Step S213, performing data enhancement processing on the sample images according to the first data and the second data to obtain a training sample set;
here, the sample image may be data-enhanced by the operations in step S211 to step S213 to expand the data amount of the sample image in the sample image set.
Step S214, training the convolutional neural network model by using the training sample set to obtain a trained convolutional neural network model;
s215, extracting the characteristics of the image to be processed by utilizing each convolutional layer in the trained convolutional neural network model to obtain a characteristic diagram corresponding to each convolutional layer;
step S216, determining the weight of each characteristic channel in the characteristic diagram corresponding to each convolution layer;
s217, normalizing the weight of each characteristic channel;
step S218, weighting the normalized weight to each characteristic channel to realize the recalibration of the characteristics of the characteristic graph corresponding to each convolution layer;
and S219, classifying the images to be processed by using the re-calibrated characteristics.
In the embodiment of the application, the illumination intensity of each sample image in the sample image set is transformed according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each sample image is distributed within the interval; the contrast of each sample image is transformed according to a preset contrast interval to obtain second data in which the contrast of each sample image is distributed within the interval; data enhancement processing is performed on the sample images according to the first data and the second data to obtain a training sample set; the convolutional neural network model is trained with the training sample set to obtain a trained convolutional neural network model; feature extraction is performed on the image to be processed using each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer; the weight of each feature channel in the feature map corresponding to each convolutional layer is determined; the weight of each feature channel is normalized; the normalized weight is weighted onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer; and the image to be processed is classified using the recalibrated features. In this way, adaptive calibration of the feature channels can be realized, which ensures classification accuracy, improves recognition and classification efficiency, greatly reduces time cost, and offers high reliability. Meanwhile, the data enhancement processing improves the generalization and stability of the model.
Based on the foregoing embodiments, and in order to overcome, at least to a certain extent, the problem that factors such as the environment, posture and illumination of the acquired fruit image data affect the classification result when a deep learning model classifies fruit images, an embodiment of the present application further provides a classification method. Fig. 3A is a schematic flow chart illustrating an implementation of the classification method according to the embodiment of the present application. As shown in Fig. 3A, the method includes:
s301, acquiring image data;
here, the acquiring of the image data includes: sample data to be classified containing a certain fruit target is obtained.
Step S302, preprocessing the image data, wherein the preprocessing at least comprises data enhancement;
in this embodiment of the application, the step S302 is to perform preprocessing on the image data, where the preprocessing at least includes data enhancement, and may be implemented in the following manner:
s3021, framing a single fruit category in the sample data by using a target detection model, cutting the single fruit category, and taking the cut image as a target image;
step S3022, adding an edge distance of a predetermined proportion to the target image, and ensuring that the size of the image is 299 × 299;
step S3023, carrying out normalization processing on the target image added with the edge distance, normalizing the pixel value of the target image from [0,255] to [0,1], and removing redundant information contained in sample data to be trained;
and S3024, performing data enhancement processing on the normalized target image.
The data enhancement processing can be realized by the following method:
Step S31, transforming the illumination intensity of the normalized target images according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each normalized target image is randomly distributed within the interval;
Step S32, transforming the contrast of the normalized target images according to a preset contrast interval to obtain second data in which the contrast of each normalized target image is randomly distributed within the interval;
Step S33, cropping the normalized target images by a preset ratio on the basis of the first data and the second data, and resizing them to 224 × 224;
Step S34, flipping the cropped target images horizontally so that the image data is expanded to twice the original amount.
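A sketch of the normalization and data-enhancement pipeline of steps S3023, S3024 and S31 to S34, written with TensorFlow image operations; the brightness and contrast intervals and the crop fraction are illustrative assumptions rather than the values used in this embodiment.

```python
import tensorflow as tf

def enhance(images):
    """images: a batch of uint8 RGB target images with identical shapes.
    Returns twice as many samples: the adjusted images plus their horizontal flips."""
    x = tf.cast(images, tf.float32) / 255.0                # normalise [0, 255] -> [0, 1]
    x = tf.image.random_brightness(x, max_delta=0.2)       # illumination interval
    x = tf.image.random_contrast(x, lower=0.8, upper=1.2)  # contrast interval
    x = tf.clip_by_value(x, 0.0, 1.0)
    x = tf.image.central_crop(x, central_fraction=0.9)     # crop by a preset ratio
    x = tf.image.resize(x, [224, 224])                     # adjust to 224 x 224
    return tf.concat([x, tf.image.flip_left_right(x)], axis=0)  # double the sample count
```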
Step S303, dividing the preprocessed image data into a training set and a test set, and labeling the data;
In the embodiment of the application, the preprocessed image data is divided proportionally into a training set and a test set, where the training set is used to train the classification model and the test set is used to test it. After the image data is divided into a training set and a test set, labels are assigned to the sample images in the training set and the test set.
Here, the image data may include a single fruit, a plurality of fruits, and peeled fruits, etc.
Step S304, adding an SE channel attention module to each layer of the Inception V4 convolutional neural network to form a fruit classification model based on Inception V4-SE;
In the embodiment of the application, the core idea of the SE (Squeeze-and-Excitation) channel attention module (i.e., the SE channel attention mechanism) is to let the network learn feature weights according to the loss function, so that effective feature maps receive large weights and feature maps that are ineffective or contribute little receive small weights, thereby training the model to achieve better results. The SE channel attention module is not a complete network structure but a substructure that can be embedded into other classification models; on this basis, Inception V4 is used as the base network and the SE channel attention module is added to it. The SE channel attention module mainly comprises three operations. The first is the squeeze operation: features are compressed along the spatial dimensions, and each two-dimensional feature channel is turned into a single real number that, to some extent, has a global receptive field, with the output dimension matching the number of input feature channels. This real number characterizes the global distribution of responses over the feature channels and allows layers close to the input to obtain a global receptive field as well. The second is the excitation operation, similar to the gating mechanism in a recurrent neural network, which generates a weight for each feature channel through learned parameters that explicitly model the correlation between feature channels. The third is the reweighting operation: the weights output by the excitation operation are regarded as the importance of each feature channel after feature selection, and the previous features are then weighted channel by channel through multiplication, recalibrating the original features in the channel dimension.
Fig. 3B is a schematic structural diagram of a classification network model according to an embodiment of the present application. As shown in Fig. 3B, the figure is an example of embedding an SE module into an Inception structure, and the dimension information beside each operation represents the output of that layer. Here, global pooling is used as the squeeze operation. Two fully connected layers then form a bottleneck structure to model the correlation between channels, outputting as many weights as there are input feature channels. The feature dimension is first reduced to 1/16 of the input, activated by a ReLU (Rectified Linear Unit) function, and then raised back to the original dimension through a fully connected layer. This provides more non-linearity, fits the complex correlations between channels better, and greatly reduces the number of parameters and computations. A normalized weight between 0 and 1 is then obtained through a Sigmoid gate, and finally the normalized weight is applied to the features of each channel through the recalibration operation, where x is the input of a certain layer in the Inception V4 convolutional neural network, c is the number of channels, h is the height of the image, and w is the width of the image.
The normalized weights are multiplied channel by channel with the input x of dimension h × w × c, and the features are thereby recalibrated.
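The SE module described above can be written compactly in Keras. The reduction to 1/16 of the channels, the ReLU activation and the Sigmoid gate follow the description above, while the function name and the exact insertion point inside Inception V4 are assumptions of this sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x, reduction=16):
    """Squeeze-and-Excitation: global pooling as the squeeze, a two-layer bottleneck
    (reduced to 1/16 of the channels) with ReLU, a Sigmoid gate giving weights in
    (0, 1), and channel-wise multiplication to recalibrate the h x w x c input."""
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                   # squeeze: (batch, c)
    s = layers.Dense(c // reduction, activation="relu")(s)   # reduce to c / 16
    s = layers.Dense(c, activation="sigmoid")(s)             # normalised weights
    s = layers.Reshape((1, 1, c))(s)                         # broadcast over h and w
    return layers.Multiply()([x, s])                         # recalibrated features
```

Because the (1, 1, c) weights are broadcast over the spatial dimensions during the multiplication, the block can be appended after any Inception stage without changing the shape of the feature map.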
Step S305, training the Inception V4-SE fruit classification model using the training set;
Here, the training data is input into the convolutional neural network for training to generate a trained convolutional neural network model. The convolutional neural network adopted in the embodiment of the application is Inception V4, which is based on the Inception basic structure and greatly improves network performance through multiple convolutions and non-linear transformations.
Step S306, verifying the trained Inception V4-SE fruit classification model using the test set.
The test data is fed into the trained fruit classification network for testing to verify the accuracy of the model. In some embodiments, the misclassified samples in the test data set are fed back into the network model for fine-tuning, which improves the generalization of the model.
In the embodiment of the application, a Softmax classifier is selected as the output layer of the convolutional neural network model, the selected loss function is the cross-entropy loss function, and the result calculated by the loss function is then passed to a gradient descent algorithm, for example the Adam gradient descent algorithm.
In some embodiments, during training the training set is input into the convolutional neural network model and iterated a preset number of times, which may be set to 90, and the Adam gradient descent algorithm is used to optimize the objective function during each iteration.
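A training-configuration sketch consistent with the description above, with a Softmax output layer, a cross-entropy loss, the Adam optimizer and 90 iterations over the training set; the stand-in backbone, the learning rate and the dataset objects are assumptions used for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 75                                              # fruit classes 0-74 as described

inputs = tf.keras.Input(shape=(224, 224, 3))
features = layers.Conv2D(32, 3, activation="relu")(inputs)   # stand-in for Inception V4 + SE
features = layers.GlobalAveragePooling2D()(features)
outputs = layers.Dense(num_classes, activation="softmax")(features)  # Softmax classifier
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # Adam gradient descent
    loss="sparse_categorical_crossentropy",                  # cross-entropy loss
    metrics=["accuracy"],
)
# train_set / test_set are assumed tf.data datasets of (image, label) pairs:
# model.fit(train_set, validation_data=test_set, epochs=90)  # preset iteration count
```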
Based on the foregoing embodiment, an embodiment of the present application further provides a classification method, and fig. 3C is a schematic flow chart of implementing the classification method according to the embodiment of the present application, as shown in fig. 3C, the method includes:
step S311, acquiring a current data frame in the photo or video stream;
here, the obtaining of the current data frame in the photo or video stream may be obtaining an object to be predicted that includes a fruit target of a certain type.
In this embodiment of the application, a camera may be used to obtain a picture or a video, or a local picture on a PC (Personal Computer) may be called directly. To improve general applicability, in an embodiment of the present application, a user input may be received or a stored dynamic image such as a video may be shot, but only a specific frame (for example, the first frame) of the dynamic image is extracted as the image input by the user. For example, a camera module may be called to obtain a camera video and a frame image may be taken from it; if it is detected that a fruit has been captured by the camera, the frame image may be displayed and the fruit target framed, completing the positioning of the individual fruit target. Calling a local picture on the PC directly includes: loading the local picture; if a certain type of fruit target is detected in the local picture, displaying the picture, framing the fruit target, and completing the fruit target positioning.
Step S312, capturing the fruit target from the photo or the current data frame and cropping it;
Here, the fruit target region of the object to be recognized can be obtained and cropped, and a margin of a predetermined proportion is then added to the fruit target region so that the sizes of the input pictures are consistent.
Step S313, preprocessing the cropped image;
In the embodiment of the application, normalization processing may be performed on the image to be predicted to remove redundant information from it.
Step S314, inputting the preprocessed image into the trained Inception V4-SE fruit classification model;
Step S315, obtaining the corresponding fruit type according to the output result of the Inception V4-SE fruit classification model.
Here, the output of the trained Inception V4-SE fruit classification model may be a specific value from 0 to 74, where each value corresponds to a fruit class and represents the classification result.
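A sketch of the prediction flow of steps S312 to S315, assuming the fruit target has already been located as a bounding box and that the trained Inception V4-SE model expects 224 × 224 inputs; the function and variable names are assumptions.

```python
import numpy as np
import tensorflow as tf

def classify_fruit(frame, box, model):
    """frame: an RGB array from a photo or video frame; box: (x1, y1, x2, y2) from
    the fruit detector; model: the trained classifier. Returns the class index 0-74."""
    x1, y1, x2, y2 = box
    crop = frame[y1:y2, x1:x2]                               # cut out the fruit target
    crop = tf.image.resize(crop, [224, 224]) / 255.0         # preprocess / normalise
    probs = model.predict(tf.expand_dims(crop, axis=0))      # Softmax probabilities
    return int(np.argmax(probs[0]))                          # index maps to a fruit class
```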
In the embodiment of the application, the algorithm system defines the convolutional neural network model with TensorFlow, an open-source software library for high-performance numerical computation. With its flexible architecture, computation can easily be deployed to a variety of platforms and devices (desktop devices, server clusters, mobile devices, edge devices, etc.).
In the embodiment of the application, a fruit image classification algorithm and system based on deep learning and an attention mechanism are provided, which can classify input images of any size. The method relies on a fruit image classification model based on a neural network and an attention mechanism, improves recognition and classification efficiency, greatly reduces time cost, and offers high reliability. Meanwhile, a channel attention mechanism is introduced: by adding the SE channel attention module, the network uses global information to selectively enhance useful feature channels and suppress useless ones, so that adaptive calibration of the feature channels can be realized and classification accuracy is further ensured.
In the embodiment of the application, a fruit classification algorithm and system are provided. Fruit detection is added so that the position of the fruit target is located first; the generalization and stability of the model are improved through data enhancement; and a channel attention mechanism is introduced on the basis of the Inception V4 convolutional neural network, realizing a fruit classification scheme based on deep learning. The importance of each feature channel is obtained automatically through learning, useful features are then enhanced according to their importance and features of little use for the current task are suppressed, overcoming the adverse effects that image sharpness, brightness, contrast and the like have on conventional RGB image classification methods.
Here, multi-class fruit recognition has wide practical application value. Multi-class fruit image recognition can enable self-service fruit purchase in supermarkets, and classified recognition of multiple kinds of fruit can also reduce labor costs on production lines and improve production efficiency. In addition, fruit image classification has research significance in intelligent agriculture and digital healthcare: in intelligent agriculture, fruits can be picked automatically through recognition of fruit images; in digital healthcare, the nutritional ingredients of a fruit can be further obtained on the basis of fruit classification, helping patients formulate a reasonable diet during later recovery.
Based on the foregoing embodiments, the present application provides a classification apparatus, which includes units, modules included in the units, and components included in the modules, and may be implemented by a processor in a computer device; of course, it may also be implemented by specific logic circuits. In the implementation process, the processor may be a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or the like.
Fig. 4 is a schematic structural diagram of a classification apparatus according to an embodiment of the present application, and as shown in fig. 4, the apparatus 400 includes:
a feature extraction unit 401, configured to perform feature extraction on an image to be processed to obtain a feature map;
a determining unit 402, configured to determine a weight of each feature channel in the feature map;
a recalibration unit 403, configured to recalibrate the features of the feature map according to the weights;
a classifying unit 404, configured to classify the image to be processed by using the re-calibrated features.
In some embodiments, the feature extraction unit 401 includes: a feature extraction module, which is used for performing feature extraction on the image to be processed using each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer;
correspondingly, the determining unit 402 includes: a determining module, which is used for determining the weight of each feature channel in the feature map corresponding to each convolutional layer;
correspondingly, the recalibration unit 403 includes:
a normalization module, which is used for normalizing the weight of each feature channel;
and a recalibration module, which is used for weighting the normalized weight onto each feature channel to recalibrate the features of the feature map corresponding to each convolutional layer.
In some embodiments, the determining unit 402 includes:
a pooling module, which is used for performing global pooling on the feature map to compress the features along the spatial dimensions;
and a weight generation module, which is used for modeling the correlation among the feature channels corresponding to the compressed features and generating a weight for each feature channel according to the correlation among the feature channels.
In some embodiments, the feature extraction unit 401 includes:
a detection module, which is used for detecting the object to be classified in the image to be processed to obtain a detection result;
a cropping module, which is used for cropping the object to be classified from the image to be processed according to the detection result to obtain an image of the object to be classified;
an adding module, which is used for adding a margin of a preset proportion to each image of an object to be classified so that all images of objects to be classified have the same size;
and an extraction module, which is used for performing feature extraction on the image of the object to be classified after the margin is added to obtain a feature map.
In some embodiments, the feature extraction unit 401 further includes:
a segmentation module, which is used for performing image segmentation on the image of the object to be classified to obtain M sub-images if the image of the object to be classified includes M objects to be classified, each sub-image including one object to be classified, where M is a natural number greater than or equal to 2;
correspondingly, the adding module includes: an adding component for adding a margin of a predetermined proportion to each of the sub-images;
correspondingly, the extraction module includes: an extraction component for performing feature extraction on each sub-image after the margin is added to obtain a feature map.
In some embodiments, the apparatus further comprises:
the preprocessing unit is used for preprocessing each sample image in the sample image set to obtain a training sample set;
and the training unit is used for training the convolutional neural network model by using the training sample set to obtain the trained convolutional neural network model.
In some embodiments, the preprocessing unit comprises:
a first transformation module, which is used for transforming the illumination intensity of each sample image in the sample image set according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each sample image is distributed within the interval;
a second transformation module, which is used for transforming the contrast of each sample image according to a preset contrast interval to obtain second data in which the contrast of each sample image is distributed within the interval;
and a processing module, which is used for performing data enhancement processing on the sample images according to the first data and the second data to obtain a training sample set.
In some embodiments, the feature extraction unit 401 includes:
a feature extraction subunit, which is used for performing feature extraction on the image to be processed using an Inception V4 convolutional neural network model to obtain a feature map.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that, in the embodiment of the present application, if the classification method is implemented in the form of a software functional module and sold or used as a standalone product, the classification method may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof contributing to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing an electronic device (which may be a personal computer, a server, etc.) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a ROM (Read Only Memory), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, the present application provides a computer device, including a memory and a processor, where the memory stores a computer program executable on the processor, and the processor executes the program to implement the steps in the classification method provided in the foregoing embodiments.
Correspondingly, the embodiment of the present application provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the classification method described above.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that fig. 5 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application, and as shown in fig. 5, the hardware entity of the computer device 500 includes: a processor 501, a communication interface 502 and a memory 503, wherein
The processor 501 generally controls the overall operation of the computer device 500.
The communication interface 502 may enable the computer device 500 to communicate with other terminals or servers via a network.
The Memory 503 is configured to store instructions and applications executable by the processor 501, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 501 and modules in the computer device 500, and may be implemented by FLASH Memory or RAM (Random Access Memory).
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all functional units in the embodiments of the present application may be integrated into one processing module, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: various media capable of storing program codes, such as a removable Memory device, a ROM (Read-only Memory), a RAM (Random Access Memory), a magnetic disk, and an optical disk.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (11)

1. A method of classification, the method comprising:
extracting the features of the image to be processed to obtain a feature map;
determining the weight of each feature channel in the feature map;
recalibrating the features of the feature map according to the weights;
and classifying the image to be processed by using the re-calibrated features.
2. The method according to claim 1, wherein the extracting features of the image to be processed to obtain a feature map comprises:
performing feature extraction on the image to be processed by utilizing each convolutional layer in the trained convolutional neural network model to obtain a feature map corresponding to each convolutional layer;
correspondingly, the determining the weight of each feature channel in the feature map comprises: determining the weight of each feature channel in the feature map corresponding to each convolutional layer;
correspondingly, the recalibrating the features of the feature map according to the weights comprises:
normalizing the weight of each feature channel;
and applying the normalized weight to each feature channel, so as to recalibrate the features of the feature map corresponding to each convolutional layer.
3. The method of claim 1, wherein determining the weight of each feature channel in the feature map comprises:
performing global pooling on the feature map to realize feature compression of spatial dimensions;
and constructing the correlation among the feature channels corresponding to the compressed features, and generating a weight for each feature channel according to the correlation among the feature channels.
4. The method according to claim 1, wherein the extracting features of the image to be processed to obtain a feature map comprises:
detecting an object to be classified in an image to be processed to obtain a detection result;
cropping the object to be classified out of the image to be processed according to the detection result to obtain an image of the object to be classified;
adding a margin of a preset proportion to the image of the object to be classified, so that the images of the objects to be classified have the same size;
and performing feature extraction on the image of the object to be classified to which the margin has been added, to obtain a feature map.
5. The method according to claim 4, wherein after the object to be classified is cropped out of the image to be processed according to the detection result and an image of the object to be classified is obtained, the method further comprises:
if the image of the object to be classified comprises M objects to be classified, performing image segmentation on the image of the object to be classified to obtain M sub-images, each sub-image comprising one object to be classified, where M is a natural number greater than or equal to 2;
correspondingly, the adding of the margin of the preset proportion to the image of the object to be classified comprises: adding a margin of the preset proportion to each of the sub-images;
correspondingly, the performing feature extraction on the image of the object to be classified to which the margin has been added to obtain a feature map comprises: performing feature extraction on each sub-image to which the margin has been added, to obtain a feature map.
6. The method of claim 2, further comprising:
preprocessing each sample image in the sample image set to obtain a training sample set;
and training the convolutional neural network model by using the training sample set to obtain the trained convolutional neural network model.
7. The method of claim 6, wherein preprocessing each sample image in the sample image set to obtain a training sample set comprises:
changing the illumination intensity of each sample image in the sample image set according to a preset illumination intensity interval to obtain first data in which the illumination intensity of each sample image is distributed over the interval;
changing the contrast of each sample image according to a preset contrast interval to obtain second data in which the contrast of each sample image is distributed over the interval;
and performing data enhancement processing on the sample image according to the first data and the second data to obtain a training sample set.
8. The method according to any one of claims 1 to 7, wherein the extracting features of the image to be processed to obtain a feature map comprises:
performing feature extraction on the image to be processed by using an Inception V4 convolutional neural network model to obtain a feature map.
9. A classification apparatus, characterized in that the apparatus comprises:
a feature extraction unit, configured to extract features of an image to be processed to obtain a feature map;
a determining unit, configured to determine the weight of each feature channel in the feature map;
a recalibration unit, configured to recalibrate the features of the feature map according to the weights;
and a classification unit, configured to classify the image to be processed by using the re-calibrated features.
10. A computer device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the classification method according to any one of claims 1 to 8 when executing the program.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the classification method according to any one of claims 1 to 8.
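The sketches below are added for illustration only and are not part of the claims. The first is a minimal Python (PyTorch) sketch of the channel recalibration of claims 1 to 3; the framework, the tensor sizes, the reduction ratio, and the final classifier are assumptions not taken from the claims, and the two fully connected excitation layers follow the squeeze-and-excitation design referenced in the non-patent citation.

    import torch
    import torch.nn as nn

    class ChannelRecalibration(nn.Module):
        # Reweights each feature channel of a feature map in the manner of claims 1-3:
        # global pooling compresses the spatial dimensions, two fully connected layers
        # model the correlation among the channels, and a sigmoid normalizes the weights.
        # The reduction ratio of 16 is an assumption, not taken from the claims.
        def __init__(self, channels, reduction=16):
            super().__init__()
            self.squeeze = nn.AdaptiveAvgPool2d(1)            # spatial feature compression
            self.excite = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
                nn.Sigmoid(),                                 # normalized per-channel weights
            )

        def forward(self, feature_map):
            n, c, _, _ = feature_map.shape
            weights = self.squeeze(feature_map).view(n, c)
            weights = self.excite(weights).view(n, c, 1, 1)
            return feature_map * weights                      # recalibrated feature map

    # Hypothetical usage: recalibrate a feature map and classify the image.
    feature_map = torch.randn(1, 256, 14, 14)                 # e.g. output of one convolutional layer
    recalibrated = ChannelRecalibration(256)(feature_map)
    logits = nn.Linear(256, 10)(recalibrated.mean(dim=(2, 3)))  # 10 classes assumed

A second sketch illustrates the data enhancement of claims 6 and 7, in which the illumination intensity and the contrast of each sample image are varied within preset intervals; the use of the Pillow library and the interval values shown are assumptions.

    import random
    from PIL import Image, ImageEnhance

    def enhance_sample(image, illumination_interval=(0.6, 1.4), contrast_interval=(0.6, 1.4)):
        # First data: illumination intensity distributed over the preset interval.
        image = ImageEnhance.Brightness(image).enhance(random.uniform(*illumination_interval))
        # Second data: contrast distributed over the preset interval.
        return ImageEnhance.Contrast(image).enhance(random.uniform(*contrast_interval))

    # Hypothetical usage on one image from the sample image set:
    # training_sample = enhance_sample(Image.open("sample.jpg"))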
CN201910854477.9A 2019-09-10 2019-09-10 Classification method and device, equipment and storage medium Pending CN110689056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910854477.9A CN110689056A (en) 2019-09-10 2019-09-10 Classification method and device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910854477.9A CN110689056A (en) 2019-09-10 2019-09-10 Classification method and device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110689056A true CN110689056A (en) 2020-01-14

Family

ID=69107959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910854477.9A Pending CN110689056A (en) 2019-09-10 2019-09-10 Classification method and device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110689056A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339702A (en) * 2016-11-03 2017-01-18 北京星宇联合投资管理有限公司 Multi-feature fusion based face identification method
CN108021979A (en) * 2017-11-14 2018-05-11 华南理工大学 It is a kind of based on be originally generated confrontation network model feature recalibration convolution method
CN108805188A (en) * 2018-05-29 2018-11-13 徐州工程学院 A kind of feature based recalibration generates the image classification method of confrontation network
CN109508638A (en) * 2018-10-11 2019-03-22 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, computer equipment and storage medium
CN110097544A (en) * 2019-04-25 2019-08-06 武汉精立电子技术有限公司 A kind of display panel open defect detection method
CN110188866A (en) * 2019-05-28 2019-08-30 北京工业大学 A kind of feature extracting method based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIE HU ET AL: "Squeeze-and-Excitation Networks", arXiv *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339839A (en) * 2020-02-10 2020-06-26 广州众聚智能科技有限公司 Intensive target detection and metering method
CN111339839B (en) * 2020-02-10 2023-10-03 广州众聚智能科技有限公司 Intensive target detection metering method
CN110991568A (en) * 2020-03-02 2020-04-10 佳都新太科技股份有限公司 Target identification method, device, equipment and storage medium
CN111666852A (en) * 2020-05-28 2020-09-15 天津大学 Micro-expression double-flow network identification method based on convolutional neural network
CN112232343A (en) * 2020-09-03 2021-01-15 国家粮食和物资储备局科学研究院 Neural network and method for recognizing grain mildewed grains
CN112232343B (en) * 2020-09-03 2023-11-21 国家粮食和物资储备局科学研究院 Grain mildew grain identification neural network and identification method
CN112607100A (en) * 2020-11-22 2021-04-06 泰州市华仕达机械制造有限公司 Compatible fruit conveying and distributing system
CN112699822A (en) * 2021-01-05 2021-04-23 浪潮云信息技术股份公司 Restaurant dish identification method based on deep convolutional neural network
CN112699822B (en) * 2021-01-05 2023-05-30 浪潮云信息技术股份公司 Restaurant dish identification method based on deep convolutional neural network
CN113537380A (en) * 2021-07-29 2021-10-22 北京数美时代科技有限公司 Pornographic image identification method, pornographic image identification system, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110689056A (en) Classification method and device, equipment and storage medium
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN107578060B (en) Method for classifying dish images based on depth neural network capable of distinguishing areas
US8594385B2 (en) Predicting the aesthetic value of an image
Kagaya et al. Highly accurate food/non-food image classification based on a deep convolutional neural network
US20170032222A1 (en) Cross-trained convolutional neural networks using multimodal images
CN111209970B (en) Video classification method, device, storage medium and server
NL2025689B1 (en) Crop pest detection method based on f-ssd-iv3
Ragusa et al. Food vs non-food classification
Oliveira et al. A mobile, lightweight, poll-based food identification system
CN112288011A (en) Image matching method based on self-attention deep neural network
CN109740539B (en) 3D object identification method based on ultralimit learning machine and fusion convolution network
CN111652054A (en) Joint point detection method, posture recognition method and device
CN110765882A (en) Video tag determination method, device, server and storage medium
CN112132279B (en) Convolutional neural network model compression method, device, equipment and storage medium
CN112906780A (en) Fruit and vegetable image classification system and method
CN112927209A (en) CNN-based significance detection system and method
CN111339884B (en) Image recognition method, related device and apparatus
CN110503149B (en) Method and system for classifying local features in image
Bhole et al. Analysis of convolutional neural network using pre-trained squeezenet model for classification of thermal fruit images
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
Burkapalli et al. TRANSFER LEARNING: INCEPTION-V3 BASED CUSTOM CLASSIFICATION APPROACH FOR FOOD IMAGES.
Srigurulekha et al. Food image recognition using CNN
CN109344852A (en) Image-recognizing method and device, analysis instrument and storage medium
Büyükarıkan et al. Using convolutional neural network models illumination estimation according to light colors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination