Disclosure of Invention
In order to solve the above problems, the invention provides a large-scale food material image classification method based on transfer learning: a convolutional neural network (CNN) model is improved using transfer learning, and a Prior-Tree CNN model is provided for learning the class structure and classifier parameters; the improved model offers high efficiency, high accuracy and generality.
The invention discloses a large-scale food material image classification method based on transfer learning, which comprises the following steps: an original picture is input, and two kinds of information are output using the transferred knowledge, namely the recognition result of the receiving environment and the recognition result of the food material category; after forward propagation through the multi-task Prior-Tree CNN model, the joint features produced by the last fully connected layer are fed into the different classification tasks.
Further, the classification method specifically comprises the following steps: an artificially specified tree structure is added on top of the multi-task classification CNN model; a parent node is first set according to the tree structure, and a newly added subclass then directly inherits the features of its parent and is trained under this condition, so that even with little data the newly added subclasses can perform well; the model can also be viewed as a specific softmax loss for the different tasks; after optimizing the joint loss layer and sharing the visual features, the CNN model propagates its relevant parameters during back propagation; c ∈ {0, 1} marks whether a collected food material image was taken in a clean goods-receiving environment, and a multi-class label k ∈ {1, ..., K} marks the food material category, where K is the total number of categories; finally, the whole network and its parameters are trained iteratively until convergence;
in practice, the learning process is almost the same as standard gradient descent; at test time, the image is first checked to see whether it is a picture of a clean receiving environment; if it is predicted to be non-clean, the system captures a new image and repeats the process until a clean image is found within a given time; second, if the image is predicted to belong to the vegetable category, the model filters out the non-vegetable labels in the given order; finally, by combining order, weight and other information, the Prior-Tree CNN model assigns the clean image a food material category and a prediction score.
Still further, the formulation of the Prior-Tree CNN model is as follows:
firstly, suppose a prior tree is given in which all food categories form a three-layer tree; the prior tree has K + 1 leaf nodes, corresponding to the K food material category labels and 1 ignored label; the leaf nodes are connected to 7 parent nodes in the second layer: the K food material categories have 4 parents (S1, S2, S3 and S4), and the receiving-environment categories are divided into 3 groups (N1, N2 and N3); for classification, if an input image is classified as vegetable with high confidence, the other categories in the order are filtered out before prediction, so categories such as meat and aquatic products need not be considered, which further improves the accuracy of the food material category recognition task;
in order to simplify the Prior-Tree model, only the relationships between the labels of the K classes and their four parent nodes are considered; before the softmax function, each leaf label node k is associated with a weight vector β_k, and each super-class node s is associated with a vector α_s, where s ∈ {1, 2, 3, 4}; e.g. β_cabbage and β_carrot record the deviation from α_vegetable; the following generative model is defined for β:
β_k = α_parent(k) + ε, ε ~ N(0, σ²I); (1)
this prior expresses the relationships between classes, and the conditional distribution over k is given by the softmax:
P(k|I,W,β) = exp(β_k·φ(I)) / Σ_j exp(β_j·φ(I)), (2)
where φ(I) denotes the joint features of image I;
the values of {W, β, α} are inferred by MAP estimation, maximizing:
logP(k|I,t,W,β)+logP(k|I,W,β)+logP(W)+logP(β|α)+logP(α); (3)
from the point of view of the loss function, the equivalent objective is minimized:
−logP(k|I,t,W,β) − logP(k|I,W,β) + λ‖W‖² + (1/(2σ²))Σ_k‖β_k − α_parent(k)‖²; (4)
here, by fixing the value of α at 0, the loss function reduces to the standard (L2-regularized) loss function; letting C_s = {k | parent(k) = s}, maximizing over α with β fixed gives
α_s = (1/|C_s|)Σ_{k∈C_s} β_k. (5)
Therefore, the loss function in equation (4) is optimized by iterating the following two steps: first, with α fixed, W and β are optimized by standard stochastic gradient descent (SGD) on the loss function; second, with W and β fixed, α is maximized via equation (5).
Compared with the prior art, the transfer-learning-based large-scale food material image classification method improves the convolutional neural network model using transfer learning and provides the Prior-Tree CNN model to learn the class structure and classifier parameters; the improved model has the following advantages: it is efficient, providing a multi-task CNN model that classifies clean/dirty receiving-environment images so as to reduce noise in the Meal-53 data set and use the features of the training samples more effectively; it is accurate, correctly identifying categories even when the training data are scarce; and it is general, solving the problem through learning rather than being limited to one specific problem, automatically building a model for the problem at hand and thereby handling similar problems.
Detailed Description
The invention discloses a large-scale food material image classification method based on transfer learning, which comprises the following steps: in order to enhance the classification performance of the model and enable it to correctly identify categories when the training data are scarce, an original picture Ii is input, and two kinds of information are output using the transferred knowledge, namely the recognition result of the receiving environment and the recognition result of the food material category; after forward propagation through the multi-task Prior-Tree CNN model, the joint features produced by the last fully connected layer are fed into the different classification tasks.
The classification method specifically comprises the following steps: an artificially specified tree structure is added on top of the multi-task classification CNN model; a parent node is first set according to the tree structure, and a newly added subclass then directly inherits the features of its parent and is trained under this condition, so that even with little data the newly added subclasses can perform well; the model can also be viewed as a specific softmax loss for the different tasks; after optimizing the joint loss layer and sharing the visual features, the CNN model propagates its relevant parameters during back propagation; c ∈ {0, 1} marks whether a collected food material image was taken in a clean goods-receiving environment, and a multi-class label k ∈ {1, ..., K} marks the food material category, where K is the total number of categories; finally, the whole network and its parameters are trained iteratively until convergence;
in practice, the learning process is almost the same as standard gradient descent; at test time, the image is first checked to see whether it is a picture of a clean receiving environment; if it is predicted to be non-clean, the system captures a new image and repeats the process until a clean image is found within a given time; second, if the image is predicted to belong to the vegetable category, the model filters out the non-vegetable labels in the given order; finally, by combining order, weight and other information, the Prior-Tree CNN model assigns the clean image a food material category and a prediction score.
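The two-stage test procedure above can be sketched as follows; this is a minimal illustration, and `capture_image`, `predict_env` and `predict_food` are hypothetical helper names standing in for the camera capture and the two model heads:

```python
import time

def classify_with_retry(capture_image, predict_env, predict_food, timeout_s=30.0):
    """Two-stage test-time procedure (illustrative sketch):
    1) keep capturing images until one is predicted to be a clean
       receiving-environment picture, within a given time budget;
    2) classify the clean image into a food material category with a score."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        image = capture_image()
        if predict_env(image) == "clean":          # receiving-environment task
            category, score = predict_food(image)  # food-material task
            return category, score
    return None  # no clean image found within the given time
```

The retry loop matches the text's behavior of taking a new image until a clean one is found within a given time.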
The formulation of the Prior-Tree CNN model is as follows:
firstly, suppose a prior tree is given in which all food categories form a three-layer tree, as shown in fig. 2; the prior tree has K + 1 leaf nodes, corresponding to the K food material category labels and 1 ignored label; the leaf nodes are connected to 7 parent nodes in the second layer: the K food material categories have 4 parents, i.e. are divided into 4 groups (S1, S2, S3 and S4), and the receiving-environment categories are divided into 3 groups (N1, N2 and N3); for classification, if an input image is classified as vegetable with high confidence, the other categories in the order are filtered out before prediction, so categories such as meat and aquatic products need not be considered, which further improves the accuracy of the food material category recognition task;
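The three-layer prior tree and the order-based filtering described above can be sketched as a simple data structure; the group names and leaf labels below are illustrative placeholders, not the patent's full 51-category list:

```python
# Illustrative sketch of the three-layer prior tree: 4 super-class parents
# (S1-S4) for food material leaves and 3 parents (N1-N3) for the
# receiving-environment categories. Labels here are placeholders.
PRIOR_TREE = {
    "S1_vegetable": ["cabbage", "carrot", "spinach"],
    "S2_meat":      ["pork", "chicken"],
    "S3_aquatic":   ["fish", "shrimp"],
    "S4_bean":      ["tofu"],
    "N1": ["bag_not_opened"],
    "N2": ["dark_light"],
    "N3": ["others"],
}

def parent(label):
    """Return the second-layer parent (super-class) of a leaf label."""
    for s, leaves in PRIOR_TREE.items():
        if label in leaves:
            return s
    raise KeyError(label)

def filter_candidates(predicted_superclass, order_labels):
    """Keep only the order labels under the confidently predicted super-class;
    e.g. if the image is classified as vegetable, meat/aquatic labels in the
    order are dropped before the final prediction."""
    return [k for k in order_labels if parent(k) == predicted_superclass]
```

For example, with an order containing cabbage, pork and carrot, a confident "vegetable" prediction leaves only the two vegetable candidates.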
in order to simplify the Prior-Tree model, only the relationships between the labels of the K classes and their four parent nodes are considered; before the softmax function, each leaf label node k is associated with a weight vector β_k, and each super-class node s is associated with a vector α_s, where s ∈ {1, 2, 3, 4}; e.g. β_cabbage and β_carrot record the deviation from α_vegetable; the following generative model is defined for β:
β_k = α_parent(k) + ε, ε ~ N(0, σ²I); (1)
this prior expresses the relationships between classes, and the conditional distribution over k is given by the softmax:
P(k|I,W,β) = exp(β_k·φ(I)) / Σ_j exp(β_j·φ(I)), (2)
where φ(I) denotes the joint features of image I;
the values of {W, β, α} are inferred by MAP estimation, maximizing:
logP(k|I,t,W,β)+logP(k|I,W,β)+logP(W)+logP(β|α)+logP(α); (3)
from the point of view of the loss function, the equivalent objective is minimized:
−logP(k|I,t,W,β) − logP(k|I,W,β) + λ‖W‖² + (1/(2σ²))Σ_k‖β_k − α_parent(k)‖²; (4)
here, by fixing the value of α at 0, the loss function reduces to the standard (L2-regularized) loss function; letting C_s = {k | parent(k) = s}, maximizing over α with β fixed gives
α_s = (1/|C_s|)Σ_{k∈C_s} β_k. (5)
Therefore, the loss function in equation (4) is optimized by iterating the following two steps: first, with α fixed, W and β are optimized by standard stochastic gradient descent (SGD) on the loss function; second, with W and β fixed, α is maximized via equation (5).
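The two alternating steps can be illustrated numerically under simplifying assumptions: a single image with fixed features φ, the shared-weight regularizer λ‖W‖² omitted, and the tree prior written as (1/2σ²)Σ_k‖β_k − α_parent(k)‖². Function and variable names are illustrative, not the patent's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # numerical stability
    e = np.exp(z)
    return e / e.sum()

def prior_tree_loss(phi, k_true, beta, alpha, parent_of, sigma2=1.0):
    """Negative log-likelihood of the true leaf label plus the tree-prior
    term (1/(2*sigma2)) * sum_k ||beta_k - alpha_parent(k)||^2.
    With alpha fixed at 0 this reduces to a standard softmax loss
    with L2 regularization on beta, as noted in the text."""
    p = softmax(beta @ phi)
    nll = -np.log(p[k_true])
    prior = sum(np.sum((beta[k] - alpha[parent_of[k]]) ** 2)
                for k in range(beta.shape[0])) / (2.0 * sigma2)
    return nll + prior

def update_alpha(beta, parent_of, n_super):
    """Closed-form alpha step of equation (5): with beta fixed, each alpha_s
    is the mean of the beta_k whose parent is s (C_s = {k | parent(k) = s})."""
    alpha = np.zeros((n_super, beta.shape[1]))
    for s in range(n_super):
        members = [k for k, p in enumerate(parent_of) if p == s]
        alpha[s] = beta[members].mean(axis=0)
    return alpha
```

In training, the SGD step would decrease `prior_tree_loss` in beta (and W) with alpha frozen, and `update_alpha` would then re-center each super-class vector on its children.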
Example 1:
A ResNet-50 multi-task experiment is trained as the baseline, and the food material recognition accuracy is then improved step by step using order information, weight features and transfer learning; the specific experimental method for the Prior-Tree CNN model is as follows:
A. Data set and experimental environment
the data set originates from Mealcome, which is one of the largest food supply chain platforms in china (www.mealcome.com), serving nearly 1000 restaurants; 15020 pictures of a clean environment and 15557 dirty pictures are selected to construct a Meal-53 data set; the clean pictures are divided into 51 food material categories, and the dirty pictures are divided into 3 categories; all of these data tags are added manually; in the part of the clean picture, the image numbers of each category vary from 106 to 895; for all dirty pictures, "light dark" is merged to "others", so the dirty picture is divided into two parts, the "bag not opened" and "others", the number of pictures is 11382 and 4157 respectively; the ratio of the training set, the test set and the validation set was 70%, 15% and 15%, respectively; due to the imbalance of the data set, in training the ResNet model, the method of oversample was used to ensure that there were 500 images for training and 100 images for testing per class; to train the multitask-based deep CNN model, clean and dirty picture datasets are used;
the clean pictures in the Meal-53 data set contain order information for 5026 orders, with between 10 and 35 food material pictures per order, which is also imbalanced; within each order, the food material images also carry weight information; the 5026 orders are further divided into training, test and validation sets in the same proportions as the data set; these sets are used to train the order-based models (Order Dropout, Order Weighting and Prior Tree), with the order and weight information integrated into the image labels during training, testing and validation;
in the experiments, a modified Caffe is used as the deep learning framework and ResNet-50 as the deep learning model; ResNet-50 is pre-trained to reduce training time and improve accuracy; in all models, the weight lr_mult of the last InnerProduct layer is set to 10.0 and that of the bias term to 20.0 for training; the training and test batch sizes are both 16, and the momentum is 0.9; all experiments were run on an Intel(R) i7 CPU with 32 GB RAM and a GeForce GTX 1080 Ti;
B. Baseline and evaluation method
the deep learning model ResNet and a multi-task CNN are used as baselines to complete the two tasks simultaneously;
the data shown in fig. 3 is average accuracy, and for the receiving environment recognition task, the recognition ratio of clean pictures and dirty pictures can be obtained: an integration ratio and a non-integration ratio; these two ratios are True Positive Rate (TPR); for the food material category identification task, the probability of each food material category can be obtained, and the evaluation formulas of Top-k hit rate and recall rate are as follows:
in the formula, NiI ∈ (0, 1...., 50) is the number of the ith food material category, NkiIs the probability of top-k for the ith food material in the test set, n equals 50;
C. Experimental results and analysis
the experimental results of ResNet, the Baseline, the Order Weighting Model and the Prior Tree Model are shown in FIGS. 3 and 4; FIG. 3 shows results obtained with the 1-crop validation method; it can be concluded that, in the receiving-environment recognition task, the Prior Tree Model further increases the clean rate and dirty rate to 94.41% and 92.86%, respectively; similarly, FIG. 4 shows that the Prior Tree Model achieves the best Top-1 and Top-3 accuracies of 99.12% and 99.94%, while Top-5 accuracy is essentially 100% for all models; the results in FIG. 4 were obtained with the 10-crop validation method, which performs better than the 1-crop method: the clean rate and dirty rate of the Prior Tree Model improve further to 94.84% and 94.56%, respectively; fig. 5 shows the recognition results of the Baseline and Prior Tree models on some food materials, where the "Baseline" label under a sample image denotes the baseline's recognition result and "Priori Tree" denotes the prior tree model's result; these results show that the transfer-learning-based Prior-Tree CNN improves the accuracy of food material recognition.
According to the large-scale food material image classification method based on transfer learning, when a food material data set contains only a small number of labeled samples, the convolutional neural network (CNN) model is improved using transfer learning and a Prior-Tree CNN model is provided to learn the food material class structure and classifier parameters; the large amount of existing labeled data of related classes is fully exploited to learn efficient knowledge transfer between related classes, so that a classifier trained on the large sample set improves the accuracy and robustness of object classification on the small sample set; the Meal-53 small-sample data set is established, comprising 32 vegetable categories, 15 meat, poultry and egg categories, 3 aquatic food material categories and 1 bean product category, plus 3 dirty-picture categories, namely "bag not opened", "dark light" and "others";
the convolutional neural network model (CNN) is improved based on a transfer learning method, and a primer-Tree CNN model is provided for learning class structures and classifier parameters, and the improved model has the following advantages: the method has high efficiency, provides a multitask CNN model to classify honest/dishonest environment images so as to reduce noise in a Meal-53 data set and more effectively utilize the characteristics of training samples; the accuracy is high, and the improved model can correctly identify the category when the data training set is less; generally, the improved CNN model solves the problem through learning, is not limited to a specific problem, and can automatically build a model according to the problem, thereby solving similar problems.
The above-described embodiments are merely preferred embodiments of the invention; all equivalent changes or modifications of the structures, features and principles described in the claims of the invention are included in the scope of the invention.