CN108647702B

CN108647702B - Large-scale food material image classification method based on transfer learning

Info

Publication number: CN108647702B
Application number: CN201810332217.0A
Authority: CN
Inventors: 肖光意; 刘欢; 刘毅; 吴淇; 黄宗杰; 陈浩
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2018-04-13
Filing date: 2018-04-13
Publication date: 2021-06-01
Anticipated expiration: 2038-04-13
Also published as: CN108647702A

Abstract

The invention discloses a large-scale food image classification method based on migration learning, comprising the following steps: inputting an original image, and outputting information including two aspects through the knowledge of migration learning, one is the recognition result of the receiving environment , and the second is the recognition result of the type of food; after the forward propagation of the multi-task-based Prior-Tree CNN model, the joint features generated by the last fully connected layer will be input into different classification tasks; A large-scale food image classification method based on transfer learning, the convolutional neural network model is improved based on the transfer learning method, and the Prior-Tree CNN model is proposed to learn the class structure and classifier parameters. The improved model has high efficiency and high accuracy rate and universality.

Description

Large-scale food material image classification method based on transfer learning

Technical Field

The invention relates to a large-scale food material image classification method based on transfer learning, and belongs to the technical field of food material image classification.

Background

With the rapid development of deep learning, the image classification technology has developed into a more popular research direction in the computer vision field at home and abroad, and is applied to more and more fields; based on an actual scene, how to construct a method for classifying images, and the method with high accuracy and good robustness is an unavoidable core problem in the image classification technology moving to practical application.

The training classifier needs a large number of samples, when the training samples are few, the problem is solved, and for the serious problem of unbalanced category, the inherent structure of a group of classes in the integrity environment data set is used for presenting a priority tree CNN model; for example, four food materials need to be purchased, large red pepper is related to the screw pepper, the small rice pepper and the sharp green pepper, but only one image of the screw pepper is needed, which makes it difficult to train one classifier to distinguish hundreds of pictures of the screw pepper and other three categories; by the marking examples of the related categories, the model can learn a task from one screw pepper example more easily through zero-shot; learning new categories by transferring knowledge from relevant categories, and only knowing the characteristics of the Zanthoxylum piperitum; the traditional image classification comprises methods such as KNN and SVM, the KNN is classified by measuring distances among different characteristic values, and the defects of the KNN comprise large calculated amount, large prediction deviation when a sample is unbalanced and the like; the SVM belongs to supervised learning, is a two-classification model, the basic model of the SVM is a linear classifier with the maximum interval defined on a feature space, and the defects of the SVM comprise that the interpretation force of high-dimensional mapping of a kernel function is not strong, particularly a radial basis function; only a two-class classification algorithm is given, so that the problem of multi-classification is difficult to solve; aiming at the problems in the Prior art, the convolutional neural network model (CNN) is improved based on a migration learning method, a Prior-Tree CNN model is provided for learning class structures and classifier parameters, a large amount of existing labeled data of related classes are fully utilized, and efficient knowledge migration of the related classes is learned, so that the accuracy and robustness of object classification of a small sample set are improved by using a classifier obtained by training on a large sample set, and a Meal-53 small sample data set is established, and the aim of improving the classification accuracy is fulfilled under the condition that only a few samples exist.

Disclosure of Invention

In order to solve the problems, the invention provides a large-scale food material image classification method based on transfer learning, a convolutional neural network model (CNN) is improved based on the transfer learning method, a primer-Tree CNN model is provided for learning class structures and classifier parameters, and the improved model has high efficiency, high accuracy and universality.

The invention discloses a large-scale food material image classification method based on transfer learning, which comprises the following steps of: inputting an original picture, and outputting information containing two aspects through the transfer learning knowledge, wherein one is the identification result of the receiving environment, and the other is the identification result of the food material type; after forward propagation of the multitasking based Prior-Tree CNN model, the joint features generated by the last fully connected layer will be input into different classification tasks.

Further, the classification method specifically comprises the following steps: adding an artificially set tree structure on the multi-task classification CNN model, firstly setting a parent node according to the tree structure, then enabling a newly added subclass to directly obtain the characteristics of the parent, and training under the condition; therefore, even if the data is less, the newly added subclasses can have good effects; the model can also be viewed as a specific softmax penalty for different tasks; after optimizing the joint loss layer and sharing the visual features, the CNN model will deliver its relevant parameters in the back propagation process; marking whether the collected food material images are pictures in a clean goods receiving environment by using c e to {0, 1}, and marking the food material types by using a multi-type label K to e to { 1.. K }, wherein K is the number of all the types; finally, training the whole network and parameters iteratively until convergence;

in the actual process, the learning process is almost the same as the standard gradient reduction, and during testing, whether the image is a clean receiving environment picture is verified firstly; if the image is predicted to be a non-clean image, the system will take a new image to continue the process until a clean image is found within a given time; second, if the image is predicted to be a vegetable category, the model will filter other labels in the vegetable in the given order; finally, through the Priori Tree CNN model combined with the information of orders, weight and the like, the clean image can obtain the food material category and the predicted score.

Still further, the formula expression of the color-Tree CNN model is as follows:

firstly, supposing that a prior tree is provided, all food categories are composed of three layers of trees, the prior tree has K +1 leaf nodes, and K food category labels and 1 neglected label correspond to the prior tree respectively; the leaf nodes are connected with 7 father nodes of the second layer, the K food material categories have 4 fathers and comprise S1, S2, S3 and S4, and the receiving environment categories are divided into 3 groups comprising N1, N2 and N3; for the classification, if the input images are classified into vegetables with high confidence level, other categories in the order are filtered for prediction, the categories such as meat, aquatic products and the like do not need to be concerned, and the accuracy rate of the food material category identification task is further improved;

in order to simplify the Priori Tree model, only the relationships between the labels in the K class and the four parent nodes thereof are concerned; before the softmax function, the leaf tag node k is associated with a weight vector

Associating; each super class node s and vector

Associating, wherein s belongs to {1, 2, 3, 4 }; e.g. beta_cabbageAnd beta_carrotRecord and alpha_vegatableThe deviation therebetween; the following generative model is defined for β:

this prior expression describes the relationship between classes, with the conditional distribution over k being expressed as:

inferring the value of { W, β, α } from MAP, maximizing:

logP(k|I，t，W，β)+logP(k|I，W，β)+logP(W)+logP(β|α)+logP(α)； (3)

from the point of view of the loss function, minimize:

here, by fixing the value of α to 0, the loss function will be reduced to the standard loss function, let

C_sThen { k | paramt (k) ═ s }, then

Therefore, the loss function in equation (4) is optimized by iteratively performing the following two steps; first, W and β are maximized, α is fixed using a standard random gradient descent (SGD) of a standard loss function; second, β is fixed by equation (5), thereby maximizing α.

Compared with the Prior art, the migration learning-based large-scale food material image classification method improves the convolutional neural network model based on the migration learning method, and provides the Prior-Tree CNN model to learn the class structure and the classifier parameters, and the improved model has the following advantages: the method has high efficiency, provides a multitask CNN model to classify honest/dishonest environment images so as to reduce noise in a Meal-53 data set and more effectively utilize the characteristics of training samples; the accuracy is high, and the improved model can correctly identify the category when the data training set is less; generally, the improved CNN model solves the problem through learning, is not limited to a specific problem, and can automatically build a model according to the problem, thereby solving similar problems.

Drawings

FIG. 1 is a schematic diagram of the overall architecture of the Prior-Tree CNN model of the present invention; wherein, the diagram (a) is a Prior-Tree CNN model diagram; fig. b is a tree structure diagram.

FIG. 2 is a schematic diagram of the filtering of tags within a model by super class prediction of the present invention; wherein, the diagram (a) is a priori tree diagram; FIG. b is a schematic view of order classification.

FIG. 3 is a schematic illustration of the accuracy of validation using 1-crop on the Mealcome dataset of the present invention.

Figure 4 is a graphical representation of the accuracy of validation using 10-crops on the Mealcome dataset of the present invention.

FIG. 5 is a schematic diagram of an exemplary structure of a Priori Tree Model prediction sample of the present invention.

Detailed Description

The invention discloses a large-scale food material image classification method based on transfer learning, which comprises the following steps of: in order to enhance the effect of the model on classification and enable the model to correctly identify the class when the training set of data is less; a primitive picture Ii is input, and information containing two aspects is output through the knowledge of transfer learning, wherein one is an identification result of a receiving environment, and the other is an identification result of a food material type; after forward propagation of the multitasking based Prior-Tree CNN model, the joint features generated by the last fully connected layer will be input into different classification tasks.

The classification method specifically comprises the following steps: adding an artificially set tree structure on the multi-task classification CNN model, firstly setting a parent node according to the tree structure, then enabling a newly added subclass to directly obtain the characteristics of the parent, and training under the condition; therefore, even if the data is less, the newly added subclasses can have good effects; the model can also be viewed as a specific softmax penalty for different tasks; after optimizing the joint loss layer and sharing the visual features, the CNN model will deliver its relevant parameters in the back propagation process; marking whether the collected food material images are pictures in a clean goods receiving environment by using c e to {0, 1}, and marking the food material types by using a multi-type label K to e to { 1.. K }, wherein K is the number of all the types; finally, training the whole network and parameters iteratively until convergence;

The formula expression mode of the primer-Tree CNN model is as follows:

firstly, supposing that a prior tree is provided, all food categories are composed of three layers of trees, as shown in fig. 2, the prior tree has K +1 leaf nodes, and corresponding to K food category labels and 1 neglected label respectively; the leaf nodes are connected with 7 father nodes of the second layer, the K food material categories have 4 fathers, namely are divided into 4 groups (S1, S2, S3 and S4), and the receiving environment categories are divided into 3 groups (N1, N2 and N3); for the classification, if the input images are classified into vegetables with high confidence level, other categories in the order are filtered for prediction, the categories such as meat, aquatic products and the like do not need to be concerned, and the accuracy rate of the food material category identification task is further improved;

Associating; each super class node s and vector

inferring the value of { W, β, α } from MAP, maximizing:

logP(k|I，t，W，β)+logP(k|I，W，β)+logP(W)+logP(β|α)+logP(α)； (3)

from the point of view of the loss function, minimize:

C_sThen { k | paramt (k) ═ s }, then

Example 1:

training by taking a ResNet-50 multitask experiment as a baseline experiment, and then gradually improving the identification precision of food materials by using an order, a weight characteristic and a transfer learning technology, wherein a Prior-Tree CNN model has the following specific experimental method:

A. the data set is compared with the experimental environment,

the data set originates from Mealcome, which is one of the largest food supply chain platforms in china (www.mealcome.com), serving nearly 1000 restaurants; 15020 pictures of a clean environment and 15557 dirty pictures are selected to construct a Meal-53 data set; the clean pictures are divided into 51 food material categories, and the dirty pictures are divided into 3 categories; all of these data tags are added manually; in the part of the clean picture, the image numbers of each category vary from 106 to 895; for all dirty pictures, "light dark" is merged to "others", so the dirty picture is divided into two parts, the "bag not opened" and "others", the number of pictures is 11382 and 4157 respectively; the ratio of the training set, the test set and the validation set was 70%, 15% and 15%, respectively; due to the imbalance of the data set, in training the ResNet model, the method of oversample was used to ensure that there were 500 images for training and 100 images for testing per class; to train the multitask-based deep CNN model, clean and dirty picture datasets are used;

the clean pictures in the Meal-53 data set contain order information, the number of the order information is 5026, and the number of the food material pictures of each order is between 10 and 35, which is also unbalanced; in each order, the food material image also has weight information; 5026 orders are further divided into a training set, a testing set and a verification set, and the proportion of the three orders is the same as that of the data set; these data sets are used to train Order-based models (Order Dropout, Order weighing and priority Tree), with Order and weight information integrated into the image labels during training, testing and validation;

in the experiment, the depth learning frame uses the modified cafe, the depth learning model selects ResNet-50, and the ResNet-50 is pre-trained to reduce the training time and improve the accuracy; in all models, the weight lr _ mult of the last Innerproduct layer is set to 10.0, and the bias term is set to 20.0 for training; the training and testing batches are both 16, and the momentum is 0.9; all experiments were performed on Intel (R) i7CPU, 32GB RAM and GeForce GTX 1080 Ti;

B. a baseline (baseline) and an assessment method,

a deep learning model ResNet and a multitask CNN are used as baseline to complete two tasks simultaneously;

the data shown in fig. 3 is average accuracy, and for the receiving environment recognition task, the recognition ratio of clean pictures and dirty pictures can be obtained: an integration ratio and a non-integration ratio; these two ratios are True Positive Rate (TPR); for the food material category identification task, the probability of each food material category can be obtained, and the evaluation formulas of Top-k hit rate and recall rate are as follows:

in the formula, N_iI ∈ (0, 1...., 50) is the number of the ith food material category, N_kiIs the probability of top-k for the ith food material in the test set, n equals 50;

C. the results of the experiments and the analysis thereof,

the experimental results of ResNet, Baseline, Order weighing Model and priority Tree Model are shown in FIGS. 3 and 4, and FIG. 3 shows the experimental results obtained using the 1-crop validation method; it can be concluded that the priority Tree Model further increases the integration rate and non-integration rate by 94.41% and 92.86%, respectively, in the receiving environment recognition task; similarly, FIG. 4 shows that the Priori Tree Model achieved the best results of 99.12% and 99.94% on Top-1 and Top-3 accuracy, and that Top-5 achieved substantially consistent results of 100% on all other models; the experimental results in FIG. 4 were obtained by the 10-crop verification method, and it can be seen that the method has a better effect than the 1-crop verification method, in which it can be seen that the integer ratio and non-integer ratio in the Priori Tree Model are further improved to 94.84% and 94.56%, respectively; fig. 5 shows the recognition results of the Baseline and the Priori Tree models for some food materials, wherein the "Baseline" label under the sample image represents the recognition result of the Baseline (Baseline), and the "Priori Tree" represents the recognition result of the prior Tree Model; the above results indicate that the use of the Priori Tree CNN of the transfer learning can improve the accuracy of food material identification.

According to the large-scale food material image classification method based on the transfer learning, when only a small number of labeled samples exist in a food material data set, a convolutional neural network model (CNN) is improved based on the transfer learning method, a Prior-Tree CNN model is provided for learning food material class structures and classifier parameters, a large amount of existing labeled data of related classes are fully utilized, efficient knowledge transfer of the related classes is learned, and therefore the accuracy and robustness of object classification of a small sample set are improved by using a trained classifier on the large sample set; establishing a Meal-53 small sample data set, wherein the data set comprises 32 vegetable categories, 15 meat and poultry eggs, 3 aquatic food materials, 1 bean food material and a bean food material, and 3 dirty pictures, namely, no bag opening, dark light and the like;

the convolutional neural network model (CNN) is improved based on a transfer learning method, and a primer-Tree CNN model is provided for learning class structures and classifier parameters, and the improved model has the following advantages: the method has high efficiency, provides a multitask CNN model to classify honest/dishonest environment images so as to reduce noise in a Meal-53 data set and more effectively utilize the characteristics of training samples; the accuracy is high, and the improved model can correctly identify the category when the data training set is less; generally, the improved CNN model solves the problem through learning, is not limited to a specific problem, and can automatically build a model according to the problem, thereby solving similar problems.

The above-described embodiments are merely preferred embodiments of the present invention, and all equivalent changes or modifications of the structures, features and principles described in the claims of the present invention are included in the scope of the present invention.

Claims

1. A large-scale food material image classification method based on transfer learning is characterized by comprising the following steps: inputting an original picture, and outputting information containing two aspects through the transfer learning knowledge, wherein one is the identification result of the receiving environment, and the other is the identification result of the food material type; after forward propagation of the multitask-based prior tree CNN model, the joint features produced by the last fully-connected layer will be input into different classification tasks;

the classification method specifically comprises the following steps: adding an artificially set tree structure on the multi-task classification CNN model, firstly setting a parent node according to the tree structure, and then enabling a newly added subclass to directly obtain the characteristics of the parent; after optimizing the joint loss layer and sharing the visual features, the CNN model will deliver its relevant parameters in the back propagation process; marking whether the collected food material images are pictures in a clean goods receiving environment by using c e to {0, 1}, and marking the food material types by using a multi-type label K to e to { 1.. K }, wherein K is the number of all the types; finally, training the whole network and parameters iteratively until convergence;

when testing, firstly, verifying whether the image is a clean receiving environment picture; if the image is predicted to be a non-clean image, the system will take a new image to continue the process until a clean image is found within a given time; second, if the image is predicted to be a vegetable category, the model will filter other labels in the vegetable in the given order; finally, through a priori tree CNN model combined with order and weight information, the clean image can obtain the food material category and the predicted score;

the prior tree CNN model has the following formula expression mode:

firstly, supposing that a prior tree is provided, all food categories are composed of three layers of trees, the prior tree has K +1 leaf nodes, and K food category labels and 1 neglected label correspond to the prior tree respectively; the leaf nodes are connected with 7 father nodes of the second layer, the K food material categories have 4 fathers and comprise S1, S2, S3 and S4, and the receiving environment categories are divided into 3 groups comprising N1, N2 and N3; for the hierarchical classification, if the input images are classified into vegetables with high confidence, other categories in the order are filtered out for prediction, and the categories such as meat, aquatic products and the like do not need to be concerned;

before the softmax function, the leaf tag node k is associated with a weight vector

Associating; each super class node s and vector

inferring the value of { W, β, α } from MAP, maximizing:

logP(k|I，t，W，β)+logP(k|I，W，β)+logP(W)+logP(β|α)+logP(α)； (3)

from the point of view of the loss function, minimize:

here, by fixing the value of α to 0, the loss function will be reduced to the standard loss function, let C_sThen { k | paramt (k) ═ s }, then

Therefore, the loss function in equation (4) is optimized by iteratively performing the following two steps; first, maximize W and β, fix α using standard random gradient descent of a standard loss function; second, β is fixed by equation (5), thereby maximizing α.