CN114758167B - Dish identification method based on self-adaptive contrast learning - Google Patents

Publication number
CN114758167B
Authority
CN
China
Prior art keywords
dish
pictures
self
identification method
dish identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210163470.4A
Other languages
Chinese (zh)
Other versions
CN114758167A (en)
Inventor
胡海苗
徐振博
黄龚
姜宏旭
李明竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Shifang Technology Co ltd
Hangzhou Innovation Research Institute of Beihang University
Original Assignee
Hangzhou Shifang Technology Co ltd
Hangzhou Innovation Research Institute of Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-02-22
Filing date
2022-02-22
Publication date
2024-04-26

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a dish identification method based on self-adaptive contrast learning. Unlike traditional dish identification methods, it is built on a neural network trained with adaptive contrast learning, requires no online training, and places lower demands on the inference environment. It proposes a multi-scale triplet loss function that lets the neural network adaptively learn losses at different difference scales and thus better distinguish fine differences among dishes. The multi-scale triplet loss function consists of triplet loss functions with three boundaries and a maximum-value selection function, so the boundary value of the triplet loss is selected adaptively. Through adaptive contrast learning, the invention achieves offline inference for dish identification, is not restricted by the set of dish categories, can cope with real-time changes in the categories, and greatly reduces the computing power required in the environment where dish identification is deployed. The invention also introduces automatic deletion of low-similarity samples in the feedback process, so that the dish identification method can run stably over a long period.

Description

Dish identification method based on self-adaptive contrast learning
Technical Field
The invention relates to a dish identification method based on self-adaptive contrast learning.
Background
Existing classical dish identification methods usually classify different dishes with a neural network and must retrain the network parameters when the dishes change. They therefore have the drawbacks of requiring substantial computing power and long training times, and of depending on cloud or edge devices. Because training the network parameters takes a long time, traditional methods cannot add new dishes in real time. Traditional schemes based on contrast learning often ignore the similarity between dishes and calculate the loss function with a single fixed boundary value, so the features predicted by the feature extraction network are not very distinguishable. In addition, dish identification schemes based on contrast learning often accumulate errors during identification, so the accuracy of dish identification deteriorates with use.
Disclosure of Invention
It is an object of the present invention to address at least one of the above problems and/or disadvantages and to provide at least the advantages described below.
It is still another object of the present invention to provide a dish identification method based on adaptive contrast learning that uses a triplet loss function with an adaptive boundary to make the features predicted by the feature extraction network more distinguishable, thereby ensuring high dish identification accuracy. By introducing a strategy that automatically deletes low-similarity samples, the method effectively addresses the problem of error accumulation during dish identification inference.
To achieve these objects and other advantages and in accordance with the purpose of the invention, a dish identification method based on adaptive contrast learning is provided, comprising the following. In the training process, a feature extraction model is trained with an adaptive contrast learning loss function: for each triplet, triplet losses based on three different boundaries are calculated simultaneously, and the largest of the three loss values is selected for back propagation. After training, the neural network parameters are fixed and only inference is performed, so no further training is needed to update the parameters. In the inference stage, to prevent error accumulation, automatic deletion of low-similarity samples is introduced in the feedback process, so that the dish identification method can run stably over a long period.
The input of the training process comprises a plurality of dish categories, with no fewer than two images in each category. Any two images of the same category together with one image of a different category form a triplet. During training, triplet losses based on multiple boundaries are calculated simultaneously for each triplet, and the largest of these loss values is selected for back propagation.
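As an illustration of how such triplets can be formed from labeled images, the following Python sketch enumerates index triplets from a list of category labels. The function name `enumerate_triplets`, the example labels, and the choice to treat (a, p) as an ordered pair (so that either image of a same-category pair may act as the anchor) are assumptions made for this sketch; the source does not specify them.

```python
from itertools import permutations


def enumerate_triplets(labels):
    """Return index triplets (a, p, n).

    a and p are different images with the same label; n is any image
    with a different label. Treating (a, p) as ordered is an assumption.
    """
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue  # anchor and positive must share a category
        for n, label_n in enumerate(labels):
            if label_n != labels[a]:
                triplets.append((a, p, n))
    return triplets


# Hypothetical toy batch: two categories with 2 and 3 images.
labels = ["mapo_tofu", "mapo_tofu", "fried_rice", "fried_rice", "fried_rice"]
triplets = enumerate_triplets(labels)
print(len(triplets))  # 2*1*3 + 3*2*2 = 18
```

Under the same ordered-pair assumption, a batch of 32 categories with 8 images each would yield 32 × 8 × 7 × 248 = 444,416 triplets, which is why the per-triplet loss is typically computed in a vectorized way in practice.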
Preferably, consider a triplet (a, p, n), where a and p belong to the same dish category and n belongs to a different dish category. The larger-boundary triplet loss function is $L_B = \max\{d(a,p) - d(a,n) + M_B,\ 0\}$, where $M_B$ is the larger boundary constant. The medium-boundary triplet loss function is $L_I = g \cdot \max\{d(a,p) - d(a,n) + M_I,\ 0\}$, where $M_I$ is the medium boundary constant. The smaller-boundary triplet loss function is $L_S = f \cdot \max\{d(a,p) - d(a,n) + M_S,\ 0\}$, where $M_S$ is the smaller boundary constant and f and g are constants. The adaptive contrast learning loss function is $L = \max\{L_B, L_I, L_S\}$.
The inference phase consists of three processes: a feature extraction process, a comparison process, and a feedback process. First, in the feature extraction process, features are extracted from the input image by the feature extraction model optimized in the training stage, yielding a feature M. Next, in the comparison process, all features stored in the feature cache are retrieved, the distance (a measure of similarity) between each cached feature and the current feature is calculated, and the category of the cached feature at the minimum distance D from the current feature is taken as the recognition result. Finally, in the feedback process, if the minimum distance D is smaller than the threshold T, the currently identified feature is stored in the feature cache; otherwise it is discarded. This completes the inference process.
Preferably, the training process further comprises a data enhancement step that preprocesses the dish images: randomly flipping the input image horizontally and/or vertically, and adding random contrast, saturation, or brightness noise to the input image.
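The data enhancement described above can be sketched in plain Python on a small grayscale image stored as nested lists with values in [0, 1]. This is a minimal sketch under stated assumptions: the probabilities `q1`, `q2`, `q3`, the `strength` parameter, and the multiplicative noise model are hypothetical choices, since the source does not define how the noise is generated. Saturation noise is omitted because it requires color channels.

```python
import random


def hflip(img):
    """Mirror each row (left-right flip)."""
    return [row[::-1] for row in img]


def vflip(img):
    """Reverse the row order (top-bottom flip)."""
    return [row[:] for row in img[::-1]]


def clip01(v):
    return min(1.0, max(0.0, v))


def jitter_brightness(img, factor):
    """Scale every pixel by `factor` and clip to [0, 1] (assumed noise model)."""
    return [[clip01(v * factor) for v in row] for row in img]


def jitter_contrast(img, factor):
    """Scale each pixel's deviation from the image mean by `factor`."""
    count = sum(len(row) for row in img)
    mean = sum(sum(row) for row in img) / count
    return [[clip01(mean + (v - mean) * factor) for v in row] for row in img]


def augment(img, rng, q1=0.5, q2=0.5, q3=0.5, strength=0.2):
    """Apply each augmentation independently with its own probability."""
    if rng.random() < q1:
        img = hflip(img)
    if rng.random() < q2:
        img = vflip(img)
    if rng.random() < q3:
        img = jitter_contrast(img, 1.0 + rng.uniform(-strength, strength))
    if rng.random() < q3:
        img = jitter_brightness(img, 1.0 + rng.uniform(-strength, strength))
    return img


image = [[0.1, 0.2], [0.3, 0.4]]
augmented = augment(image, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; a real pipeline would more likely rely on an image library's built-in transforms.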
The invention provides at least the following beneficial effects. Because the adaptive contrast learning loss function is introduced in the training stage, loss functions with different boundaries are selected for different triplets, so the neural network achieves a better contrast learning effect and the accuracy of dish identification improves. Because the neural network parameters are fixed and only inference is performed, no training is needed to update the parameters, which greatly reduces the computing power required of the computing device. In the inference stage, to prevent error accumulation, automatic deletion of low-similarity samples is introduced in the feedback process, so that the dish identification method can run stably over a long period.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a training flow chart of a dish identification method based on adaptive contrast learning in one embodiment of the invention;
FIG. 2 is a flowchart of an application of a dish identification method based on adaptive contrast learning according to an embodiment of the present invention;
FIG. 3 is a graph of a loss function calculation for adaptive contrast learning according to one embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings, so that those skilled in the art can implement it with reference to the description.
It will be understood that terms, such as "having," "including," and "comprising," as used herein, do not preclude the presence or addition of one or more other elements or groups thereof.
Figs. 1 and 2 illustrate a dish identification method based on adaptive contrast learning according to an embodiment of the present invention. In the training process, triplet losses with three boundaries are calculated for each triplet, and the largest of the three values is selected as the final loss value of that triplet; the final loss value is used to optimize the neural network parameters. In the inference process, features are first extracted from the input image by the feature extraction model optimized in the training stage; the distance, as a measure of similarity, between the extracted feature and every feature stored in the feature cache is then calculated, and the category of the cached feature at the minimum distance D from the current feature is taken as the recognition result. Then, if the minimum distance D is smaller than the threshold T, the currently identified feature is stored in the feature cache; otherwise it is discarded, completing the inference process.
The dish identification method based on self-adaptive contrast learning adopts ResNet18 as the feature extraction network, and is implemented as follows:
1. Training process
32 different dish categories are randomly selected from the training set, and 8 pictures are then randomly taken from each category, giving 256 pictures in total. Data enhancement is carried out on these 256 pictures, and the process comprises the following steps:
Step one, horizontally flipping the 256 pictures with probability Q1 to obtain 256 randomly horizontally flipped pictures;
Step two, vertically flipping the 256 pictures obtained in step one with probability Q2 to obtain 256 randomly vertically flipped pictures;
Step three, adding random contrast noise, saturation noise, and brightness noise in sequence with probability Q3 to the 256 pictures obtained in step two to obtain 256 pictures with randomly added noise;
Step four, resampling the images and normalizing the pixel values: resampling the 256 pictures obtained in step three to obtain 256 pictures with a width and height of 224 pixels, and normalizing the pixel values of each picture to between 0 and 1;
Step five, inputting the 256 resampled and normalized pictures into the ResNet18 network to obtain features of size (256, 1000);
Step six, finding all triplets (a, p, n) present in the 256 pictures according to the dish IDs of the 256 pictures, where a is a feature extracted from a template picture, p is a feature extracted from any input picture of the same category as a, and n is a feature extracted from any input picture of a different category from a. For each triplet, the larger-boundary triplet loss $L_B = \max\{d(a,p) - d(a,n) + M_B,\ 0\}$, the medium-boundary triplet loss $L_I = g \cdot \max\{d(a,p) - d(a,n) + M_I,\ 0\}$, and the smaller-boundary triplet loss $L_S = f \cdot \max\{d(a,p) - d(a,n) + M_S,\ 0\}$ are calculated, where g and f are constants, preferably 2 and 4, respectively, d(x, y) is the Euclidean distance between x and y, and the subscripts B, I, and S denote the larger, medium, and smaller boundaries, respectively. Then, for each triplet (a, p, n), $L = \max\{L_B, L_I, L_S\}$ is retained as the final loss;
Step seven, calculating the gradients of the neural network parameters from the final loss and optimizing the parameters of the model with the AdamW optimizer.
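The multi-boundary loss of step six can be sketched for a single triplet in plain Python as follows. The constants g = 2 and f = 4 are the preferred values given in the text; the boundary values `m_b`, `m_i`, and `m_s` are hypothetical, since the source does not state them, and the function names are illustrative only. Batch reduction, gradient computation, and the AdamW update are not shown.

```python
import math


def euclidean(x, y):
    """Euclidean distance d(x, y) between two equal-length vectors."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def adaptive_triplet_loss(a, p, n, m_b=0.6, m_i=0.4, m_s=0.2, g=2.0, f=4.0):
    """L = max{L_B, L_I, L_S} for one triplet (a, p, n).

    m_b > m_i > m_s are assumed boundary values chosen for illustration.
    """
    gap = euclidean(a, p) - euclidean(a, n)
    l_b = max(gap + m_b, 0.0)      # larger boundary, weight 1
    l_i = g * max(gap + m_i, 0.0)  # medium boundary, weight g
    l_s = f * max(gap + m_s, 0.0)  # smaller boundary, weight f
    return max(l_b, l_i, l_s)


# Easy triplet: the positive is much closer than the negative.
easy = adaptive_triplet_loss([0.0], [0.1], [0.4])
# Hard triplet: the positive is farther away than the negative.
hard = adaptive_triplet_loss([0.0], [0.9], [0.4])
print(easy, hard)
```

With these assumed boundaries, the larger-boundary term decides the loss for the easy triplet, while the more heavily weighted smaller-boundary term dominates for the hard triplet. This is one way to read the "adaptive" selection of the boundary by the maximum function.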
2. Dish identification process
Step one, resampling an unknown dish image and normalizing its pixel values to obtain an image tensor P of size (1, 3, 224, 224), and inputting P into the neural network optimized by adaptive contrast learning, which outputs a feature vector M of size (1, 1000);
Step two, if the dish category appears for the first time or the feature cache is empty, treating the dish as a new category; otherwise, calculating the Euclidean distances between M and all features in the feature cache, and taking the dish category corresponding to the minimum distance D as the final recognition result;
Step three, executing the automatic low-similarity sample deletion strategy: if the minimum distance D is smaller than a preset threshold T, preferably 0.1, storing the currently identified feature and the recognition result in the feature cache; otherwise, discarding the feature and the recognition result, completing the identification process.
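The comparison and feedback logic of the identification process can be sketched as a small feature cache. This is a minimal sketch under stated assumptions: the class name `FeatureCache`, its methods, and the two-dimensional toy features are hypothetical (the text uses 1000-dimensional features from the network), and feature extraction itself is not shown. The threshold of 0.1 is the preferred value given in the text.

```python
import math


class FeatureCache:
    """Nearest-neighbor dish lookup with threshold-gated feedback."""

    def __init__(self, threshold=0.1):
        self.threshold = threshold
        self.entries = []  # list of (feature, category) pairs

    def register(self, feature, category):
        """Add a feature for a new category without running identification."""
        self.entries.append((list(feature), category))

    def identify(self, feature):
        """Return (category, D) for the nearest cached feature.

        The feature is stored only if D is below the threshold, so
        low-similarity samples never enter the cache.
        """
        if not self.entries:
            return None, None  # empty cache: caller registers a new category
        nearest, category = min(
            self.entries, key=lambda entry: math.dist(feature, entry[0])
        )
        distance = math.dist(feature, nearest)
        if distance < self.threshold:
            self.entries.append((list(feature), category))
        return category, distance


cache = FeatureCache(threshold=0.1)
cache.register([0.0, 0.0], "dish_A")
cache.register([1.0, 0.0], "dish_B")
print(cache.identify([0.05, 0.0]))  # close match: cached as dish_A
print(cache.identify([0.6, 0.0]))   # nearest is dish_B, but not cached
```

Because a recognition result is written back only when D < T, a misidentified or borderline sample is less likely to pollute the cache, which is the mechanism the text relies on to limit error accumulation.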
To further illustrate the effect of the invention, two examples are listed as follows:
Data from three randomly selected restaurants in different regions, denoted A, B, and C, are used for the comparative dish identification experiments. To make it easier to verify the accuracy of the method, the data selected for each restaurant are its dish identification records over one month, and each restaurant's dish categories include many that look similar to one another. Restaurant A is used as the training set, with 600 dish categories and 80,000 pictures in total. Restaurant B and restaurant C serve as the validation set and the test set, respectively. The number of samples in each data set is shown in Table 1.
Table 1. Dataset composition
                            Training set (restaurant A)    Validation set (restaurant B)    Test set (restaurant C)
Number of categories        600                            200                              300
Total number of pictures    80,000                         21,000                           24,000
To show the advantages of the dish identification method based on adaptive contrast learning, the commonly used ResNet18 is selected as the feature extraction network of the reference method. In addition, to show that the method of the present invention also improves other models, both ResNet18 and ResNet50 networks are used to compare the scheme of the invention with the prior-art scheme. The ResNet50 embodiment operates in the same way as the ResNet18 embodiment. The reference method uses only the smaller-boundary triplet loss as the final loss and adds all recognition results to the feature cache during recognition. The commonly used recognition accuracy, i.e., the average probability that each recognition is correct, serves as the metric. Under the same training/validation/test configuration, the experiments compare the effect on dish identification accuracy, on the validation set and the test set, of adaptive contrast learning (abbreviated as +ada in Table 2) and of the automatic deletion of low-similarity samples introduced in the feedback process (abbreviated as +T in Table 2). The comparative experiments are shown in Table 2.
Table 2 results of comparative experiments with or without adaptive convolution kernels for different data sets
As shown in Table 2, the baseline method, whose feature extraction network is optimized with a fixed-boundary triplet loss and which accumulates recognition errors, achieves a limited dish recognition accuracy of less than 75% on both the validation set (restaurant B) and the test set (restaurant C), whether ResNet18 or ResNet50 is used. With the training method based on adaptive contrast learning proposed by the invention, the results on the test set (restaurant C) show that the dish identification accuracy of ResNet18 and ResNet50 improves by 7.4% and 7.0%, respectively. With the strategy of automatically deleting low-similarity samples in the feedback process, the accumulation of recognition errors is effectively alleviated and the recognition accuracy improves substantially: the accuracy of ResNet50 on the validation set (restaurant B) and the test set (restaurant C) improves by 11.5% and 10.8%, reaching 90.9% and 92.4%, respectively. This shows that the proposed automatic deletion strategy markedly improves dish identification accuracy when dishes are identified across restaurants. The embodiments "ResNet18 +ada+T" and "ResNet50 +ada+T" are the preferred embodiments of the invention.
As described above, the invention adopts a training scheme based on adaptive contrast learning together with an automatic low-similarity sample deletion strategy, which yields a high-accuracy dish identification method that requires no retraining in use. The method supports adding dishes in real time, identifies dishes stably over a long period, and can be widely applied in scenarios such as social catering and smart campuses that use dish identification to improve dish digitalization.
Although embodiments of the invention have been disclosed above, they are not limited to the use listed in the specification and embodiments. It can be applied to various fields suitable for the present invention. Additional modifications will readily occur to those skilled in the art. Therefore, the invention is not to be limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined in the claims and their equivalents.

Claims (3)

1. A dish identification method based on self-adaptive contrast learning, characterized by comprising the following steps:
A) A training step, comprising randomly selecting 32 different dish categories from a training set, then randomly taking 8 pictures from each dish category, and carrying out data enhancement on the 256 pictures in total, comprising:
A1) horizontally flipping the 256 pictures with probability Q1 to obtain 256 randomly horizontally flipped pictures;
A2) vertically flipping the 256 pictures obtained in step A1) with probability Q2 to obtain 256 randomly vertically flipped pictures;
A3) adding random contrast noise, saturation noise, and brightness noise in sequence with probability Q3 to the 256 pictures obtained in step A2) to obtain 256 pictures with randomly added noise;
A4) resampling the images and normalizing the pixel values, comprising uniformly resampling the 256 pictures obtained in step A3) to obtain 256 pictures with a width and height of 224 pixels, and normalizing the pixel values of each picture to between 0 and 1;
A5) inputting the 256 resampled and pixel-normalized pictures into a feature extraction network to obtain feature vectors of size (256, V), wherein the feature extraction network can be any neural network usable for image classification, the invention taking ResNet18 and ResNet50 as examples, and V can be any length, the invention taking the common value 1000 as an example;
A6) finding all triplets (a, p, n) present in the 256 pictures according to the dish IDs of the 256 pictures, wherein a is a feature extracted from a template picture, p is a feature extracted from any input picture of the same category as a, and n is a feature extracted from any input picture of a different category from a; calculating the multi-scale triplet losses of each triplet, namely the large-boundary triplet loss $L_B = \max\{d(a,p) - d(a,n) + M_B,\ 0\}$, the medium-boundary triplet loss $L_I = g \cdot \max\{d(a,p) - d(a,n) + M_I,\ 0\}$, and the small-boundary triplet loss $L_S = f \cdot \max\{d(a,p) - d(a,n) + M_S,\ 0\}$, wherein g and f are constants, d(x, y) is the Euclidean distance between x and y, and the subscripts B, I, and S denote the large, medium, and small boundaries, respectively; and then retaining $L = \max\{L_B, L_I, L_S\}$ for each triplet (a, p, n) as the final loss;
A7) calculating the gradients of the neural network parameters based on the AdamW optimizer and the final loss, and optimizing the parameters of the model;
B) A dish identification step, comprising:
B1) resampling an unknown dish image and normalizing its pixel values to obtain an image tensor P of size (1, 3, 224, 224), inputting the image tensor P into the neural network optimized by self-adaptive contrast learning, and obtaining a feature vector M of size (1, 1000) after calculation by the neural network;
B2) if a dish of the dish category appears for the first time or the feature cache is empty, considering the dish category to be a new dish category and adding the feature vector and the new category to the feature library without identification; otherwise, calculating the Euclidean distances between M and all features in the feature cache, and taking the dish category corresponding to the minimum value D as the final recognition result;
B3) executing the automatic low-similarity sample deletion strategy: if the minimum distance D is smaller than a preset threshold T, storing the currently identified feature and the recognition result in the feature cache; otherwise, discarding the feature and the recognition result, completing the identification process.
2. The adaptive contrast learning-based dish identification method as claimed in claim 1, wherein:
The preset threshold is preferably 0.1.
3. A dish identification method based on adaptive contrast learning as claimed in claim 1 or 2, wherein:
The constants g and f are preferably 2 and 4, respectively.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210163470.4A CN114758167B (en) 2022-02-22 2022-02-22 Dish identification method based on self-adaptive contrast learning

Publications (2)

Publication Number Publication Date
CN114758167A CN114758167A (en) 2022-07-15
CN114758167B true CN114758167B (en) 2024-04-26

Family

ID=82325913

Country Status (1)

Country Link
CN (1) CN114758167B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399428A (en) * 2018-02-09 2018-08-14 哈尔滨工业大学深圳研究生院 A kind of triple loss function design method based on mark than criterion
KR20190140824A (en) * 2018-05-31 2019-12-20 한국과학기술원 Training method of deep learning models for ordinal classification using triplet-based loss and training apparatus thereof
CN113313149A (en) * 2021-05-14 2021-08-27 华南理工大学 Dish identification method based on attention mechanism and metric learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008842A (en) * 2019-03-09 2019-07-12 同济大学 A kind of pedestrian's recognition methods again for more losing Fusion Model based on depth

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吕永强 ; 闵巍庆 ; 段华 ; 蒋树强 ; .融合三元卷积神经网络与关系网络的小样本食品图像识别.计算机科学.2020,(第01期),全文. *
朱瑶 ; 刘一茳 ; .基于深度卷积神经网络的菜品识别.常州信息职业技术学院学报.2020,(第04期),全文. *

Also Published As

Publication number Publication date
CN114758167A (en) 2022-07-15

Similar Documents

Publication Publication Date Title
CN110163234B (en) Model training method and device and storage medium
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN110555399B (en) Finger vein identification method and device, computer equipment and readable storage medium
EP3171332A1 (en) Methods and systems for inspecting goods
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN110781921A (en) Depth residual error network and transfer learning-based muscarinic image identification method and device
CN110188763B (en) Image significance detection method based on improved graph model
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
Prost et al. Learning local regularization for variational image restoration
CN112200293A (en) CART-AMV improved random forest algorithm
CN106780501A (en) Based on the image partition method for improving artificial bee colony algorithm
CN113378620B (en) Cross-camera pedestrian re-identification method in surveillance video noise environment
Li et al. A novelty harmony search algorithm of image segmentation for multilevel thresholding using learning experience and search space constraints
CN113159159B (en) Small sample image classification method based on improved CNN
CN113033345B (en) V2V video face recognition method based on public feature subspace
Jiao et al. [Retracted] An Improved Cuckoo Search Algorithm for Multithreshold Image Segmentation
CN114758167B (en) Dish identification method based on self-adaptive contrast learning
CN116664643A (en) Railway train image registration method and equipment based on SuperPoint algorithm
CN115546626B (en) Data double imbalance-oriented depolarization scene graph generation method and system
CN110781936A (en) Construction method of threshold learnable local binary network based on texture description and deep learning and remote sensing image classification method
Zheng et al. Improvement of grayscale image segmentation based on pso algorithm
CN112381161B (en) Neural network training method
CN114677535A (en) Training method of domain-adaptive image classification network, image classification method and device
CN114399780A (en) Table detection method, table detection model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant