CN113449654A - Intelligent canteen food detection method based on depth model and quantification technology - Google Patents

Intelligent canteen food detection method based on depth model and quantification technology

Info

Publication number
CN113449654A
Authority
CN
China
Prior art keywords
model
network
food
quantification
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110743126.8A
Other languages
Chinese (zh)
Inventor
刘宁钟
彭耿
林龚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110743126.8A
Publication of CN113449654A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent canteen food detection method based on a depth model and a quantification technology, belonging to the technical field of computer vision. The method improves the accuracy and speed of current food detection while reducing the hardware requirements of model deployment. The invention comprises the following steps: firstly, a large number of canteen tray food images are collected, and the types and positions of the food in the images are marked; then the labeled data are fed into a convolutional neural network designed for food detection and trained until the network converges, yielding a weight file; the size of the model is then reduced through a quantization technique, and the model is deployed on an embedded mainboard; finally, food in a food image can be rapidly detected through the neural network and the weight file. The invention solves the problems of low accuracy and slow inference speed in existing food identification, and effectively addresses the difficulty of deploying depth models and their heavy dependence on hardware computing power and video memory.

Description

Intelligent canteen food detection method based on depth model and quantification technology
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an intelligent canteen food detection method based on a depth model and a quantification technology.
Background
In recent years, with the rapid development of computer vision, scenes such as intelligent transportation and unmanned supermarkets have become reality, and the concept of the smart city has gradually taken root as people enjoy this convenience. Some large-scale food trading places, such as canteens, handle many individual transactions every day. According to surveys, many canteens currently operate in a self-service meal selection and manual settlement mode. Because the settlement counters are operated manually, efficiency is often low and queues can become very long, wasting not only customers' time but also valuable human resources. The invention therefore provides a feasible scheme: an unmanned canteen charge settlement system that makes the settlement process intelligent. This not only improves efficiency and frees up manpower, but also promotes the construction of smart canteens for smart cities.
At present, canteen food identification methods mainly comprise detection methods based on convolutional neural networks and identification methods based on RFID. Detection algorithms based on convolutional neural networks are divided into one-stage and two-stage detection algorithms. A two-stage algorithm is a convolutional neural network based on region proposals: it first computes the regions where objects may exist (candidate regions), and then detects the specific type and position of the object within each region. A one-stage algorithm directly predicts the classes and positions of different objects using only a convolutional neural network (CNN). In comparison, one-stage algorithms have a significant speed advantage, but on many tasks their accuracy is not as good as that of two-stage algorithms. Food identification methods based on RFID use radio-frequency technology to identify food. This is a cumbersome, traditional approach, and a smart canteen based on this technology needs to tag every food item with an RFID label.
Given these trade-offs, canteen food identification based on convolutional neural networks is the better choice. In 2018, the paper "Grab, Pay, and Eat: Semantic Food Detection for Smart Restaurants" in IEEE Transactions on Multimedia studied food recognition in smart restaurants based on a convolutional neural network. In their method, a picture is passed through a CNN food segmentation module and a food detection module, which are parallel modules; background elimination and non-maximum suppression are then performed to achieve semantic food detection. The method is mainly an improvement of the one-stage algorithm YOLOv2. Because several modules are needed for processing, the inference speed is clearly poor, and when the method is actually used for restaurant billing its accuracy is insufficient. An automatic purchasing scheme for food raw materials is proposed in the 2020 paper "A Deep Transfer Learning Solution for Food Material Recognition Using Electronic Scales" in IEEE Transactions on Industrial Informatics. The type of food raw material is identified through a depth model and, combined with the weighing function of an electronic scale, the goal of automatically purchasing food raw materials is achieved. The authors used deeper neural networks to improve accuracy, resulting in larger model weights, which greatly increases deployment difficulty.
Therefore, the prior-art methods mainly have the following defects: methods based on multi-module convolutional neural networks suffer from low model accuracy and robustness and high time consumption, and depth models with large weights are difficult to deploy.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the invention provides an intelligent canteen food detection method based on a depth model and a quantification technology, which solves the current problems of low food identification accuracy and slow inference speed, and effectively addresses the difficulty of deploying depth models and their heavy dependence on hardware computing power and video memory.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
an intelligent canteen food detection method based on a depth model and a quantification technology comprises the following steps:
(1) an image acquisition process: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
(2) a neural network training process: sending the image data marked in the step (1) into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
(3) depth model quantification and deployment process: the size of the model is properly reduced through a quantization technology, and the model is deployed in an embedded mainboard;
(4) a test image detection process: detecting the food target in a test image by using the neural network and the weight file, and outputting the detection result.
In the above steps, the collected images are cleaned in step (1): pictures that do not meet requirements, such as blurred pictures or images with incomplete food, are filtered out, and the dish targets in the remaining images are then labeled with the type and position of each food item;
the neural network training method in step (2) is the one-stage target detection yolo method; compared with other target detection methods, the yolo method has high accuracy and an obvious inference speed advantage, and step (2) specifically comprises the following steps:
(21) a residual module is used in the backbone network, so that image information is effectively extracted while the amount of computation is greatly reduced;
(22) an SPP module is used to fuse multiple receptive fields, helping the network perform multi-scale recognition;
(23) the neck part of the yolo network is changed, replacing the PANet with a BiFPN;
(24) before the network is trained, the anchors of the data set are recalculated, so that the network converges more easily and faster and the IoU of the model is improved;
(25) when the network is trained, the food images input to the network are augmented: the color gamut of the images is changed, and 4 images are spliced together by random scaling, random cropping and random arrangement before being input to the network;
(26) CIoU Loss is used as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so the network converges faster (a minimal sketch of this loss is given after this list);
(27) the training hyper-parameters of the network are set, and training yields a network file and a weight file that can be used for food detection;
further, in step (21) a residual convolution module similar to CSPNet is used, and feature maps of different sizes are output through multiple stages of dimensionality reduction;
further, in step (23) the neck uses a three-layer BiFPN structure for feature fusion, fusing the feature maps of four different layers output by the backbone network;
further, in step (27) the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
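The CIoU loss mentioned in step (26) can be illustrated with the minimal Python sketch below. It is not part of the original disclosure; the box format (centre x, centre y, width, height) and the variable names are chosen purely for illustration.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for a predicted and a ground-truth box, both given as (cx, cy, w, h).
    Returns 1 - CIoU, where CIoU = IoU - rho^2 / c^2 - alpha * v."""
    # convert centre format to corner coordinates
    px1, py1 = box_p[0] - box_p[2] / 2, box_p[1] - box_p[3] / 2
    px2, py2 = box_p[0] + box_p[2] / 2, box_p[1] + box_p[3] / 2
    gx1, gy1 = box_g[0] - box_g[2] / 2, box_g[1] - box_g[3] / 2
    gx2, gy2 = box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2

    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = box_p[2] * box_p[3] + box_g[2] * box_g[3] - inter
    iou = inter / (union + 1e-9)

    # squared centre distance (rho^2) and squared diagonal of the smallest enclosing box (c^2)
    rho2 = (box_p[0] - box_g[0]) ** 2 + (box_p[1] - box_g[1]) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(box_g[2] / box_g[3]) - math.atan(box_p[2] / box_p[3])) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - (iou - rho2 / c2 - alpha * v)

# example: a predicted box slightly offset from its ground-truth box
print(ciou_loss((0.50, 0.50, 0.20, 0.30), (0.55, 0.52, 0.22, 0.28)))
```

Unlike a plain IoU loss, the centre-distance and aspect-ratio terms still provide a useful gradient when the predicted and ground-truth boxes do not overlap, which is why convergence is faster.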
the step (3) specifically comprises the following steps:
(31) the stored floating point type model is quantized into an integer model through quantization, and the size of the model is reduced;
(32) converting the weight model into an RKNN model through model conversion, and deploying the RKNN model to an embedded mainboard AIO-3399 Pro;
further, the quantization mode used in step (31) is post-training quantization, i.e. the model is trained first and quantized afterwards; it quantizes the float64 model to the asymmetric_quantized-u8 type, and the calculation formula is as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
The corresponding inverse quantization is:
float_num = scale * (quant - zero_point)
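As a concrete illustration of the quantization and inverse-quantization formulas above (this sketch is not part of the original disclosure; the array contents and names are arbitrary), the asymmetric uint8 mapping can be written in Python as:

```python
import numpy as np

def quantize_u8(float_num, scale, zero_point):
    """quant = round(float_num / scale) + zero_point, saturated to [0, 255]."""
    quant = np.round(float_num / scale) + zero_point
    return np.clip(quant, 0, 255).astype(np.uint8)

def dequantize_u8(quant, scale, zero_point):
    """float_num = scale * (quant - zero_point)."""
    return scale * (quant.astype(np.float32) - zero_point)

# derive scale / zero_point from the observed value range of a tensor, then round-trip it
weights = np.random.uniform(-1.0, 3.0, size=(4, 4)).astype(np.float32)
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0
zero_point = int(round(-w_min / scale))  # quantized integer that represents the real value 0.0

q = quantize_u8(weights, scale, zero_point)
recovered = dequantize_u8(q, scale, zero_point)
print("max reconstruction error:", np.abs(weights - recovered).max())
```

The reconstruction error is bounded by roughly half a quantization step (scale / 2), which is why the accuracy loss from 8-bit quantization is usually small.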
the step (4) specifically comprises the following steps:
(41) the test image is fed into the improved yolo backbone network to obtain convolution feature maps;
(42) the convolution feature maps are processed by the yolo algorithm, and predicted bounding-box values and classification values are output;
(43) a threshold is set, and the final detection results are filtered out through non-maximum suppression.
Advantageous effects: the invention provides an intelligent canteen food detection method based on a depth model and a quantification technology. By improving the backbone network of the one-stage yolo method, the amount of computation of the network is reduced and the detection speed is improved; by improving the feature fusion module, more useful features can be extracted, improving the accuracy and speed of food detection; and by combining a quantization technique, the hardware requirements of model deployment are reduced and the difficulty of deployment is greatly lowered. The invention thus provides a practically usable smart canteen charging scheme and an important technical advance for the construction of smart canteens in smart cities.
Drawings
FIG. 1 is an overall flow diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of step 2 of an embodiment of the present invention;
FIG. 3 is a flowchart of step 3 of an embodiment of the present invention;
FIG. 4 is a flowchart of step 4 of an embodiment of the present invention;
FIG. 5 and FIG. 6 show a canteen food image and the corresponding detection result in the embodiment of the present invention;
FIG. 7 is a diagram showing the detection result on an augmented image in the embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
as shown in fig. 1, the intelligent canteen food detection method based on the depth model and the quantification technology includes the following steps:
step 1: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
step 2: sending the data into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
Step 3: the size of the model is appropriately reduced through a quantization technique, and the model is deployed on an embedded mainboard;
Step 4: the food target in the test image is detected by using the neural network and the weight file, and the detection result is output.
In this example, step 1 employs the following scheme:
the acquired images are cleaned, pictures that do not meet requirements, such as blurred pictures or images with incomplete food, are filtered out, and the dish targets in the remaining images are then labeled with the type and position of each food item.
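The patent does not prescribe a particular annotation format; one common convention for yolo-style training (assumed here purely for illustration, with hypothetical file names and class indices) is one text file per image, each line giving a class index followed by the normalized box centre and size:

```
# tray_0001.jpg -> tray_0001.txt   (class  x_center  y_center  width  height, all in [0, 1])
3 0.412 0.537 0.218 0.244
7 0.705 0.301 0.190 0.172
```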
In this embodiment, the following scheme is adopted in step 2:
first, the neural network method is the one-stage target detection yolo method; compared with other target detection methods, the yolo method has high accuracy and an obvious inference speed advantage.
As shown in fig. 2, the specific steps of step 2 are as follows:
step 201: the yolo network structure is improved, mainly the backbone network and the neck part;
further, in step 201 the backbone network mainly uses a residual convolution module similar to CSPNet and outputs feature maps of different sizes through multiple stages of dimensionality reduction, and the neck part uses a three-layer BiFPN structure for feature fusion, fusing the feature maps of four different layers output by the backbone network;
step 202: before the network is trained, the anchors of the data set are recalculated to replace the original anchors, so that the network converges more easily and faster and the IoU of the model is improved (a clustering sketch is given after step 205);
step 203: CIoU Loss is used as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so the network converges faster;
step 204: the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128;
step 205: Mosaic augmentation is performed on the input images and training is carried out; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
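Step 202 recalculates the anchors by clustering the labelled box sizes. A minimal sketch of this step is given below; it is not part of the original description and follows the usual YOLO convention of k-means with 1 - IoU as the distance, with synthetic (width, height) data standing in for the labelled dish boxes.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as if they shared the same centre."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labelled (w, h) pairs with 1 - IoU as the distance, as in YOLO anchor estimation."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i) else anchors[i]
                        for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sorted by area, small to large

# synthetic (w, h) pairs standing in for the labelled dish boxes of the data set
wh = np.abs(np.random.default_rng(1).normal(loc=[120.0, 90.0], scale=[40.0, 30.0], size=(500, 2)))
print(kmeans_anchors(wh, k=9))
```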
As shown in fig. 3, the specific steps of step 3 are as follows:
step 301: post-training quantization is used, and the quantization type is set to asymmetric_quantized-u8;
further, the quantization calculation formula in step 301 is as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
The corresponding inverse quantization is:
float_num = scale * (quant - zero_point)
step 302: the original model is imported into the RKNN Toolkit and converted into an RKNN model file that can be used by the NPU;
step 303: the converted RKNN model is deployed on the embedded mainboard AIO-3399Pro (a conversion sketch follows).
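A rough sketch of the conversion in steps 301 to 303 is shown below. It assumes the Rockchip RKNN-Toolkit Python API and an ONNX export of the trained detector; the file names are hypothetical and the exact configuration options (e.g. quantized_dtype, mean/std settings, calibration dataset list) may differ between toolkit versions, so this should be read as an outline rather than the exact procedure of the patent.

```python
from rknn.api import RKNN

rknn = RKNN()

# post-training quantization to asymmetric uint8 (see the formulas above);
# 'dataset.txt' would list calibration images used to estimate scale / zero_point
rknn.config(quantized_dtype='asymmetric_quantized-u8')

# load the trained detector (assumed here to have been exported to ONNX beforehand)
rknn.load_onnx(model='food_yolo.onnx')

# build with quantization enabled, then export the .rknn file consumed by the NPU
rknn.build(do_quantization=True, dataset='dataset.txt')
rknn.export_rknn('food_yolo.rknn')
rknn.release()
```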
As shown in fig. 4, the specific steps of step 4 are as follows:
(401) the test image is fed into the network to obtain convolution feature maps;
(402) the convolution feature maps are processed by the yolo algorithm, and predicted bounding-box values and classification values are output;
(403) the threshold is set to 0.5, and the final detection results are filtered out through non-maximum suppression (a post-processing sketch follows).
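Steps 401 to 403 can be illustrated with the minimal post-processing sketch below. It is not taken from the patent; the corner box format, the 0.45 IoU threshold for suppression and all names are illustrative, while the 0.5 confidence threshold matches step 403.

```python
def nms(boxes, scores, iou_thresh=0.45, score_thresh=0.5):
    """boxes: list of (x1, y1, x2, y2); scores: matching confidences.
    Returns the indices of the boxes kept after thresholding and non-maximum suppression."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# example: two overlapping candidates for the same dish and one separate dish
boxes = [(50, 50, 150, 150), (55, 52, 152, 148), (300, 80, 380, 170)]
scores = [0.92, 0.60, 0.81]
print(nms(boxes, scores))  # -> [0, 2]
```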
Fig. 5 and fig. 6 show an image of canteen food and the detection result obtained with the method of the present invention, respectively. Testing shows that the method achieves a food detection accuracy of 99.6% with a small number of samples and a fast inference speed. Fig. 7 shows the detection result on an augmented image: the method still detects the food well when the augmented image shows obvious changes in color, position and shape, indicating that the food detection model is highly robust.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications made on the basis of the technical solutions according to the technical ideas presented by the present invention are within the scope of the present invention.

Claims (10)

1. An intelligent canteen food detection method based on a depth model and a quantification technology is characterized by comprising the following steps:
(1) an image acquisition process: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
(2) a neural network training process: sending the image data marked in the step (1) into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
(3) depth model quantification and deployment process: the size of the model is properly reduced through a quantization technology, and the model is deployed in an embedded mainboard;
(4) a test image detection process: detecting the food target in a test image by using the neural network and the weight file, and outputting the detection result.
2. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein in step (1), the collected images are cleaned to filter out photos that do not meet requirements, and then dish objects in the remaining images are labeled to mark the types and positions of food in the images.
3. The intelligent canteen food detection method based on depth model and quantification technique of claim 1, wherein the neural network training method in step (2) is a one-stage target detection yolo method.
4. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1 or 3, wherein the step (2) comprises the following steps:
(21) using a residual module in the backbone network, so that image information is effectively extracted while the amount of computation is greatly reduced;
(22) using an SPP module to fuse multiple receptive fields, helping the network perform multi-scale recognition;
(23) changing the neck part of the yolo network, replacing the PANet with a BiFPN;
(24) before the network is trained, recalculating the anchors of the data set, so that the network converges more easily and faster and the IoU of the model is improved;
(25) when the network is trained, augmenting the food images input to the network: the color gamut of the images is changed, and 4 images are spliced together by random scaling, random cropping and random arrangement before being input to the network;
(26) using CIoU Loss as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so that the network converges faster;
(27) setting the training hyper-parameters of the network, and training to obtain a network file and a weight file that can be used for food detection.
5. The intelligent canteen food detection method based on depth modeling and quantification techniques of claim 4, wherein in step (21), a residual convolution module similar to CSPNet is used to output feature maps of different sizes through multiple dimensionality reduction.
6. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 4, wherein the Neck performs feature fusion by using a three-layer BIFPN structure in step (23), and fuses four feature maps of different layers output in a backbone network.
7. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 4, wherein in step (27), the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
8. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein the step (3) specifically comprises the following steps:
(31) the stored floating point type model is quantized into an integer model through quantization, and the size of the model is reduced;
(32) and converting the weight model into an RKNN model through model conversion, and deploying the RKNN model to the embedded mainboard AIO-3399 Pro.
9. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 8, wherein the quantization mode used in step (31) is post-training quantization, i.e. the model is trained first and quantized afterwards, and the float64 model is quantized to the asymmetric_quantized-u8 type, the calculation formula being as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
and the corresponding inverse quantization is:
float_num = scale * (quant - zero_point).
10. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein the step (4) comprises the following steps:
(41) sending the test image into the improved yolo backbone network to obtain convolution feature maps;
(42) processing the convolution feature maps through the yolo algorithm, and outputting predicted bounding-box values and classification values;
(43) setting a threshold, and filtering out the final detection results through non-maximum suppression.
CN202110743126.8A 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology Pending CN113449654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110743126.8A CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110743126.8A CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Publications (1)

Publication Number Publication Date
CN113449654A true CN113449654A (en) 2021-09-28

Family

ID=77814657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110743126.8A Pending CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Country Status (1)

Country Link
CN (1) CN113449654A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114587416A (en) * 2022-03-10 2022-06-07 山东大学齐鲁医院 Gastrointestinal tract submucosal tumor diagnosis system based on deep learning multi-target detection
CN115082751A (en) * 2022-05-07 2022-09-20 长春工业大学 Improved YOLOv 4-based mobile robot target detection method


Similar Documents

Publication Publication Date Title
Sakr et al. Comparing deep learning and support vector machines for autonomous waste sorting
CN109165623B (en) Rice disease spot detection method and system based on deep learning
CN111126472A (en) Improved target detection method based on SSD
US20170169315A1 (en) Deeply learned convolutional neural networks (cnns) for object localization and classification
Lyu et al. Esnet: Edge-based segmentation network for real-time semantic segmentation in traffic scenes
CN109858569A (en) Multi-tag object detecting method, system, device based on target detection network
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN113449654A (en) Intelligent canteen food detection method based on depth model and quantification technology
CN109446922B (en) Real-time robust face detection method
Wang et al. Fast and precise detection of litchi fruits for yield estimation based on the improved YOLOv5 model
CN108133235B (en) Pedestrian detection method based on neural network multi-scale feature map
CN111507399A (en) Cloud recognition and model training method, device, terminal and medium based on deep learning
CN115272652A (en) Dense object image detection method based on multiple regression and adaptive focus loss
CN104063686A (en) System and method for performing interactive diagnosis on crop leaf segment disease images
CN111723666A (en) Signal identification method and device based on semi-supervised learning
CN113096085A (en) Container surface damage detection method based on two-stage convolutional neural network
Feng et al. Garbage disposal of complex background based on deep learning with limited hardware resources
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN114140413A (en) Food material image detection method for optimizing small target and improving missing detection problem
Li et al. Image feature fusion method based on edge detection
Li et al. IIE-SegNet: Deep semantic segmentation network with enhanced boundary based on image information entropy
Ye et al. PlantBiCNet: a new paradigm in plant science with bi-directional cascade neural network for detection and counting
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
CN112837281A (en) Pin defect identification method, device and equipment based on cascade convolutional neural network
CN111339985A (en) Gesture detection method based on mixed convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination