CN113449654A - Intelligent canteen food detection method based on depth model and quantification technology - Google Patents

Intelligent canteen food detection method based on depth model and quantification technology

Info

Publication number
CN113449654A
Authority
CN
China
Prior art keywords
model
network
food
quantification
method based
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110743126.8A
Other languages
Chinese (zh)
Inventor
刘宁钟
彭耿
林龚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202110743126.8A
Publication of CN113449654A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent canteen food detection method based on a depth model and a quantification technology, belonging to the technical field of computer vision. The method improves the accuracy and speed of current food detection while reducing the hardware requirements of model deployment. The invention comprises the following steps: firstly, a large number of canteen tray food images are collected, and the types and positions of the food in the images are marked; then the labeled data are fed into a convolutional neural network designed for food detection and trained until the network converges, yielding a weight file; the size of the model is then reduced through a quantization technique, and the model is deployed on an embedded mainboard; finally, food in a food image can be rapidly detected through the neural network and the weight file. The invention solves the problems of low accuracy and slow inference speed in existing food identification, and effectively addresses the difficulty of deploying depth models and their heavy dependence on hardware computing power and video memory.

Description

Intelligent canteen food detection method based on depth model and quantification technology
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to an intelligent canteen food detection method based on a depth model and a quantification technology.
Background
In recent years, with the rapid development of computer vision, scenes such as intelligent transportation and unmanned supermarkets have become reality, and the concept of the smart city has gradually taken root as people enjoy this convenience. Some large-scale food trading places, such as canteens, handle many individual transactions every day. According to surveys, many canteens currently operate in a self-service meal selection and manual settlement mode. Because the settlement counters are operated manually, efficiency is often low and queues can become very long, wasting not only customers' time but also valuable human resources. The invention therefore provides a feasible scheme: an unmanned canteen charge settlement system that makes the settlement process intelligent. This not only improves efficiency and frees up manpower, but also promotes the construction of smart canteens for smart cities.
At present, canteen food identification methods mainly comprise detection methods based on convolutional neural networks and identification methods based on RFID. Detection algorithms based on convolutional neural networks are divided into one-stage and two-stage detection algorithms. A two-stage algorithm is a convolutional neural network based on region proposals: it first computes the regions where objects may exist (candidate regions), and then detects the specific type and position of the object within each region. A one-stage algorithm directly predicts the classes and positions of different objects using only a convolutional neural network (CNN). In comparison, one-stage algorithms have a significant speed advantage, but on many tasks their accuracy is not as good as that of two-stage algorithms. Food identification methods based on RFID use radio-frequency technology to identify food. This is a cumbersome, traditional approach, and a smart canteen based on this technology needs to tag every food item with an RFID label.
Given these trade-offs, canteen food identification based on convolutional neural networks is the better choice. In 2018, the paper "Grab, Pay, and Eat: Semantic Food Detection for Smart Restaurants" in IEEE Transactions on Multimedia studied food recognition in smart restaurants based on a convolutional neural network. In their method, a picture is passed through a CNN food segmentation module and a food detection module, which are parallel modules; background elimination and non-maximum suppression are then performed to achieve semantic food detection. The method is mainly an improvement of the one-stage algorithm YOLOv2. Because several modules are needed for processing, the inference speed is clearly poor, and when the method is actually used for restaurant billing its accuracy is insufficient. An automatic purchasing scheme for food raw materials is proposed in the 2020 paper "A Deep Transfer Learning Solution for Food Material Recognition Using Electronic Scales" in IEEE Transactions on Industrial Informatics. The type of food raw material is identified through a depth model and, combined with the weighing function of an electronic scale, the goal of automatically purchasing food raw materials is achieved. The authors used deeper neural networks to improve accuracy, resulting in larger model weights, which greatly increases deployment difficulty.
Therefore, the prior-art methods mainly have the following defects: methods based on multi-module convolutional neural networks suffer from low model accuracy and robustness and high time consumption, and depth models with large weights are difficult to deploy.
Disclosure of Invention
In order to solve the technical problems mentioned in the background art, the invention provides an intelligent canteen food detection method based on a depth model and a quantification technology, which solves the current problems of low food identification accuracy and slow inference speed, and effectively addresses the difficulty of deploying depth models and their heavy dependence on hardware computing power and video memory.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
an intelligent canteen food detection method based on a depth model and a quantification technology comprises the following steps:
(1) an image acquisition process: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
(2) a neural network training process: sending the image data marked in the step (1) into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
(3) depth model quantification and deployment process: the size of the model is properly reduced through a quantization technology, and the model is deployed in an embedded mainboard;
(4) a test image detection process: detecting the food target in a test image by using the neural network and the weight file, and outputting the detection result.
In the above steps, the collected images are cleaned in step (1): pictures that do not meet requirements, such as blurred pictures or images with incomplete food, are filtered out, and the dish targets in the remaining images are then labeled with the type and position of each food item;
the neural network training method in step (2) is the one-stage target detection yolo method; compared with other target detection methods, the yolo method has high accuracy and an obvious inference speed advantage, and step (2) specifically comprises the following steps:
(21) a residual module is used in the backbone network, so that image information is effectively extracted while the amount of computation is greatly reduced;
(22) an SPP module is used to fuse multiple receptive fields, helping the network perform multi-scale recognition;
(23) the neck part of the yolo network is changed, replacing the PANet with a BiFPN;
(24) before the network is trained, the anchors of the data set are recalculated, so that the network converges more easily and faster and the IoU of the model is improved;
(25) when the network is trained, the food images input to the network are augmented: the color gamut of the images is changed, and 4 images are spliced together by random scaling, random cropping and random arrangement before being input to the network;
(26) CIoU Loss is used as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so the network converges faster (a minimal sketch of this loss is given after this list);
(27) the training hyper-parameters of the network are set, and training yields a network file and a weight file that can be used for food detection;
further, in step (21) a residual convolution module similar to CSPNet is used, and feature maps of different sizes are output through multiple stages of dimensionality reduction;
further, in step (23) the neck uses a three-layer BiFPN structure for feature fusion, fusing the feature maps of four different layers output by the backbone network;
further, in step (27) the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
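The CIoU loss mentioned in step (26) can be illustrated with the minimal Python sketch below. It is not part of the original disclosure; the box format (centre x, centre y, width, height) and the variable names are chosen purely for illustration.

```python
import math

def ciou_loss(box_p, box_g):
    """CIoU loss for a predicted and a ground-truth box, both given as (cx, cy, w, h).
    Returns 1 - CIoU, where CIoU = IoU - rho^2 / c^2 - alpha * v."""
    # convert centre format to corner coordinates
    px1, py1 = box_p[0] - box_p[2] / 2, box_p[1] - box_p[3] / 2
    px2, py2 = box_p[0] + box_p[2] / 2, box_p[1] + box_p[3] / 2
    gx1, gy1 = box_g[0] - box_g[2] / 2, box_g[1] - box_g[3] / 2
    gx2, gy2 = box_g[0] + box_g[2] / 2, box_g[1] + box_g[3] / 2

    # intersection over union
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = box_p[2] * box_p[3] + box_g[2] * box_g[3] - inter
    iou = inter / (union + 1e-9)

    # squared centre distance (rho^2) and squared diagonal of the smallest enclosing box (c^2)
    rho2 = (box_p[0] - box_g[0]) ** 2 + (box_p[1] - box_g[1]) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2 + 1e-9

    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (math.atan(box_g[2] / box_g[3]) - math.atan(box_p[2] / box_p[3])) ** 2
    alpha = v / (1 - iou + v + 1e-9)

    return 1 - (iou - rho2 / c2 - alpha * v)

# example: a predicted box slightly offset from its ground-truth box
print(ciou_loss((0.50, 0.50, 0.20, 0.30), (0.55, 0.52, 0.22, 0.28)))
```

Unlike a plain IoU loss, the centre-distance and aspect-ratio terms still provide a useful gradient when the predicted and ground-truth boxes do not overlap, which is why convergence is faster.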
the step (3) specifically comprises the following steps:
(31) the stored floating point type model is quantized into an integer model through quantization, and the size of the model is reduced;
(32) converting the weight model into an RKNN model through model conversion, and deploying the RKNN model to an embedded mainboard AIO-3399 Pro;
further, the quantization mode used in step (31) is post-training quantization, i.e. the model is trained first and quantized afterwards; it quantizes the float64 model to the asymmetric_quantized-u8 type, and the calculation formula is as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
The corresponding inverse quantization is:
float_num = scale * (quant - zero_point)
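As a concrete illustration of the quantization and inverse-quantization formulas above (this sketch is not part of the original disclosure; the array contents and names are arbitrary), the asymmetric uint8 mapping can be written in Python as:

```python
import numpy as np

def quantize_u8(float_num, scale, zero_point):
    """quant = round(float_num / scale) + zero_point, saturated to [0, 255]."""
    quant = np.round(float_num / scale) + zero_point
    return np.clip(quant, 0, 255).astype(np.uint8)

def dequantize_u8(quant, scale, zero_point):
    """float_num = scale * (quant - zero_point)."""
    return scale * (quant.astype(np.float32) - zero_point)

# derive scale / zero_point from the observed value range of a tensor, then round-trip it
weights = np.random.uniform(-1.0, 3.0, size=(4, 4)).astype(np.float32)
w_min, w_max = float(weights.min()), float(weights.max())
scale = (w_max - w_min) / 255.0
zero_point = int(round(-w_min / scale))  # quantized integer that represents the real value 0.0

q = quantize_u8(weights, scale, zero_point)
recovered = dequantize_u8(q, scale, zero_point)
print("max reconstruction error:", np.abs(weights - recovered).max())
```

The reconstruction error is bounded by roughly half a quantization step (scale / 2), which is why the accuracy loss from 8-bit quantization is usually small.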
the step (4) specifically comprises the following steps:
(41) the test image is fed into the improved yolo backbone network to obtain convolution feature maps;
(42) the convolution feature maps are processed by the yolo algorithm, and predicted bounding-box values and classification values are output;
(43) a threshold is set, and the final detection results are filtered out through non-maximum suppression.
Advantageous effects: the invention provides an intelligent canteen food detection method based on a depth model and a quantification technology. By improving the backbone network of the one-stage yolo method, the amount of computation of the network is reduced and the detection speed is improved; by improving the feature fusion module, more useful features can be extracted, improving the accuracy and speed of food detection; and by combining a quantization technique, the hardware requirements of model deployment are reduced and the difficulty of deployment is greatly lowered. The invention thus provides a practically usable smart canteen charging scheme and an important technical advance for the construction of smart canteens in smart cities.
Drawings
FIG. 1 is an overall flow diagram of an embodiment of the present invention;
FIG. 2 is a flow chart of step 2 of an embodiment of the present invention;
FIG. 3 is a flowchart of step 3 of an embodiment of the present invention;
FIG. 4 is a flowchart of step 4 of an embodiment of the present invention;
FIG. 5 and FIG. 6 show a canteen food image and the corresponding detection result in the embodiment of the present invention;
FIG. 7 is a diagram showing the detection result on an augmented image in the embodiment of the present invention.
Detailed Description
The invention is described in detail below with reference to the following figures and specific examples:
as shown in fig. 1, the intelligent canteen food detection method based on the depth model and the quantification technology includes the following steps:
step 1: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
step 2: sending the data into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
Step 3: the size of the model is appropriately reduced through a quantization technique, and the model is deployed on an embedded mainboard;
Step 4: the food target in the test image is detected by using the neural network and the weight file, and the detection result is output.
In this example, step 1 employs the following scheme:
the acquired images are cleaned, pictures that do not meet requirements, such as blurred pictures or images with incomplete food, are filtered out, and the dish targets in the remaining images are then labeled with the type and position of each food item.
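The patent does not prescribe a particular annotation format; one common convention for yolo-style training (assumed here purely for illustration, with hypothetical file names and class indices) is one text file per image, each line giving a class index followed by the normalized box centre and size:

```
# tray_0001.jpg -> tray_0001.txt   (class  x_center  y_center  width  height, all in [0, 1])
3 0.412 0.537 0.218 0.244
7 0.705 0.301 0.190 0.172
```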
In this embodiment, the following scheme is adopted in step 2:
first, the neural network method is the one-stage target detection yolo method; compared with other target detection methods, the yolo method has high accuracy and an obvious inference speed advantage.
As shown in fig. 2, the specific steps of step 2 are as follows:
step 201: the yolo network structure is improved, mainly the backbone network and the neck part;
further, in step 201 the backbone network mainly uses a residual convolution module similar to CSPNet and outputs feature maps of different sizes through multiple stages of dimensionality reduction, and the neck part uses a three-layer BiFPN structure for feature fusion, fusing the feature maps of four different layers output by the backbone network;
step 202: before the network is trained, the anchors of the data set are recalculated to replace the original anchors, so that the network converges more easily and faster and the IoU of the model is improved (a clustering sketch is given after step 205);
step 203: CIoU Loss is used as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so the network converges faster;
step 204: the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128;
step 205: Mosaic augmentation is performed on the input images and training is carried out; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
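Step 202 recalculates the anchors by clustering the labelled box sizes. A minimal sketch of this step is given below; it is not part of the original description and follows the usual YOLO convention of k-means with 1 - IoU as the distance, with synthetic (width, height) data standing in for the labelled dish boxes.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating all boxes as if they shared the same centre."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0:1] * boxes[:, 1:2] + anchors[None, :, 0] * anchors[None, :, 1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labelled (w, h) pairs with 1 - IoU as the distance, as in YOLO anchor estimation."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)   # nearest anchor = highest IoU
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i) else anchors[i]
                        for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]  # sorted by area, small to large

# synthetic (w, h) pairs standing in for the labelled dish boxes of the data set
wh = np.abs(np.random.default_rng(1).normal(loc=[120.0, 90.0], scale=[40.0, 30.0], size=(500, 2)))
print(kmeans_anchors(wh, k=9))
```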
As shown in fig. 3, the specific steps of step 3 are as follows:
step 301: post-training quantization is used, and the quantization type is set to asymmetric_quantized-u8;
further, the quantization calculation formula in step 301 is as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
The corresponding inverse quantization is:
float_num = scale * (quant - zero_point)
step 302: the original model is imported into the RKNN Toolkit and converted into an RKNN model file that can be used by the NPU;
step 303: the converted RKNN model is deployed on the embedded mainboard AIO-3399Pro (a conversion sketch follows).
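A rough sketch of the conversion in steps 301 to 303 is shown below. It assumes the Rockchip RKNN-Toolkit Python API and an ONNX export of the trained detector; the file names are hypothetical and the exact configuration options (e.g. quantized_dtype, mean/std settings, calibration dataset list) may differ between toolkit versions, so this should be read as an outline rather than the exact procedure of the patent.

```python
from rknn.api import RKNN

rknn = RKNN()

# post-training quantization to asymmetric uint8 (see the formulas above);
# 'dataset.txt' would list calibration images used to estimate scale / zero_point
rknn.config(quantized_dtype='asymmetric_quantized-u8')

# load the trained detector (assumed here to have been exported to ONNX beforehand)
rknn.load_onnx(model='food_yolo.onnx')

# build with quantization enabled, then export the .rknn file consumed by the NPU
rknn.build(do_quantization=True, dataset='dataset.txt')
rknn.export_rknn('food_yolo.rknn')
rknn.release()
```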
As shown in fig. 4, the specific steps of step 4 are as follows:
(401) the test image is fed into the network to obtain convolution feature maps;
(402) the convolution feature maps are processed by the yolo algorithm, and predicted bounding-box values and classification values are output;
(403) the threshold is set to 0.5, and the final detection results are filtered out through non-maximum suppression (a post-processing sketch follows).
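Steps 401 to 403 can be illustrated with the minimal post-processing sketch below. It is not taken from the patent; the corner box format, the 0.45 IoU threshold for suppression and all names are illustrative, while the 0.5 confidence threshold matches step 403.

```python
def nms(boxes, scores, iou_thresh=0.45, score_thresh=0.5):
    """boxes: list of (x1, y1, x2, y2); scores: matching confidences.
    Returns the indices of the boxes kept after thresholding and non-maximum suppression."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

# example: two overlapping candidates for the same dish and one separate dish
boxes = [(50, 50, 150, 150), (55, 52, 152, 148), (300, 80, 380, 170)]
scores = [0.92, 0.60, 0.81]
print(nms(boxes, scores))  # -> [0, 2]
```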
Fig. 5 and fig. 6 show an image of canteen food and the detection result obtained with the method of the present invention, respectively. Testing shows that the method achieves a food detection accuracy of 99.6% with a small number of samples and a fast inference speed. Fig. 7 shows the detection result on an augmented image: the method still detects the food well when the augmented image shows obvious changes in color, position and shape, indicating that the food detection model is highly robust.
The above-mentioned embodiments are merely illustrative of the preferred embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications made on the basis of the technical solutions according to the technical ideas presented by the present invention are within the scope of the present invention.

Claims (10)

1. An intelligent canteen food detection method based on a depth model and a quantification technology is characterized by comprising the following steps:
(1) an image acquisition process: collecting a large number of canteen tray food images, and marking the types and positions of food in the images;
(2) a neural network training process: sending the image data marked in the step (1) into a convolutional neural network designed for food detection to train until the network converges to obtain a weight file;
(3) depth model quantification and deployment process: the size of the model is properly reduced through a quantization technology, and the model is deployed in an embedded mainboard;
(4) a test image detection process: detecting the food target in a test image by using the neural network and the weight file, and outputting the detection result.
2. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein in step (1), the collected images are cleaned to filter out photos that do not meet requirements, and then dish objects in the remaining images are labeled to mark the types and positions of food in the images.
3. The intelligent canteen food detection method based on depth model and quantification technique of claim 1, wherein the neural network training method in step (2) is a one-stage target detection yolo method.
4. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1 or 3, wherein the step (2) comprises the following steps:
(21) using a residual module in the backbone network, so that image information is effectively extracted while the amount of computation is greatly reduced;
(22) using an SPP module to fuse multiple receptive fields, helping the network perform multi-scale recognition;
(23) changing the neck part of the yolo network, replacing the PANet with a BiFPN;
(24) before the network is trained, recalculating the anchors of the data set, so that the network converges more easily and faster and the IoU of the model is improved;
(25) when the network is trained, augmenting the food images input to the network: the color gamut of the images is changed, and 4 images are spliced together by random scaling, random cropping and random arrangement before being input to the network;
(26) using CIoU Loss as the loss function of the network, which effectively solves the problems that the conventional IoU loss converges slowly and in some cases cannot converge at all, so that the network converges faster;
(27) setting the training hyper-parameters of the network, and training to obtain a network file and a weight file that can be used for food detection.
5. The intelligent canteen food detection method based on depth modeling and quantification techniques of claim 4, wherein in step (21), a residual convolution module similar to CSPNet is used to output feature maps of different sizes through multiple dimensionality reduction.
6. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 4, wherein the Neck performs feature fusion by using a three-layer BIFPN structure in step (23), and fuses four feature maps of different layers output in a backbone network.
7. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 4, wherein in step (27), the ImageNet pre-training weights are used as the initial weights, the learning rate is set to 0.0001, the number of iterations is set to 200,000, and the batch size is set to 128; when the loss function converges or the maximum number of iterations is reached, training stops, yielding a network file and a weight file that can be used for canteen food detection.
8. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein the step (3) specifically comprises the following steps:
(31) the stored floating point type model is quantized into an integer model through quantization, and the size of the model is reduced;
(32) and converting the weight model into an RKNN model through model conversion, and deploying the RKNN model to the embedded mainboard AIO-3399 Pro.
9. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 8, wherein the quantization mode used in step (31) is post-training quantization, i.e. the model is trained first and quantized afterwards, and the float64 model is quantized to the asymmetric_quantized-u8 type, the calculation formula being as follows:
quant = round(float_num / scale) + zero_point
quant = cast_to_bw(quant)
wherein quant represents the quantized number, float_num represents the floating-point value, scale is of type float32, and zero_point is of type int32 and represents the quantized value corresponding to the real number 0; finally, quant is saturated to [range_min, range_max], with
range_max = 255
range_min = 0
and the corresponding inverse quantization is:
float_num = scale * (quant - zero_point).
10. The intelligent canteen food detection method based on the depth model and the quantification technology as claimed in claim 1, wherein the step (4) comprises the following steps:
(41) sending the test image into the improved yolo backbone network to obtain convolution feature maps;
(42) processing the convolution feature maps through the yolo algorithm, and outputting predicted bounding-box values and classification values;
(43) setting a threshold, and filtering out the final detection results through non-maximum suppression.
CN202110743126.8A 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology Pending CN113449654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110743126.8A CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110743126.8A CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Publications (1)

Publication Number Publication Date
CN113449654A true CN113449654A (en) 2021-09-28

Family

ID=77814657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110743126.8A Pending CN113449654A (en) 2021-07-01 2021-07-01 Intelligent canteen food detection method based on depth model and quantification technology

Country Status (1)

Country Link
CN (1) CN113449654A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114587416A (en) * 2022-03-10 2022-06-07 山东大学齐鲁医院 Gastrointestinal tract submucosal tumor diagnosis system based on deep learning multi-target detection
CN115082751A (en) * 2022-05-07 2022-09-20 长春工业大学 Improved YOLOv 4-based mobile robot target detection method


Similar Documents

Publication Publication Date Title
Sakr et al. Comparing deep learning and support vector machines for autonomous waste sorting
CN109165623B (en) Rice disease spot detection method and system based on deep learning
CN111126472A (en) Improved target detection method based on SSD
US20170169315A1 (en) Deeply learned convolutional neural networks (cnns) for object localization and classification
Lyu et al. Esnet: Edge-based segmentation network for real-time semantic segmentation in traffic scenes
CN109858569A (en) Multi-tag object detecting method, system, device based on target detection network
CN109685765B (en) X-ray film pneumonia result prediction device based on convolutional neural network
CN113449654A (en) Intelligent canteen food detection method based on depth model and quantification technology
CN109446922B (en) Real-time robust face detection method
Wang et al. Fast and precise detection of litchi fruits for yield estimation based on the improved YOLOv5 model
CN108133235B (en) Pedestrian detection method based on neural network multi-scale feature map
CN111507399A (en) Cloud recognition and model training method, device, terminal and medium based on deep learning
CN115272652A (en) Dense object image detection method based on multiple regression and adaptive focus loss
CN104063686A (en) System and method for performing interactive diagnosis on crop leaf segment disease images
CN111723666A (en) Signal identification method and device based on semi-supervised learning
CN113096085A (en) Container surface damage detection method based on two-stage convolutional neural network
Feng et al. Garbage disposal of complex background based on deep learning with limited hardware resources
CN116452966A (en) Target detection method, device and equipment for underwater image and storage medium
CN114140413A (en) Food material image detection method for optimizing small target and improving missing detection problem
Li et al. Image feature fusion method based on edge detection
Li et al. IIE-SegNet: Deep semantic segmentation network with enhanced boundary based on image information entropy
Ye et al. PlantBiCNet: a new paradigm in plant science with bi-directional cascade neural network for detection and counting
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
CN112837281A (en) Pin defect identification method, device and equipment based on cascade convolutional neural network
CN111339985A (en) Gesture detection method based on mixed convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination