CN110705621A - Food image identification method and system based on DCNN and food calorie calculation method - Google Patents

Food image identification method and system based on DCNN and food calorie calculation method

Info

Publication number
CN110705621A
CN110705621A (application CN201910914054.1A)
Authority
CN
China
Prior art keywords
food
data set
image
layer
dcnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910914054.1A
Other languages
Chinese (zh)
Inventor
陈庶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yingpu Technology Co Ltd
Original Assignee
Beijing Yingpu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yingpu Technology Co Ltd filed Critical Beijing Yingpu Technology Co Ltd
Priority to CN201910914054.1A priority Critical patent/CN110705621A/en
Publication of CN110705621A publication Critical patent/CN110705621A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/60ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to nutrition control, e.g. diets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/68Food, e.g. fruit or vegetables

Abstract

The application discloses a DCNN-based food image identification method, which comprises the following steps: classifying the collected food initial images to obtain a training data set and a test data set; constructing a DCNN deep convolutional neural network using the training data set, and training the DCNN deep convolutional neural network to generate a training model of the food image; inputting a test sample from the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; obtaining a recognition model of the food image according to the test result; and identifying the food type in a food image to be detected using the recognition model of the food image. The invention also discloses an identification system and a food calorie calculation method. The invention can quickly and accurately identify the type of food, so that a user can further judge the nutrition, calories and the like contained in the food.

Description

Food image identification method and system based on DCNN and food calorie calculation method
Technical Field
The present application relates to the field of image recognition, and in particular, to a DCNN-based food image recognition method and system and a food calorie calculation method.
Background
Obesity and other health problems are increasing in real life today. Obesity rates have doubled in more than 70 countries since 1980, and obesity may lead to other types of chronic diseases such as heart disease, diabetes, arthritis, etc. Therefore, people nowadays pay more attention to the nutritional value of food in order to prevent these diseases.
Diet management is key to regularizing people's dietary habits: if people know the nutritional information of the food they are eating, it helps those who need diet management. Therefore, in order to obtain nutritional information about food, a food image recognition system is needed to detect the food in an image and then analyze the nutritional and calorie information of the food.
Classifying food from images is a challenging task because images of the same type of food may vary greatly. One prior-art food image recognition method uses a k-nearest-neighbor algorithm together with a vocabulary tree algorithm, classifying 42 food categories from 1453 food images. To measure distance, the L1 norm is selected for the SCD, entropy and fractal dimension (EFD), and Gabor-based image decomposition and fractal dimension estimation (GFD) features, while the Euclidean distance (L2 norm) is selected for the DCD features, and these are combined with Scale-Invariant Feature Transform (SIFT) features. The recognition accuracy of this method is 84.2%.
There is also a method of recognizing food using an SVM classifier on the PFI data set, which applies SIFT and Local Binary Pattern (LBP) features, where the SIFT features are used to detect and describe local features in food images; however, the recognition accuracy of this method is low.
In addition, there is a method of classifying food images using a spherical support vector machine. It collects a FoodLog (a health-management software) data set consisting of 6512 images and segments the food images using the FCM algorithm (fuzzy c-means, a clustering algorithm). The FCM algorithm is similar to k-means clustering: a coefficient is first randomly assigned to each data point for its membership in each cluster, then the centroid of each cluster is calculated and the coefficients are recalculated for each data point, repeating until convergence. After the FCM algorithm is applied to segment the food image, the segmented image is classified using a spherical Support Vector Machine (SVM); the accuracy of this method is 85%.
However, the above three methods still share the following technical problems: feature extraction is slow when the recognition model is trained, and classification of food images with the trained recognition model is not accurate enough, giving low accuracy.
Disclosure of Invention
It is an object of the present application to overcome the above problems or to at least partially solve or mitigate the above problems.
According to an aspect of the present application, there is provided a DCNN-based food image recognition method, including:
classifying the collected food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a test data set;
constructing a DCNN deep convolution neural network by using the training data set, and training the DCNN deep convolution neural network to generate a training model of a food image;
inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
and identifying the food type in the food image to be detected by using the identification model of the food image.
Optionally, the image in the initial image data set is pre-processed using ZCA whitening, the pre-processing comprising the sub-steps of:
carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
carrying out SVD singular value decomposition on the covariance matrix to obtain a feature vector matrix U, and calculating U^T X to obtain the result of the initial image data set after rotation;
PCA whitening is carried out on the initial image data set based on the rotated result, and a result of the food initial image after PCA whitening is obtained;
and multiplying the PCA-whitened result on the left by the feature vector matrix U to generate a preprocessed image data set.
Optionally, the training model of the food image is generated by the following sub-steps:
constructing a DCNN deep convolutional neural network structure by using the training data set, wherein the DCNN deep convolutional neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image;
the fourth layer is a splicing layer, and the third feature maps of the plurality of food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combined features are input into the Softmax layer to obtain the food type of the food image;
and inputting the classification result of the food image into the DCNN deep convolution neural network and training to obtain a training model of the food image.
Optionally, the optimal features include vertical edge and/or horizontal edge features of the food image.
According to another aspect of the present application, there is provided a DCNN-based food image recognition system, including an acquisition module, a training model generation module, a recognition model generation module, and a recognition module:
the acquisition module performs the following operations: classifying the collected food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a test data set;
the training model generation module performs the following operations: constructing a DCNN deep convolution neural network by using the training data set, and training the DCNN deep convolution neural network to generate a training model of a food image;
the recognition model generation module performs the following operations: inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
the identification module performs the following operations: and identifying the food type in the food image to be detected by using the identification model of the food image.
Optionally, the acquisition module includes a preprocessing module; the preprocessing module preprocesses the image in the initial image dataset by using ZCA whitening; the preprocessing module performs the following operations:
carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
SVD singular value decomposition is carried out on the covariance matrix to obtain a feature vector matrix U, and U^T X is calculated to obtain the result of the initial image data set after rotation;
PCA whitening is carried out on the initial image data set based on the rotated result, and a result of the food initial image after PCA whitening is obtained;
and multiplying the PCA-whitened result on the left by the feature vector matrix U to generate a preprocessed image data set.
Optionally, the training model generating module includes a building module and an input module;
the building module builds a DCNN deep convolutional neural network structure by using the training data set, wherein the DCNN deep convolutional neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image;
the fourth layer is a splicing layer, and the third feature maps of the plurality of food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combined features are input into the Softmax layer to obtain the food type of the food image;
and the input module inputs the classification result of the food image into the DCNN deep convolution neural network and trains the classification result to obtain a training model of the food image.
Optionally, the optimal features include vertical edge and/or horizontal edge features of the food image.
According to another aspect of the application, a food calorie calculation method is provided, which includes the identification method described above: according to the food type identified in the food image, the food type is looked up in the calorie table and food density table corresponding to the food to obtain the calories contained in the food.
According to another aspect of the application, a computer electronic device is provided, comprising a memory, a processor and a computer program stored in the memory and executable by the processor, the computer program being stored in a space of the memory for program code; when executed by the processor, the computer program implements the steps for performing any of the identification methods according to the invention.
According to another aspect of the application, a computer-readable storage medium is provided, comprising a storage unit for program code, the storage unit being provided with a program for performing the steps of the method according to the invention, the program being executed by a processor.
According to another aspect of the application, a computer program product comprising instructions for causing a computer to perform the steps of the identification method according to the invention when the computer program product is run on a computer is provided.
According to the embodiment, the initial images of the food are collected and classified, the DCNN deep neural network is constructed according to the classified images, the network is trained to finally obtain the identification model of the food images, and the type of the food can be quickly and accurately identified through the identification model, so that the user can further judge the nutrition, the heat and the like contained in the food.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
Drawings
Some specific embodiments of the present application will be described in detail hereinafter by way of illustration and not limitation with reference to the accompanying drawings. The same reference numbers in the drawings identify the same or similar elements or components. Those skilled in the art will appreciate that the drawings are not necessarily drawn to scale. In the drawings:
fig. 1 is a schematic flowchart of a DCNN-based food image recognition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a DCNN-based food image recognition system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer-readable storage medium according to an embodiment of the application.
Detailed Description
One embodiment of the present invention utilizes a Deep Convolutional Neural Network (DCNN) to identify food images. Since recognition of food images is a fine-grained visual recognition task, it is relatively difficult compared with conventional image recognition. To solve this problem, the embodiment identifies the type of the food image based on the DCNN deep convolutional neural network, which can recognize food images accurately; moreover, the DCNN-based food image recognition method is well suited to large-scale image data, because it takes only 0.03 second to classify one food photo using a GPU (Graphics Processing Unit), so the recognition efficiency is very high.
Fig. 1 is a schematic flowchart of a DCNN-based food image recognition method according to an embodiment of the present application. As can be seen from fig. 1, the DCNN-based food image recognition method provided in the embodiment of the present application may include the following steps:
s100, classifying the acquired food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a testing data set;
specifically, in the present embodiment, the samples of the training data set and the test data set may be divided according to a ratio of 7:3, and it is understood that in other embodiments, the samples may be divided into other ratios.
Optionally, the image in the initial image data set is preprocessed by ZCA whitening, and the specific preprocessing method includes the following sub-steps S110 to S140:
s110, carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
s120, carrying out SVD singular value decomposition on the covariance matrix to obtain a characteristic vector matrix U, and calculating UTX, obtaining a result of the initial image data set after rotation;
s130, PCA whitening is carried out on the initial image data set based on the rotated result to obtain a result of the food initial image after PCA whitening;
and S140, multiplying the PCA-whitened result on the left by the feature vector matrix U to generate a preprocessed image data set.
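Steps S110 to S140 can be sketched in NumPy as follows (a minimal illustration under the assumption that the data matrix X holds one flattened image per column; the `zca_whiten` name and the `eps` regularizer are illustrative, not part of the embodiment):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten a data matrix X of shape (n_features, n_samples)."""
    # S110: zero-mean the data and compute the covariance matrix.
    X = X - X.mean(axis=1, keepdims=True)
    sigma = X @ X.T / X.shape[1]
    # S120: SVD of the covariance matrix yields the eigenvector matrix U;
    # U^T X rotates the data into the principal-component basis.
    U, S, _ = np.linalg.svd(sigma)
    X_rot = U.T @ X
    # S130: PCA whitening - scale each rotated component to unit variance.
    X_pca = X_rot / np.sqrt(S[:, None] + eps)
    # S140: left-multiply by U to rotate back, giving the ZCA result.
    return U @ X_pca

# Whitened data has an (approximately) identity covariance matrix.
X = np.random.RandomState(0).randn(4, 1000)
Xw = zca_whiten(X)
cov = Xw @ Xw.T / Xw.shape[1]
print(np.allclose(cov, np.eye(4), atol=1e-2))  # True
```

Unlike plain PCA whitening, the final rotation back by U keeps the whitened images as close as possible to the originals, which is why the structure of the food image is preserved.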
Preferably, in this embodiment, before the ZCA whitening, the method further includes normalizing the initial image data set so that its mean is 0, setting the mean of each sample in the initial image data set to 0 as well, and dividing the input by the standard deviation of the initial image data set. Specifically, the acquired food initial images can be resized to 299 × 299 × 3 (pixels) to speed up the preprocessing; it will be appreciated that this size of the preprocessed food image also matches the input of the Inception v3 network model.
In this embodiment, the food image in the generated preprocessed image data set may be randomly rotated within a range of 0 to 180 degrees, and the food initial image may, for example, be subjected to operations such as random horizontal shifts, random vertical shifts or random flips; these operations help the later-stage DCNN deep convolutional neural network model become insensitive to the precise position of the object in the food image, i.e., the model recognizes the object regardless of where it appears in the image;
moreover, the ZCA whitening preprocessing reduces the redundancy among the pixels of the food initial image in the image matrix and highlights the structure and features of the food initial image for the DCNN deep convolutional neural network, so the time needed to train the food image training model on the training samples in the training data set in this embodiment can be shortened, improving the efficiency of the whole recognition system.
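The random shifts and flips mentioned above might look like the following NumPy sketch (the `augment` helper and its shift ranges are hypothetical, for illustration only; values are only moved, never changed, so labels stay valid):

```python
import numpy as np

def augment(image, rng):
    """Randomly flip and shift an (H, W, C) image array."""
    if rng.random() < 0.5:            # random horizontal flip
        image = image[:, ::-1, :]
    if rng.random() < 0.5:            # random vertical flip
        image = image[::-1, :, :]
    # random vertical/horizontal shift by up to ~10% of the image size
    dy = rng.integers(-image.shape[0] // 10, image.shape[0] // 10 + 1)
    dx = rng.integers(-image.shape[1] // 10, image.shape[1] // 10 + 1)
    return np.roll(image, (dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
img = np.arange(24, dtype=float).reshape(4, 3, 2)
out = augment(img, rng)
print(out.shape)  # (4, 3, 2)
```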
S200, constructing a DCNN deep convolutional neural network by using the training data set, and training the DCNN deep convolutional neural network to generate a training model of a food image;
the DCNN (deep Convolutional Neural network) deep Convolutional Neural network model is a mainstream method for image classification and recognition, generates a classification result by simulating a human visual system, and fuses feature extraction and image classification.
Wherein the step S200 includes the following substeps:
s210, constructing a DCNN deep convolution neural network structure, wherein the DCNN deep convolution neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer; specifically, a food image with a size of 299 × 299 × 3 (pixels) in the training data set is input into the convolutional layer, so as to create the first feature map of the food image in the training data set;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
wherein the maximum pooling operation performed by using the maximum pooling layer is a discretization process based on the food image in the training data set, and the maximum pooling operation is performed by applying a maximum filter to non-overlapping sub-areas of the input matrix, and the optimal feature in the first feature map of the food image in the training data set, such as the vertical edge and/or the horizontal edge of the food image, can be extracted by using the maximum pooling layer (Max-pooling);
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image, wherein the third feature map is formed by reducing the dimension of the second feature map;
dividing a second feature map of a food image in an input training data set into a plurality of rectangular pool areas, calculating an average value of each rectangular pool area to perform downsampling on the input second feature map, and reducing the variance and complexity of each data in the second feature map by using the average pooling layer;
the fourth layer is a splicing (concat) layer, and the third feature maps of the food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
because, for the DCNN deep convolutional neural network, as the number of iterations increases the network may fit the training data set well but fit the test data set poorly, randomly updating the network parameters in the DCNN deep convolutional neural network with the dropout layer increases the network's generalization ability;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combination characteristics are input into the Softmax layer to obtain the food type of the food image.
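The pooling, dropout and Softmax operations used in layers 2, 3, 5 and 7 above can be illustrated with a minimal NumPy sketch (the function names and the 2×2 pool size are assumptions for illustration; the embodiment does not specify the actual layer parameters):

```python
import numpy as np

def max_pool2d(x, k=2):
    """Layer 2: apply a maximum filter to non-overlapping k x k sub-regions."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).max(axis=(1, 3))

def avg_pool2d(x, k=2):
    """Layer 3: downsample by averaging each rectangular pool region."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    return x[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def dropout(x, rate, rng):
    """Layer 5: randomly zero units during training, rescaling the rest."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def softmax(z):
    """Layer 7: turn class scores into a probability over food types."""
    e = np.exp(z - z.max())
    return e / e.sum()

fmap = np.array([[1., 2., 0., 1.],
                 [3., 4., 2., 2.],
                 [0., 1., 5., 6.],
                 [1., 0., 7., 8.]])
# Max over each 2x2 block is [[4, 2], [1, 8]];
# average over each block is [[2.5, 1.25], [0.5, 6.5]].
print(max_pool2d(fmap))
print(avg_pool2d(fmap))
print(softmax(np.array([1.0, 2.0, 3.0])))
```

Note how max pooling keeps the strongest response (e.g. an edge) in each region while average pooling reduces variance, matching the roles described for layers 2 and 3.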
S220, inputting the classification result of the food image into the DCNN deep convolution neural network structure and training to obtain a training model of the food image.
S300, inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
that is, the food image of the test data set in the present embodiment is input to the training model of the food image, and it is determined whether the food image of the test data set matches the training sample of the training model of the food image, and if so, the type of the food image of the test sample is determined according to the type of the food image of the training sample.
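The matching step above amounts to computing accuracy over the test data set; a toy sketch (the `predict` stand-in below is hypothetical and merely mimics a trained model's interface):

```python
def evaluate(predict, test_set):
    """Fraction of test samples whose predicted food type matches the label."""
    correct = sum(1 for image, label in test_set if predict(image) == label)
    return correct / len(test_set)

# Stand-in for the trained model: classify by filename prefix.
predict = lambda name: name.split("_")[0]
test_set = [("noodle_1.jpg", "noodle"),
            ("rice_1.jpg", "rice"),
            ("rice_2.jpg", "soup")]
print(evaluate(predict, test_set))  # 2 of 3 correct
```

In the embodiment, this test result drives the iterative update of the training model until the recognition model is obtained.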
And S400, identifying the food type in the food image to be detected by using the identification model of the food image.
According to the embodiment, the initial images of the food are collected and classified, the DCNN deep neural network is constructed according to the classified images, the network is trained to finally obtain the identification model of the food images, and the type of the food can be quickly and accurately identified through the identification model, so that the user can further judge the nutrition, the heat and the like contained in the food.
Based on the same inventive concept, as shown in fig. 2, an embodiment of the present application further provides a DCNN-based food image recognition system, which includes an acquisition module, a training model generation module, a recognition model generation module, and a recognition module:
the acquisition module performs the following operations: classifying the collected food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a test data set;
the training model generation module performs the following operations: constructing a DCNN deep convolution neural network by using the training data set, and training the DCNN deep convolution neural network to generate a training model of a food image;
the recognition model generation module performs the following operations: inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
the identification module performs the following operations: and identifying the food type in the food image to be detected by using the identification model of the food image.
Optionally, the acquisition module includes a preprocessing module; the preprocessing module preprocesses the image in the initial image dataset by using ZCA whitening; the preprocessing module performs the following operations:
carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
SVD singular value decomposition is carried out on the covariance matrix to obtain a feature vector matrix U, and U^T X is calculated to obtain the result of the initial image data set after rotation;
PCA whitening is carried out on the initial image data set based on the rotated result, and a result of the food initial image after PCA whitening is obtained;
and multiplying the PCA-whitened result on the left by the feature vector matrix U to generate a preprocessed image data set.
Optionally, the training model generating module includes a building module and an input module;
the building module builds a DCNN deep convolutional neural network structure by using the training data set, wherein the DCNN deep convolutional neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image;
the fourth layer is a splicing layer, and the third feature maps of the plurality of food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combined features are input into the Softmax layer to obtain the food type of the food image;
and the input module inputs the classification result of the food image into the DCNN deep convolution neural network and trains the classification result to obtain a training model of the food image.
Optionally, the optimal features include vertical edge and/or horizontal edge features of the food image.
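For illustration only (not part of the claimed method), the max-pooling and average-pooling operations of the second and third layers can be sketched in plain NumPy; the 2x2 window size is an assumption, as the text does not fix the pooling parameters:

```python
import numpy as np

def max_pool2d(fmap, k=2):
    # Max pooling keeps the strongest response in each k x k window,
    # which is how the second layer extracts e.g. edge features.
    h, w = fmap.shape
    t = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return t.max(axis=(1, 3))

def avg_pool2d(fmap, k=2):
    # Average pooling down-samples by averaging each k x k window,
    # as the third layer does to the second feature map.
    h, w = fmap.shape
    t = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return t.mean(axis=(1, 3))
```

Both operations halve each spatial dimension of the feature map for k=2; max pooling preserves peak activations while average pooling smooths them.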
The identification system provided in this embodiment can perform any of the DCNN-based food image identification methods described above; the detailed process is described in the method embodiments and is not repeated here.
The above and other objects, advantages and features of the present application will become more apparent to those skilled in the art from the following detailed description of specific embodiments thereof, taken in conjunction with the accompanying drawings.
The embodiment of the application also provides a method for calculating food calories: according to the food type identified by the DCNN-based food image identification method provided in the above embodiment, the food type is compared against a calorie table and a food density table corresponding to the food to obtain the calories contained in the food, which helps users keep their diet nutritious and healthy.
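The calorie look-up can be sketched as below. This is a hypothetical illustration: the table values, the volume estimate, and the function and table names are all assumptions, not taken from the patent.

```python
# Illustrative calorie and density tables (values are assumed examples)
CALORIES_PER_100G = {"rice": 130, "apple": 52}     # kcal per 100 g
DENSITY_G_PER_CM3 = {"rice": 0.75, "apple": 0.60}  # grams per cubic centimeter

def food_calories(food_type: str, volume_cm3: float) -> float:
    """Look up the recognized food type in the calorie table and the
    food density table, then convert an estimated volume to calories."""
    grams = volume_cm3 * DENSITY_G_PER_CM3[food_type]
    return grams * CALORIES_PER_100G[food_type] / 100.0
```

For example, 200 cm^3 of rice at the assumed density weighs 150 g, giving 195 kcal under the assumed table values.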
An embodiment of the present application further provides a computing device. Referring to fig. 3, the computing device comprises a memory 520, a processor 510, and a computer program stored in the memory 520 and executable by the processor 510; the computer program is stored in a space 530 for program code in the memory 520 and, when executed by the processor 510, performs the steps 531 of any of the identification methods described herein.
The embodiment of the application also provides a computer readable storage medium. Referring to fig. 4, the computer readable storage medium comprises a storage unit for program code, the storage unit being provided with a program 531' for performing the steps of the identification method described herein, the program being executed by a processor.
The embodiment of the application also provides a computer program product containing instructions. When the instructions are run on a computer, they cause the computer to perform the steps of the identification method described herein.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of the application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)).
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by a program, and the program may be stored in a computer-readable storage medium, where the storage medium is a non-transitory medium, such as a random access memory, a read only memory, a flash memory, a hard disk, a solid state disk, a magnetic tape (magnetic tape), a floppy disk (floppy disk), an optical disk (optical disk), and any combination thereof.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A food image identification method based on DCNN, the identification method comprises:
classifying the collected food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a test data set;
constructing a DCNN deep convolution neural network by using the training data set, and training the DCNN deep convolution neural network to generate a training model of a food image;
inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
and identifying the food type in the food image to be detected by using the identification model of the food image.
2. The identification method according to claim 1,
pre-processing an image in the initial image data set using ZCA whitening, the pre-processing comprising the sub-steps of:
carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
SVD singular value decomposition is carried out on the covariance matrix to obtain an eigenvector matrix U, and U^T X is calculated to obtain the rotated result of the initial image data set;
PCA whitening is carried out on the rotated result to obtain the PCA-whitened result of the food initial images;
and the PCA-whitened result is left-multiplied by the eigenvector matrix U to generate the preprocessed image data set.
3. The recognition method according to claim 1 or 2, characterized in that the training model of the food image is generated by the following sub-steps:
constructing a DCNN deep convolutional neural network structure by using the training data set, wherein the DCNN deep convolutional neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image;
the fourth layer is a splicing layer, and the third feature maps of the plurality of food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combined features are input into the Softmax layer to obtain the food type of the food image;
and inputting the classified food images into the DCNN deep convolutional neural network and training the network to obtain a training model of the food image.
4. The identification method of claim 3, wherein the optimal features include vertical edge and/or horizontal edge features of the food image.
5. A food image recognition system based on DCNN comprises an acquisition module, a training model generation module, a recognition model generation module and a recognition module:
the acquisition module performs the following operations: classifying the collected food initial images, putting the classified food initial images into an initial image data set, preprocessing the food initial images in the initial image data set to generate a preprocessed image data set, and dividing the preprocessed image data set into a training data set and a test data set;
the training model generation module performs the following operations: constructing a DCNN deep convolution neural network by using the training data set, and training the DCNN deep convolution neural network to generate a training model of a food image;
the recognition model generation module performs the following operations: inputting the test sample in the test data set into the training model of the food image, and judging the type of the object in the food image to generate a test result; iteratively updating the training model of the food image according to the test result to obtain a recognition model of the food image;
the identification module performs the following operations: and identifying the food type in the food image to be detected by using the identification model of the food image.
6. The identification system of claim 5, wherein the acquisition module comprises a preprocessing module; the preprocessing module preprocesses the images in the initial image data set by using ZCA whitening; the preprocessing module performs the following operations:
carrying out brightness and contrast normalization processing on an initial image data set containing n food initial images, and calculating a covariance matrix of the initial image data set;
SVD singular value decomposition is carried out on the covariance matrix to obtain an eigenvector matrix U, and U^T X is calculated to obtain the rotated result of the initial image data set;
PCA whitening is carried out on the rotated result to obtain the PCA-whitened result of the food initial images;
and the PCA-whitened result is left-multiplied by the eigenvector matrix U to generate the preprocessed image data set.
7. The recognition system of claim 5 or 6, wherein the training model generation module comprises a construction module and an input module;
the building module builds a DCNN deep convolutional neural network structure by using the training data set, wherein the DCNN deep convolutional neural network structure comprises seven layers;
wherein the first layer is a convolutional layer, and a first feature map of the food image in the training data set is created by using the convolutional layer;
the second layer is a maximum pooling layer, and the maximum pooling layer is utilized to extract the optimal features in the first feature map of the food images in the training data set so as to obtain a second feature map of the food images;
the third layer is an average pooling layer, and the average pooling layer is used for downsampling the second feature map of the food image to obtain a third feature map of the food image;
the fourth layer is a splicing layer, and the third feature maps of the plurality of food images are spliced by using the splicing layer to obtain a fourth feature map of the food image;
the fifth layer is a dropout layer, and the dropout layer is used to randomly update network parameters in the DCNN deep convolutional neural network;
the sixth layer is a full connection layer, and the features of the fourth feature map of the food image and the features of the food image in the training data set are connected by using the full connection layer to obtain joint features;
the seventh layer is a Softmax layer, and the combined features are input into the Softmax layer to obtain the food type of the food image;
and the input module inputs the classified food images into the DCNN deep convolutional neural network and trains the network to obtain a training model of the food image.
8. The identification system of claim 7, wherein the optimal features include vertical edge and/or horizontal edge features of the food image.
9. A method for calculating food calories, wherein, according to the food type in the food image identified by the identification method of any one of claims 1 to 4, the food type is compared against a calorie table and a food density table corresponding to the food to obtain the calories contained in the food.
10. A computing device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, the computer program being stored in a space in the memory for program code and, when executed by the processor, performing the identification method of any one of claims 1-4.
CN201910914054.1A 2019-09-25 2019-09-25 Food image identification method and system based on DCNN and food calorie calculation method Pending CN110705621A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910914054.1A CN110705621A (en) 2019-09-25 2019-09-25 Food image identification method and system based on DCNN and food calorie calculation method

Publications (1)

Publication Number Publication Date
CN110705621A true CN110705621A (en) 2020-01-17

Family

ID=69197067


Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN104359783A (en) * 2014-10-31 2015-02-18 青岛海尔股份有限公司 Method for detecting nutrient content and calorie of food
CN105138993A (en) * 2015-08-31 2015-12-09 小米科技有限责任公司 Method and device for building face recognition model
US20160210536A1 (en) * 2015-01-15 2016-07-21 Samsung Electronics Co., Ltd. Method and apparatus for image analysis
CN106250871A (en) * 2016-08-16 2016-12-21 桂林电子科技大学 City management case classification method and device
CN107491733A (en) * 2017-07-19 2017-12-19 南京农业大学 A kind of chrysanthemum recognition methods based on deep neural network
CN107506722A (en) * 2017-08-18 2017-12-22 中国地质大学(武汉) One kind is based on depth sparse convolution neutral net face emotion identification method
CN108256571A (en) * 2018-01-16 2018-07-06 佛山市顺德区中山大学研究院 A kind of Chinese meal food recognition methods based on convolutional neural networks
CN109409219A (en) * 2018-09-19 2019-03-01 湖北工业大学 Indoor occupant locating and tracking algorithm based on depth convolutional network
CN110084244A (en) * 2019-03-14 2019-08-02 上海达显智能科技有限公司 Method, smart machine and application based on image recognition object

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291940A (en) * 2020-03-02 2020-06-16 桂林电子科技大学 Student class dropping prediction method based on Attention deep learning model
CN111291940B (en) * 2020-03-02 2022-06-07 桂林电子科技大学 Student class dropping prediction method based on Attention deep learning model
CN111539470A (en) * 2020-04-20 2020-08-14 重庆第二师范学院 Image processing method, image processing device, computer equipment and storage medium
CN112070077A (en) * 2020-11-16 2020-12-11 北京健康有益科技有限公司 Deep learning-based food identification method and device
CN112070077B (en) * 2020-11-16 2021-02-26 北京健康有益科技有限公司 Deep learning-based food identification method and device
CN115530773A (en) * 2022-10-17 2022-12-30 广州市番禺区中心医院 Cardiovascular disease evaluation and prevention system based on food intake of patient
CN115530773B (en) * 2022-10-17 2024-01-05 广州市番禺区中心医院 Cardiovascular disease evaluation and prevention system based on diet intake of patient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200117