CN111523483A - Chinese food dish image identification method and device - Google Patents
Chinese food dish image identification method and device
- Publication number
- CN111523483A (application CN202010334520.1A)
- Authority
- CN
- China
- Prior art keywords
- chinese food
- image
- layer
- model
- chinese
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/68—Food, e.g. fruit or vegetables
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Molecular Biology (AREA)
- Mathematical Physics (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Biomedical Technology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The embodiment of the invention provides a Chinese food dish image identification method and device. The method comprises the following steps: acquiring a target Chinese food dish image, and performing a preprocessing operation on the target Chinese food dish image; inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result. The Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, and is constructed based on a DenseNet model; its network structure comprises N dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1. The embodiment of the invention can accurately detect and identify a variety of Chinese food dishes, covering a wide range of dish types with high identification accuracy.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for identifying Chinese food dish images.
Background
With the rapid development of deep learning algorithms, computer vision has become the fastest-developing and most widely deployed field of artificial intelligence and is applied in many aspects of daily life. Food recognition is a new topic attracting great attention in computer vision.
At present there is considerable research on recognition algorithms for western food and Japanese dishes, but few mature methods exist for recognizing images of Chinese dishes. One reason is that large-scale classification data sets of Chinese dishes are scarce. Another is that Chinese dishes are harder to recognize than western or Japanese dishes: dishes of the same category can take many different forms, images of Chinese dishes are affected by background noise such as the color of the dinner plate and the lighting, and different Chinese dishes may look similar.
These factors all increase the difficulty of accurately identifying Chinese food dish images, so existing technologies capable of doing so are very limited. Therefore, a method for accurately detecting and identifying Chinese food dishes is needed.
Disclosure of Invention
In order to solve or at least partially solve the above problem, embodiments of the present invention provide a method and an apparatus for identifying images of Chinese food dishes.
In a first aspect, an embodiment of the present invention provides a method for identifying an image of a Chinese food item, including:
acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image;
inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: N dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
Optionally, the step of inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese food image into a Chinese food image recognition model, and obtaining a first characteristic mapping chart through the operations of a first convolution layer, a first batch normalization layer and an excitation layer of the Chinese food image recognition model;
inputting the first feature mapping chart into a maximum pooling layer of the Chinese food image identification model to obtain a second feature mapping chart;
inputting the second feature mapping chart into a first dense connecting block of the Chinese food image identification model, and then obtaining a third feature mapping chart through the operation of a first transition layer;
inputting the third feature mapping chart into a second dense connecting block of the Chinese food image identification model, and then obtaining a fourth feature mapping chart through operation of a second transition layer;
inputting the fourth feature mapping chart into a third dense connecting block of the Chinese food image identification model, and then obtaining a fifth feature mapping chart through the operation of a third transition layer;
inputting the fifth feature mapping map into a fourth dense connecting block of the Chinese food image identification model, and then obtaining a sixth feature mapping map through the operation of a fourth transition layer;
and inputting the sixth feature mapping chart into a second batch normalization layer of the Chinese food image identification model, and then obtaining a Chinese food identification result through the operation of a full connection layer and a classifier.
Optionally, the first densely connected block, the second densely connected block, the third densely connected block, and the fourth densely connected block each include a plurality of densely connected bottleneck layers, each of the bottleneck layers implements a composite function comprising a plurality of operations, and the plurality of operations include: batch normalization (BN), a ReLU activation function, and 3 × 3 convolution.
Optionally, the plurality of operations further comprises: 1 × 1 convolution.
Optionally, the first transition layer, the second transition layer, the third transition layer and the fourth transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, 1 × 1 convolution, and 2 × 2 average pooling with a stride of 2.
Optionally, before the step of obtaining the target chinese food item image and performing the preprocessing operation on the target chinese food item image, the method further includes:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a full connecting layer and a classifier which are sequentially connected;
acquiring a Chinese food image sample, and preprocessing the Chinese food image sample;
inputting the preprocessed Chinese food image samples into the DenseNet model to obtain an output result;
calculating a loss function value by using a cross entropy loss function based on the output result and the Chinese food category label corresponding to the Chinese food image sample;
adjusting, based on an Adam optimization algorithm, respective parameters of the densely-connected convolutional neural network from an output layer of the DenseNet model so as to move the loss function value toward a minimization direction;
and judging whether the training end condition is met, if so, saving the parameters of the current iteration DenseNet model, and obtaining a Chinese food image recognition model after training.
Optionally, a preprocessing operation is performed on the target Chinese food image, specifically:
carrying out random center rotation on the target Chinese food image according to a preset angle;
randomly cutting the target Chinese food dish image subjected to random center rotation according to a preset length-width ratio;
horizontally overturning the randomly cut target Chinese food dish image according to a preset probability;
and normalizing the horizontally overturned target Chinese food dish image.
In a second aspect, an embodiment of the present invention provides an image recognition apparatus for a Chinese meal dish, including:
the preprocessing module is used for acquiring a target Chinese food dish image and executing preprocessing operation on the target Chinese food dish image;
the identification module is used for inputting the preprocessed target Chinese food dish image into a Chinese food dish image identification model to obtain a Chinese food dish identification result;
the Chinese food image recognition model is obtained based on preprocessed Chinese food image sample training, the Chinese food image recognition model is constructed based on a DenseNet model, and a network structure of the Chinese food image recognition model comprises: N dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the image recognition method for Chinese food items as provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image recognition method for Chinese food items as provided in the first aspect.
The Chinese food image identification method and device provided by the embodiment of the invention can accurately detect and identify various Chinese foods, and have the advantages of wide identification types and high identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for identifying images of Chinese food dishes according to an embodiment of the present invention;
fig. 2 is a schematic network structure diagram of a Chinese food image recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a dense connecting block (dense block);
FIG. 4 is a schematic diagram of a bottleneck layer;
FIG. 5 is a schematic structural diagram of an image recognition apparatus for Chinese food dishes according to an embodiment of the present invention;
Fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for identifying images of Chinese food dishes according to an embodiment of the present invention, including:
specifically, in the embodiment of the invention, a camera with a fixed position is adopted to acquire a single target Chinese food image, and then preprocessing operation is performed on the target Chinese food image, wherein the preprocessing operation comprises data enhancement operation. Common basic data enhancement operations include the following: rotation, translation, scaling, random shielding, horizontal turning, color difference, noise disturbance and the like, and some data enhancement methods can be selected to perform preprocessing operation on the target Chinese food image.
specifically, in the embodiment of the present invention, the target Chinese food dish image obtained through the preprocessing operation is input into a pre-trained Chinese food dish image recognition model, so that a Chinese food dish recognition result can be obtained.
The Chinese food image recognition model is obtained based on preprocessed Chinese food image samples and corresponding Chinese food category labels through training.
Compared with ordinary food images, Chinese food dish images generally do not show the distinctive spatial layout and obvious semantic features of most western food, so their semantic information is harder to extract. Therefore, in the embodiment of the invention, the Chinese food dish image recognition model is constructed based on the DenseNet model. DenseNet does not obtain its representational capacity simply from a very deep or very wide network; instead, it reuses features from lower layers in higher layers by concatenating the features of different layers, which increases the diversity of the input to later layers and makes the fullest use of the image features. Compared with other networks, the DenseNet model also has fewer parameters, mitigates vanishing gradients, reduces overfitting on small data sets, and is simpler and more efficient.
Further, based on the DenseNet network model, the network structure of the Chinese food image recognition model comprises: n dense connection blocks for implementing feature multiplexing and N-1 transition layers for compressing the number of parameters.
Unlike other convolutional neural networks, the invention realizes feature multiplexing through dense connections, makes the fullest use of the image features, extracts image semantic information better, and achieves accurate identification with higher probability. The dense connecting blocks mitigate vanishing gradients, reduce the number of training parameters, resist overfitting and realize feature multiplexing; the transition layers compress the number of parameters and limit the growth in model complexity caused by introducing the dense connecting blocks.
The Chinese food dish image identification method provided by the embodiment of the invention can accurately detect and identify various Chinese foods, and has the advantages of wide identification types and high identification accuracy.
Based on the content of the above embodiment, the step of inputting the preprocessed target chinese food dish image into a chinese food dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese food image into a Chinese food image recognition model, and obtaining a first characteristic mapping chart through the operations of a first convolution layer, a first batch normalization layer and an excitation layer of the Chinese food image recognition model;
inputting the first feature mapping chart into a maximum pooling layer of the Chinese food image identification model to obtain a second feature mapping chart;
inputting the second feature mapping chart into a first dense connecting block of the Chinese food image identification model, and then obtaining a third feature mapping chart through the operation of a first transition layer;
inputting the third feature mapping chart into a second dense connecting block of the Chinese food image identification model, and then obtaining a fourth feature mapping chart through operation of a second transition layer;
inputting the fourth feature mapping chart into a third dense connecting block of the Chinese food image identification model, and then obtaining a fifth feature mapping chart through the operation of a third transition layer;
inputting the fifth feature mapping map into a fourth dense connecting block of the Chinese food image identification model, and then obtaining a sixth feature mapping map through the operation of a fourth transition layer;
and inputting the sixth feature mapping chart into a second batch normalization layer of the Chinese food image identification model, and then obtaining a Chinese food identification result through the operation of a full connection layer and a classifier.
Fig. 2 is a schematic network structure diagram of a chinese food image recognition model according to an embodiment of the present invention, where the chinese food image recognition model includes a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connection block, a first transition layer, a second dense connection block, a second transition layer, a third dense connection block, a third transition layer, a fourth dense connection block, a second batch normalization layer, a full connection layer, and a classifier, which are connected in sequence.
Specifically, the target Chinese food dish image is preprocessed and then input into the Chinese food dish image recognition model. The convolution operation of the first convolution layer, the BN operation of the first batch normalization layer and the ReLU activation of the excitation layer reduce its dimensions and yield a first feature map. The first feature map is then input into the maximum pooling layer, which down-samples it and removes redundant information, yielding a second feature map. The second feature map then passes through the four dense connecting blocks (dense blocks) in sequence, with a transition layer between every two adjacent dense blocks.
In a specific embodiment, the convolution, BN and ReLU operations in fig. 2 are performed in sequence on a target Chinese food dish image of 224 × 224 pixels to reduce its dimensions, yielding a first feature map of 112 × 112 pixels. The first feature map is then input into the max pooling layer, which applies 3 × 3 pooling with a stride of 2, yielding a second feature map of 56 × 56 pixels as the input to the first dense connecting block.
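The sizes quoted in this embodiment can be checked with the standard output-size formula for convolution and pooling layers. The sketch below is illustrative only: the 7 × 7 kernel with padding 3 for the first convolution and the padding of 1 for the pooling layer are assumptions borrowed from the commonly used DenseNet stem and are not stated in this embodiment, and `output_size` is a hypothetical helper name.

```python
def output_size(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed stem: 7x7 convolution, stride 2, padding 3 (not stated in the source).
first_map = output_size(224, kernel=7, stride=2, padding=3)
# 3x3 max pooling with stride 2 as described; padding 1 is an assumption.
second_map = output_size(first_map, kernel=3, stride=2, padding=1)

print(first_map, second_map)
```

Under these assumed settings the two sizes agree with the 112 × 112 and 56 × 56 feature maps described above.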
Fig. 3 is a schematic structural diagram of a dense connecting block (dense block); each layer in a dense block is called a bottleneck layer. The dense block is the reason DenseNet is preferred over other convolutional neural networks: with it, DenseNet mitigates vanishing gradients, reduces the number of parameters, resists overfitting and reuses features.
Suppose a dense block has $l$ layers and $x_0$ is the input of the dense block. Each layer implements a composite function $H_l(\cdot)$ comprising three operations: BN, ReLU and 3 × 3 convolution. To improve information transfer between the layers of a dense block, DenseNet proposes a different connection method: dense connection. Dense connection connects each layer in a dense block to all subsequent layers to realize feature multiplexing, as shown in fig. 3. Thus, the $l$-th layer takes the feature maps $x_0, \ldots, x_{l-1}$ of all preceding layers as input:
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps output by layers $0, 1, \ldots, l-1$, which serves as the input of the $l$-th layer.
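To make the dense connection rule concrete, the following sketch tracks only channel counts: each layer receives the concatenation of all earlier feature maps and contributes a fixed number of new channels. The function name `dense_block`, the fixed per-layer output (often called the growth rate) and the values 64, 6 and 32 are illustrative assumptions, not figures taken from this embodiment.

```python
def dense_block(x0_channels, num_layers, growth_rate):
    """Track channel counts under dense connectivity.

    Layer l receives the concatenation [x_0, ..., x_{l-1}] and emits
    `growth_rate` new channels (an assumed, fixed number per layer).
    Returns (input channel count seen by each layer, block output channels).
    """
    outputs = [x0_channels]              # channel counts of x_0, x_1, ...
    inputs_seen = []
    for _ in range(num_layers):
        concat_channels = sum(outputs)   # channels of [x_0, x_1, ..., x_{l-1}]
        inputs_seen.append(concat_channels)
        outputs.append(growth_rate)      # x_l = H_l(concatenation)
    return inputs_seen, sum(outputs)

inputs_seen, block_out = dense_block(x0_channels=64, num_layers=6, growth_rate=32)
print(inputs_seen)
print(block_out)
```

The input to each successive layer grows linearly with depth, which is why the later passages add a 1 × 1 convolution and transition layers to keep the channel count in check.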
Optionally, each of the bottleneck layers implements a composite function comprising a plurality of operations, the plurality of operations including: batch normalization (BN), a ReLU activation function, and 3 × 3 convolution.
Fig. 4 is a schematic diagram of a bottleneck layer structure. Because dense connections produce a large number of feature maps, a 1 × 1 convolution is added before the 3 × 3 convolution of the bottleneck layer to reduce the number of feature maps (the channel dimension) entering the 3 × 3 convolution, and thereby the amount of computation.
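A rough weight count shows why the 1 × 1 convolution helps. The sketch below compares one dense layer with and without the bottleneck; the input width of 1024 channels, the 32 output channels and the bottleneck width of four times the output width are illustrative assumptions (the last follows a common DenseNet convention and is not stated in this embodiment), and the helper names are hypothetical.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return in_channels * out_channels * kernel * kernel

def layer_weights(in_channels, growth_rate, bottleneck_width=None):
    """Weights of one dense layer, with or without a 1x1 bottleneck."""
    if bottleneck_width is None:
        return conv_weights(in_channels, growth_rate, 3)
    reduce_1x1 = conv_weights(in_channels, bottleneck_width, 1)
    conv_3x3 = conv_weights(bottleneck_width, growth_rate, 3)
    return reduce_1x1 + conv_3x3

in_channels, growth_rate = 1024, 32       # illustrative values only
plain = layer_weights(in_channels, growth_rate)
bottleneck = layer_weights(in_channels, growth_rate, bottleneck_width=4 * growth_rate)
print(plain, bottleneck)
```

The saving grows with the number of input channels, so it matters most in the deeper layers of a dense block.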
Further, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), a ReLU activation function, 1 × 1 convolution, and 2 × 2 average pooling with a stride of 2. The transition layers further reduce the number of parameters. Because the number of channels in the output feature map of each dense block grows sharply, the transition layer reduces the channel count through its convolution and the spatial size through average pooling, which keeps the model from becoming overly complex as dense blocks are stacked.
If a dense block generates m feature maps, the transition layer that follows generates θm feature maps, where θ is a compression coefficient with 0 < θ ≤ 1. When θ is 1, the number of feature maps is unchanged by the transition layer. In the embodiment of the invention, θ is set to 0.5, so the number of feature maps is halved by each transition layer.
In a specific embodiment, the feature maps output by the four dense blocks are 56 × 56, 28 × 28, 14 × 14 and 7 × 7 pixels, respectively. After the last dense block, BN and a softmax classifier are applied, and the output size of the fully connected layer is set to the total number of dish categories.
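The interplay of dense blocks and transition layers can be traced with a few lines of arithmetic. This is a sketch under stated assumptions: the block depths (6, 12, 32, 32), the 64 initial channels and the 32 channels added per layer are the settings commonly associated with DenseNet-169, whereas the embodiment uses a modified DenseNet169 whose exact settings are not given; only θ = 0.5 and the 56 × 56 starting size come from the text. `trace_network` is a hypothetical helper.

```python
import math

def trace_network(block_layers, init_channels, growth_rate, theta, size):
    """Follow channel count and spatial size through dense blocks and transitions."""
    channels = init_channels
    trace = []
    for i, num_layers in enumerate(block_layers):
        channels += num_layers * growth_rate          # dense block appends features
        trace.append((size, channels))
        if i < len(block_layers) - 1:                 # N-1 transition layers
            channels = math.floor(theta * channels)   # 1x1 convolution compression
            size //= 2                                # 2x2 average pooling, stride 2
    return trace

# Assumed DenseNet-169-style settings; the embodiment's modified network may differ.
trace = trace_network(block_layers=(6, 12, 32, 32), init_channels=64,
                      growth_rate=32, theta=0.5, size=56)
for size, channels in trace:
    print(f"{size}x{size} feature map, {channels} channels")
```

With three transition layers between four dense blocks, the spatial sizes come out as 56, 28, 14 and 7, matching the specific embodiment.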
Before the trained Chinese food image recognition model is used for recognizing the target Chinese food image, the Chinese food image recognition model needs to be trained.
Based on the content of the above embodiment, before the step of obtaining the target chinese food item image and performing the preprocessing operation on the target chinese food item image, the method further includes:
Step 200, constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connection block, a first transition layer, a second dense connection block, a second transition layer, a third dense connection block, a third transition layer, a fourth dense connection block, a second batch normalization layer, a full connection layer and a classifier which are sequentially connected;
specifically, the DenseNet model in this example is a modified DenseNet169 model, having a network structure as shown in fig. 2.
Step 201, obtaining a Chinese food image sample, and preprocessing the Chinese food image sample;
the purpose of the pre-processing is to achieve image enhancement.
Step 202, inputting the preprocessed Chinese food image samples into the DenseNet model to obtain an output result;
step 203, calculating a loss function value by using a cross entropy loss function based on the output result and the Chinese food category label corresponding to the Chinese food image sample;
the loss function is the cross-entropy loss, which accelerates convergence and the updating of the weight matrix.
Step 204, based on Adam optimization algorithm, starting from the output layer of the DenseNet model, adjusting each parameter of the dense connection type convolutional neural network so as to move the loss function value towards the minimization direction;
the optimizer used to train the model is the Adam algorithm, which provides an adaptive learning rate, speeds up training and enhances the robustness of the network.
And step 205, judging whether a training end condition is met, if so, saving parameters of the currently iterated DenseNet model, and obtaining a Chinese food image recognition model after training.
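Steps 203 and 204 can be illustrated on a single sample and a single scalar weight. The sketch below is not the training code of the embodiment: it computes a softmax cross-entropy loss and applies one Adam update in plain Python. The learning rate of 1e-4 matches the value given in the embodiment, while `beta1`, `beta2` and `eps` are the usual Adam defaults and are assumptions here, as are the function names.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample; `label` is the true class index."""
    peak = max(logits)                                   # for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]

def adam_step(param, grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad                   # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad            # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                         # bias correction
    v_hat = v / (1 - beta2 ** t)
    param -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, (m, v, t)

loss = cross_entropy([0.0, 0.0, 0.0, 0.0], label=2)
weight, state = adam_step(1.0, grad=0.5, state=(0.0, 0.0, 0))
print(loss, weight)
```

With four equal logits the loss equals ln 4, and the first Adam step moves the weight by roughly the learning rate regardless of the gradient's magnitude, which is the adaptive behaviour referred to above.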
Specifically, a camera with a fixed position is used to collect a plurality of single-dish images, which are stored in a database, and a category label is added for each image; if the database does not yet contain the category of an image, a new category is created with that label. The database is then divided into a training set and a test set according to a preset proportion. During training, so that the model achieves better classification performance on the data set, the training hyperparameters are set as follows: the number of epochs is 150; the batch size is 64; the optimizer is Adam, which provides an adaptive learning rate, with an initial learning rate of 1e-4, which speeds up training and enhances the robustness of the network. Because the invention addresses a classification problem, the loss function is the cross-entropy loss, so that the weights are updated quickly when the model's predictions are poor and slowly when its predictions are good. After DenseNet169 has been trained for 150 epochs, the optimal model is taken as the final trained Chinese food dish image recognition model. After training, the test set is input into the optimal model to obtain a test result.
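The database bookkeeping described above can be sketched as follows. Everything here is hypothetical scaffolding for illustration: the helper names, the file names, the two dish names and the 80/20 split are assumptions, since the embodiment only says that the database is divided "according to a proportion".

```python
import random

def add_sample(database, labels, image_id, category):
    """Store an image id with its label, creating the category if it is new."""
    if category not in labels:
        labels[category] = len(labels)        # new category gets the next index
    database.append((image_id, labels[category]))

def split_dataset(database, train_fraction, seed=0):
    """Shuffle and split the database into training and test sets."""
    items = list(database)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

database, labels = [], {}
for i in range(10):
    add_sample(database, labels, f"img_{i}.jpg", ["mapo_tofu", "fried_rice"][i % 2])

train_set, test_set = split_dataset(database, train_fraction=0.8)
print(len(train_set), len(test_set), labels)
```

A fixed seed keeps the split reproducible between runs, which makes test results on the held-out set comparable.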
The Chinese food dish image identification method provided by the embodiment of the invention makes full use of the feature multiplexing realized by DenseNet's dense connections. Tuning the network hyperparameters not only greatly reduces the number of training parameters and the redundancy of the network, but also allows the dish image features to be exploited as fully as possible, which helps capture the semantic information of dish images; through multiple training iterations, a model with high identification accuracy and excellent performance can be obtained. Because DenseNet generalizes well, the invention is suitable not only for identifying Chinese food dishes, which are difficult to identify, but can in principle also identify other foods simply by training on other food data sets.
Based on the content of the above embodiment, the preprocessing operation is performed on the target Chinese meal dish image, specifically:
carrying out random center rotation on the target Chinese food image according to a preset angle;
randomly cutting the target Chinese food dish image subjected to random center rotation according to a preset length-width ratio;
horizontally overturning the randomly cut target Chinese food dish image according to a preset probability;
and normalizing the horizontally overturned target Chinese food dish image.
Specifically, the target Chinese food dish image is rotated about its center by a random angle within a preset range, such as -10 degrees to 10 degrees;
the rotated target Chinese food dish image is randomly cropped according to a preset length and width, for example 224 × 224 pixels;
the randomly cropped target Chinese food dish image is flipped horizontally with a preset probability, such as 0.5;
and the horizontally flipped target Chinese food dish image is normalized to eliminate the influence of differing scales among the data features.
The preprocessing operation steps provided by the embodiment of the invention are beneficial to obtaining accurate training models and Chinese food dish identification results.
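The four preprocessing steps can be sketched in plain Python on a single-channel image stored as a nested list. This is a minimal illustration, not the embodiment's implementation: nearest-neighbour rotation, zero fill for uncovered pixels, the grayscale input, and the `mean` and `std` values are all assumptions, and in practice an image library would normally be used instead.

```python
import math
import random

def rotate_center(img, angle_deg):
    """Rotate about the image centre with nearest-neighbour sampling; gaps become 0."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    cos_a, sin_a = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse-map each output pixel back into the source image.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = round(sx), round(sy)
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out

def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]

def horizontal_flip(img):
    return [row[::-1] for row in img]

def normalize(img, mean, std):
    return [[(p - mean) / std for p in row] for row in img]

def preprocess(img, rng, max_angle=10, crop=224, flip_prob=0.5, mean=0.5, std=0.25):
    """Rotation, crop, flip and normalization in the order described above."""
    img = rotate_center(img, rng.uniform(-max_angle, max_angle))
    img = random_crop(img, crop, rng)
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    return normalize(img, mean, std)

rng = random.Random(0)
image = [[(x + y) % 256 / 255 for x in range(256)] for y in range(256)]
result = preprocess(image, rng)
print(len(result), len(result[0]))
```

Passing an explicit random generator keeps the augmentation reproducible, which is convenient when comparing training runs.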
Fig. 5 is a schematic structural diagram of a Chinese food dish image identification device according to an embodiment of the present invention. The device includes a preprocessing module 510 and an identification module 520, wherein
the preprocessing module 510 is configured to acquire a target Chinese food dish image and perform a preprocessing operation on the target Chinese food dish image;
the identification module 520 is configured to input the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, is constructed based on a DenseNet model, and has a network structure comprising: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.
The Chinese food dish image identification device provided by the embodiment of the invention is used to implement the Chinese food dish image identification method embodiments. Each functional module of the device can therefore be understood with reference to the method embodiments, and the details are not repeated here.
The Chinese food dish image identification device provided by the embodiment of the invention can accurately detect and identify a wide variety of Chinese dishes, offering both broad coverage of dish types and high identification accuracy.
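The two-module structure of the device can be pictured as a simple composition of a preprocessing stage and a recognition stage. The sketch below is illustrative only: the class names, method names, the stub model, and the dish names are hypothetical placeholders, and the trained DenseNet-based model is represented by any callable that returns one score per class.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PreprocessingModule:
    """Stands in for module 510: applies the preprocessing operation to an image."""
    transform: Callable

    def run(self, image):
        return self.transform(image)


@dataclass
class RecognitionModule:
    """Stands in for module 520: feeds the preprocessed image to the model."""
    model: Callable               # returns one score per dish class
    class_names: Sequence[str]

    def run(self, preprocessed):
        scores = self.model(preprocessed)
        # Pick the class with the highest score as the recognition result.
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.class_names[best]


@dataclass
class DishRecognitionDevice:
    preprocessing: PreprocessingModule
    recognition: RecognitionModule

    def identify(self, image):
        return self.recognition.run(self.preprocessing.run(image))


if __name__ == "__main__":
    device = DishRecognitionDevice(
        PreprocessingModule(transform=lambda image: image),       # identity stub
        RecognitionModule(model=lambda x: [0.1, 0.7, 0.2],         # fixed-score stub
                          class_names=["dish_a", "dish_b", "dish_c"]),
    )
    print(device.identify(object()))
```

Keeping the two modules separate mirrors the description: the same recognition module can be reused with a different preprocessing transform, and vice versa.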
Fig. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 6, the electronic device may include: a processor 610, a communications interface 620, a memory 630 and a communication bus 640, wherein the processor 610, the communications interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke a computer program stored on the memory 630 and executable on the processor 610 to perform the Chinese food dish image identification method provided by the above method embodiments, including, for example: acquiring a target Chinese food dish image, and performing a preprocessing operation on the target Chinese food dish image; inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, is constructed based on a DenseNet model, and has a network structure comprising: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the Chinese food dish image identification method provided in the foregoing method embodiments, the method including, for example: acquiring a target Chinese food dish image, and performing a preprocessing operation on the target Chinese food dish image; inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, is constructed based on a DenseNet model, and has a network structure comprising: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A Chinese food dish image identification method is characterized by comprising the following steps:
acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image;
inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, the Chinese food dish image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food dish image recognition model comprises: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
2. The Chinese food dish image identification method according to claim 1, wherein the step of inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result specifically comprises:
inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese food dish image recognition model;
inputting the first feature map into a maximum pooling layer of the Chinese food dish image recognition model to obtain a second feature map;
inputting the second feature map into a first dense connecting block of the Chinese food dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map into a second dense connecting block of the Chinese food dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map into a third dense connecting block of the Chinese food dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map into a fourth dense connecting block of the Chinese food dish image recognition model, and then obtaining a sixth feature map through the operation of a fourth transition layer;
and inputting the sixth feature map into a second batch normalization layer of the Chinese food dish image recognition model, and then obtaining the Chinese food dish recognition result through the operations of a fully connected layer and a classifier.
3. The Chinese food dish image identification method according to claim 2, wherein the first, second, third and fourth dense connecting blocks each comprise a plurality of densely connected bottleneck layers, each bottleneck layer being a composite function comprising a plurality of operations, the plurality of operations comprising: batch normalization (BN), a ReLU activation function, and a 3 × 3 convolution.
4. The Chinese food dish image identification method according to claim 3, wherein the plurality of operations further comprise: a 1 × 1 convolution.
5. The Chinese food dish image identification method according to claim 2, wherein the first transition layer, the second transition layer, the third transition layer and the fourth transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, a 1 × 1 convolution, and 2 × 2 average pooling with a stride of 2.
6. The Chinese food dish image identification method according to claim 1, wherein before the step of acquiring the target Chinese food dish image and performing the preprocessing operation on the target Chinese food dish image, the method further comprises:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an activation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a fully connected layer and a classifier which are sequentially connected;
acquiring Chinese food dish image samples, and preprocessing the Chinese food dish image samples;
inputting the preprocessed Chinese food dish image samples into the DenseNet model to obtain an output result;
calculating a loss function value by using a cross-entropy loss function based on the output result and the Chinese food dish category labels corresponding to the Chinese food dish image samples;
adjusting, based on the Adam optimization algorithm, the parameters of the densely connected convolutional neural network backward from the output layer of the DenseNet model so as to move the loss function value toward its minimum;
and judging whether a training end condition is met, and if so, saving the parameters of the DenseNet model at the current iteration to obtain the trained Chinese food dish image recognition model.
7. The Chinese food dish image identification method according to claim 1, wherein performing the preprocessing operation on the target Chinese food dish image specifically comprises:
randomly rotating the target Chinese food dish image about its center according to a preset angle;
randomly cropping the rotated target Chinese food dish image according to a preset length-width ratio;
horizontally flipping the randomly cropped target Chinese food dish image according to a preset probability;
and normalizing the horizontally flipped target Chinese food dish image.
8. A Chinese food dish image identification device, comprising:
a preprocessing module, configured to acquire a target Chinese food dish image and perform a preprocessing operation on the target Chinese food dish image;
an identification module, configured to input the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and the corresponding Chinese food dish category labels, the Chinese food dish image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food dish image recognition model comprises: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the Chinese food dish image identification method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the Chinese food dish image identification method according to any one of claims 1 to 7.
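The training loop recited in claim 6 (cross-entropy loss, Adam updates, and a training end condition) can be illustrated on a toy problem. The sketch below is not the claimed training procedure: instead of DenseNet weights it optimizes three raw class scores directly, and the learning rate, Adam coefficients, iteration limit and loss threshold are assumed values chosen only to make the example run; all function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss of softmax(logits) against a single class label."""
    return -math.log(softmax(logits)[label])


def adam_step(params, grads, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One in-place Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def train(num_classes=3, label=1, max_iters=200, target_loss=0.05):
    logits = [0.0] * num_classes             # toy stand-in for the network output
    state = {"t": 0, "m": [0.0] * num_classes, "v": [0.0] * num_classes}
    history = []
    for _ in range(max_iters):
        loss = cross_entropy(logits, label)
        history.append(loss)
        if loss < target_loss:               # assumed training end condition
            break
        # Gradient of the cross-entropy loss w.r.t. the logits: softmax - one-hot.
        probs = softmax(logits)
        grads = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
        adam_step(logits, grads, state)
    return logits, history


if __name__ == "__main__":
    final_logits, losses = train()
    print(f"iterations: {len(losses)}, first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In a real setting the gradient would be propagated backward from the output layer through every dense block and transition layer, and the saved parameters at the end of training would constitute the recognition model.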
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334520.1A CN111523483B (en) | 2020-04-24 | 2020-04-24 | Chinese meal dish image recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010334520.1A CN111523483B (en) | 2020-04-24 | 2020-04-24 | Chinese meal dish image recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111523483A (en) | 2020-08-11 |
CN111523483B (en) | 2023-10-03 |
Family
ID=71904579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010334520.1A (Active) CN111523483B (en) | Chinese meal dish image recognition method and device | 2020-04-24 | 2020-04-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111523483B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112115906A (en) * | 2020-09-25 | 2020-12-22 | 广州市派客朴食信息科技有限责任公司 | Open dish identification method based on deep learning target detection and metric learning |
CN112115903A (en) * | 2020-09-25 | 2020-12-22 | 广州市派客朴食信息科技有限责任公司 | Method for improving dish identification system identification precision based on deep learning |
CN113033706A (en) * | 2021-04-23 | 2021-06-25 | 广西师范大学 | Multi-source two-stage dish identification method based on visual target detection and re-identification |
CN117975445A (en) * | 2024-03-29 | 2024-05-03 | 江南大学 | Food identification method, system, equipment and medium |
CN117975445B (en) * | 2024-03-29 | 2024-05-31 | 江南大学 | Food identification method, system, equipment and medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108491765A (en) * | 2018-03-05 | 2018-09-04 | 中国农业大学 | A kind of classifying identification method and system of vegetables image |
CN109620152A (en) * | 2018-12-16 | 2019-04-16 | 北京工业大学 | A kind of electrocardiosignal classification method based on MutiFacolLoss-Densenet |
CN109949824A (en) * | 2019-01-24 | 2019-06-28 | 江南大学 | City sound event classification method based on N-DenseNet and higher-dimension mfcc feature |
CN110097564A (en) * | 2019-04-04 | 2019-08-06 | 平安科技(深圳)有限公司 | Image labeling method, device, computer equipment and storage medium based on multi-model fusion |
CN110176002A (en) * | 2019-06-05 | 2019-08-27 | 深圳大学 | A kind of the lesion detection method and terminal device of radioscopic image |
CN110472668A (en) * | 2019-07-22 | 2019-11-19 | 华北电力大学(保定) | A kind of image classification method |
CN110689085A (en) * | 2019-09-30 | 2020-01-14 | 天津大学 | Garbage classification method based on deep cross-connection network and loss function design |
CN110766063A (en) * | 2019-10-17 | 2020-02-07 | 南京信息工程大学 | Image classification method based on compressed excitation and tightly-connected convolutional neural network |
CN110942105A (en) * | 2019-12-13 | 2020-03-31 | 东华大学 | Mixed pooling method based on maximum pooling and average pooling |
WO2020073951A1 (en) * | 2018-10-10 | 2020-04-16 | 腾讯科技(深圳)有限公司 | Method and apparatus for training image recognition model, network device, and storage medium |
Non-Patent Citations (1)
Title |
---|
付杰 (Fu Jie): "基于密集型网络的人脸年龄估计" (Face age estimation based on dense networks), pages 3 * |
Also Published As
Publication number | Publication date |
---|---|
CN111523483B (en) | 2023-10-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110223292B (en) | Image evaluation method, device and computer readable storage medium | |
CN106599883B (en) | CNN-based multilayer image semantic face recognition method | |
CN110097554B (en) | Retina blood vessel segmentation method based on dense convolution and depth separable convolution | |
CN109711426B (en) | Pathological image classification device and method based on GAN and transfer learning | |
CN110599409A (en) | Convolutional neural network image denoising method based on multi-scale convolutional groups and parallel | |
CN110059586B (en) | Iris positioning and segmenting system based on cavity residual error attention structure | |
CN111523483A (en) | Chinese food dish image identification method and device | |
CN109948692B (en) | Computer-generated picture detection method based on multi-color space convolutional neural network and random forest | |
CN110473142B (en) | Single image super-resolution reconstruction method based on deep learning | |
CN111340814A (en) | Multi-mode adaptive convolution-based RGB-D image semantic segmentation method | |
CN110084266B (en) | Dynamic emotion recognition method based on audio-visual feature deep fusion | |
CN110400288B (en) | Sugar network disease identification method and device fusing binocular features | |
CN106503661B (en) | Face gender identification method based on fireworks deepness belief network | |
CN111815562A (en) | Retinal vessel segmentation method combining U-Net and self-adaptive PCNN | |
CN112784929A (en) | Small sample image classification method and device based on double-element group expansion | |
CN111832650A (en) | Image classification method based on generation of confrontation network local aggregation coding semi-supervision | |
CN111694977A (en) | Vehicle image retrieval method based on data enhancement | |
CN111160130A (en) | Multi-dimensional collision recognition method for multi-platform virtual identity account | |
CN110991554B (en) | Improved PCA (principal component analysis) -based deep network image classification method | |
CN112580502A (en) | SICNN-based low-quality video face recognition method | |
CN113987236B (en) | Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network | |
CN113033345B (en) | V2V video face recognition method based on public feature subspace | |
Lukic et al. | Galaxy classifications with deep learning | |
CN113361346A (en) | Scale parameter self-adaptive face recognition method for replacing adjustment parameters | |
CN114049675B (en) | Facial expression recognition method based on light-weight two-channel neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||