CN111523483A - Chinese food dish image identification method and device - Google Patents

Publication number
CN111523483A
Authority
CN
China
Prior art keywords
chinese food
image
layer
model
chinese
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010334520.1A
Other languages
Chinese (zh)
Other versions
CN111523483B (en)
Inventor
高伟东
郝然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202010334520.1A priority Critical patent/CN111523483B/en
Publication of CN111523483A publication Critical patent/CN111523483A/en
Application granted granted Critical
Publication of CN111523483B publication Critical patent/CN111523483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/35Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36Indoor scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/68Food, e.g. fruit or vegetables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a Chinese food dish image identification method and device. The method comprises: acquiring a target Chinese food dish image and performing a preprocessing operation on it; and inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result. The recognition model is trained on preprocessed Chinese food dish image samples and their corresponding Chinese food dish category labels, and is constructed based on a DenseNet model. Its network structure comprises N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1. The embodiment of the invention can accurately detect and identify a wide variety of Chinese food dishes with high identification accuracy.

Description

Chinese food dish image identification method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for identifying Chinese food dish images.
Background
With the rapid development of deep learning algorithms, computer vision has become the fastest-developing and most widely deployed field of artificial intelligence and has been applied to many aspects of daily life. Food recognition is currently a topic of great interest in computer vision.
Many recognition algorithms have been studied for Western and Japanese dishes, but relatively mature methods for recognizing Chinese dish images remain scarce. One reason is that few large-scale classification datasets of Chinese dishes exist. Another is that Chinese dishes are harder to recognize than Western or Japanese dishes: dishes of the same category can take many different forms, images are affected by background noise such as the color of the plate and the lighting, and different dishes may look similar.
For these reasons, existing technologies capable of accurately identifying Chinese food dishes are very limited. A method for accurately detecting and identifying Chinese food dishes is therefore needed.
Disclosure of Invention
In order to solve or at least partially solve the above problem, embodiments of the present invention provide a method and an apparatus for identifying images of Chinese food dishes.
In a first aspect, an embodiment of the present invention provides a method for identifying an image of a Chinese food item, including:
acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image;
inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
the Chinese food dish image recognition model is obtained by training based on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels, the Chinese food dish image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food dish image recognition model comprises: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
Optionally, the step of inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an excitation layer of the Chinese food dish image recognition model;
inputting the first feature map into a maximum pooling layer of the Chinese food dish image recognition model to obtain a second feature map;
inputting the second feature map into a first dense connecting block of the Chinese food dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map into a second dense connecting block of the Chinese food dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map into a third dense connecting block of the Chinese food dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map into a fourth dense connecting block of the Chinese food dish image recognition model, and then obtaining a sixth feature map through the operation of a fourth transition layer;
and inputting the sixth feature map into a second batch normalization layer of the Chinese food dish image recognition model, and then obtaining the Chinese food dish recognition result through the operations of a fully connected layer and a classifier.
Optionally, the first dense connecting block, the second dense connecting block, the third dense connecting block and the fourth dense connecting block each include a plurality of densely connected bottleneck layers, each of the bottleneck layers has a composite function comprising a plurality of operations, and the plurality of operations include: batch normalization (BN), a ReLU activation function and a 3 × 3 convolution.
Optionally, the plurality of operations further comprises: a 1 × 1 convolution.
Optionally, the first transition layer, the second transition layer, the third transition layer and the fourth transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, a 1 × 1 convolution and 2 × 2 average pooling with a stride of 2.
Optionally, before the step of obtaining the target chinese food item image and performing the preprocessing operation on the target chinese food item image, the method further includes:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a fully connected layer and a classifier which are sequentially connected;
acquiring a Chinese food image sample, and preprocessing the Chinese food image sample;
inputting the preprocessed Chinese food image samples into the DenseNet model to obtain an output result;
calculating a loss function value by using a cross entropy loss function based on the output result and the Chinese food category label corresponding to the Chinese food image sample;
adjusting, based on an Adam optimization algorithm, respective parameters of the densely-connected convolutional neural network from an output layer of the DenseNet model so as to move the loss function value toward a minimization direction;
and judging whether the training end condition is met, if so, saving the parameters of the current iteration DenseNet model, and obtaining a Chinese food image recognition model after training.
Optionally, a preprocessing operation is performed on the target Chinese food image, specifically:
randomly rotating the target Chinese food dish image about its center within a preset angle range;
randomly cropping the rotated target Chinese food dish image according to a preset aspect ratio;
horizontally flipping the randomly cropped target Chinese food dish image with a preset probability;
and normalizing the horizontally flipped target Chinese food dish image.
In a second aspect, an embodiment of the present invention provides a Chinese food dish image recognition apparatus, including:
the preprocessing module is used for acquiring a target Chinese food dish image and executing preprocessing operation on the target Chinese food dish image;
the identification module is used for inputting the preprocessed target Chinese food dish image into a Chinese food dish image identification model to obtain a Chinese food dish identification result;
the Chinese food dish image recognition model is obtained by training based on preprocessed Chinese food dish image samples, the Chinese food dish image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food dish image recognition model comprises: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the image recognition method for Chinese food items as provided in the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the image recognition method for Chinese food items as provided in the first aspect.
The Chinese food dish image identification method and device provided by the embodiment of the invention can accurately detect and identify a wide variety of Chinese food dishes with high identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for identifying images of Chinese food dishes according to an embodiment of the present invention;
fig. 2 is a schematic network structure diagram of a Chinese food image recognition model according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a dense connecting block (dense block);
Fig. 4 is a schematic structural diagram of a bottleneck layer;
Fig. 5 is a schematic structural diagram of an image recognition apparatus for Chinese food dishes according to an embodiment of the present invention;
Fig. 6 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for identifying images of Chinese food dishes according to an embodiment of the present invention, including:
step 100, acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image;
Specifically, in the embodiment of the invention, a camera at a fixed position acquires an image of a single target Chinese food dish, and a preprocessing operation that includes data augmentation is then performed on the image. Common basic data augmentation operations include rotation, translation, scaling, random occlusion, horizontal flipping, color jitter and noise perturbation; a subset of these may be selected to preprocess the target Chinese food dish image.
Step 101, inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
specifically, in the embodiment of the present invention, the target Chinese food dish image obtained through the preprocessing operation is input into a pre-trained Chinese food dish image recognition model, so that a Chinese food dish recognition result can be obtained.
The Chinese food image recognition model is obtained based on preprocessed Chinese food image samples and corresponding Chinese food category labels through training.
Compared with common food images, Chinese food dish images generally do not show the distinctive spatial layout and obvious semantic features of most Western food, so their semantic information is harder to extract. Therefore, in the embodiment of the invention, the Chinese food dish image recognition model is constructed based on the DenseNet model. DenseNet does not obtain its representational power simply from a very deep or very wide network; instead it reuses features from lower to higher layers, concatenating the features of different layers, which increases the diversity of the inputs to later layers and makes maximal use of the image features. Compared with other networks, the DenseNet model also has fewer parameters, mitigates vanishing gradients, reduces overfitting on small datasets, and is simpler and more efficient.
Further, based on the DenseNet network model, the network structure of the Chinese food dish image recognition model comprises: N dense connecting blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters.
Unlike other convolutional neural networks, the invention realizes feature reuse through dense connections, makes maximal use of the image features, extracts image semantic information better, and achieves accurate identification with higher probability. The dense connecting blocks mitigate vanishing gradients, reduce the number of training parameters, resist overfitting and realize feature reuse; the transition layers compress the number of parameters and counteract the increase in model complexity introduced by the dense connecting blocks.
The Chinese food dish image identification method provided by the embodiment of the invention can accurately detect and identify a wide variety of Chinese food dishes with high identification accuracy.
Based on the content of the above embodiment, the step of inputting the preprocessed target chinese food dish image into a chinese food dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an excitation layer of the Chinese food dish image recognition model;
inputting the first feature map into a maximum pooling layer of the Chinese food dish image recognition model to obtain a second feature map;
inputting the second feature map into a first dense connecting block of the Chinese food dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map into a second dense connecting block of the Chinese food dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map into a third dense connecting block of the Chinese food dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map into a fourth dense connecting block of the Chinese food dish image recognition model, and then obtaining a sixth feature map through the operation of a fourth transition layer;
and inputting the sixth feature map into a second batch normalization layer of the Chinese food dish image recognition model, and then obtaining the Chinese food dish recognition result through the operations of a fully connected layer and a classifier.
Fig. 2 is a schematic network structure diagram of a Chinese food dish image recognition model according to an embodiment of the present invention, where the Chinese food dish image recognition model includes a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a fully connected layer and a classifier, which are connected in sequence.
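As a non-limiting illustration, the ordering of layers listed above can be generated from N, showing how N dense connecting blocks interleave with N-1 transition layers. The function name and the layer-name strings are hypothetical and are used only for this sketch.

```python
def build_layer_sequence(num_dense_blocks):
    """Return ordered layer names for N dense blocks and N-1 transition layers."""
    if num_dense_blocks <= 1:
        raise ValueError("N must be a natural number greater than 1")
    layers = ["conv1", "batch_norm1", "relu", "max_pool"]
    for i in range(1, num_dense_blocks + 1):
        layers.append(f"dense_block{i}")
        # A transition layer sits only between consecutive dense blocks,
        # so the last dense block is not followed by one.
        if i < num_dense_blocks:
            layers.append(f"transition{i}")
    layers += ["batch_norm2", "fully_connected", "classifier"]
    return layers


if __name__ == "__main__":
    for name in build_layer_sequence(4):
        print(name)
```

With N = 4 the sketch yields the fourteen sequentially connected layers enumerated for Fig. 2.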
Specifically, the target Chinese food dish image is preprocessed and then input into the Chinese food dish image recognition model. The convolution operation of the first convolution layer, the BN operation of the first batch normalization layer and the ReLU activation of the excitation layer reduce its dimensions and yield the first feature map. The first feature map is then input into the maximum pooling layer, which down-samples it to remove redundant information and yields the second feature map. The second feature map then passes through the four dense connecting blocks in sequence, with a transition layer between every two adjacent dense connecting blocks.
In a specific embodiment, the convolution, BN and ReLU operations in Fig. 2 are applied in sequence to a target Chinese food dish image of 224 × 224 pixels to reduce its dimensions, yielding a first feature map of 112 × 112 pixels. The first feature map is then input into the maximum pooling layer, which uses a 3 × 3 window with a stride of 2, yielding a second feature map of 56 × 56 pixels that serves as the input to the first dense connecting block.
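The sizes quoted above can be checked with the usual output-size formula for a convolution or pooling window. The sketch below is illustrative only: the 7 × 7 kernel, stride 2 and padding 3 of the first convolution, and the padding of 1 for the pooling layer, are assumed values taken from the standard DenseNet stem and are not stated in this description.

```python
def output_size(size, kernel, stride, padding):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed stem hyperparameters (standard DenseNet values, not given in the text):
# 7x7 convolution, stride 2, padding 3; 3x3 max pooling, stride 2, padding 1.
first_map = output_size(224, kernel=7, stride=2, padding=3)
second_map = output_size(first_map, kernel=3, stride=2, padding=1)
print(first_map, second_map)  # prints: 112 56
```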
Fig. 3 is a schematic structural diagram of a dense connecting block (dense block); each layer in a dense connecting block is called a bottleneck layer. The dense block is what gives DenseNet its advantage over other convolutional neural networks: with dense blocks, DenseNet mitigates vanishing gradients, reduces the number of parameters, resists overfitting and reuses features.
Suppose a dense block has multiple layers and $x_0$ is the input of the dense block. The l-th layer has a composite function $H_l(\cdot)$ comprising three operations: BN, ReLU and a 3 × 3 convolution. To improve information flow between layers, DenseNet uses a connection pattern that differs from many other networks: dense connection. Dense connection connects each layer in a dense block to all subsequent layers to realize feature reuse, as shown in Fig. 3. The l-th layer therefore takes the feature maps $x_0, \ldots, x_{l-1}$ of all preceding layers as input:
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps output by layers 0, 1, ..., l-1, which serves as the input of the l-th layer.
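A minimal sketch of the dense connection rule may help: each layer receives the concatenation of the block input and all earlier layer outputs. Feature maps are modelled here as flat lists of channel values, and `make_toy_layer` is a hypothetical stand-in for the composite function H_l, not the BN, ReLU and convolution operations themselves.

```python
def dense_block(x0, layer_fns):
    """Apply dense connectivity: layer l receives [x0, x1, ..., x_{l-1}]."""
    features = [x0]  # block input followed by the output of every layer so far
    for h in layer_fns:
        # Channel-wise concatenation of all previous feature maps.
        concatenated = [value for fmap in features for value in fmap]
        features.append(h(concatenated))
    return features


def make_toy_layer(growth):
    """Hypothetical stand-in for H_l: emits `growth` channels holding the input mean."""
    def h(channels):
        mean = sum(channels) / len(channels)
        return [mean] * growth
    return h


features = dense_block([1.0, 2.0, 3.0, 4.0], [make_toy_layer(2) for _ in range(3)])
print([len(f) for f in features])  # prints: [4, 2, 2, 2]
```

In this toy run the three layers see 4, 6 and 8 input channels respectively, which is the growth in input width that dense connection produces.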
Optionally, each bottleneck layer has a composite function comprising a plurality of operations, including: batch normalization (BN), a ReLU activation function and a 3 × 3 convolution.
Fig. 4 is a schematic diagram of the bottleneck layer structure. Because dense connections make the number of input feature maps large, a 1 × 1 convolution is added before the 3 × 3 convolution of the bottleneck layer; this reduces the number of feature maps and hence the amount of computation.
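The saving from the 1 × 1 convolution can be illustrated by counting convolution weights with and without it. The growth rate of 32 and the intermediate width of four times the growth rate are assumptions borrowed from the commonly used DenseNet configuration; neither value is given in this description.

```python
def conv_weights(in_channels, out_channels, kernel):
    """Number of weights in a kernel x kernel convolution (bias ignored)."""
    return in_channels * out_channels * kernel * kernel


def plain_layer_weights(in_channels, growth_rate):
    """A layer that applies the 3x3 convolution directly to all input maps."""
    return conv_weights(in_channels, growth_rate, 3)


def bottleneck_layer_weights(in_channels, growth_rate, width_factor=4):
    """A 1x1 convolution down to width_factor * growth_rate maps, then the 3x3 convolution."""
    mid = width_factor * growth_rate  # assumed intermediate width
    return conv_weights(in_channels, mid, 1) + conv_weights(mid, growth_rate, 3)


print(plain_layer_weights(512, 32), bottleneck_layer_weights(512, 32))
# prints: 147456 102400
```

Under these assumed values the bottleneck only pays off once the input has more than about 230 channels, which is the situation deep inside a dense block where many feature maps have been concatenated.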
Further, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), a ReLU activation function, a 1 × 1 convolution and 2 × 2 average pooling with a stride of 2. The transition layers further reduce the number of parameters. Because the dimensions and channel count of the feature maps output by each dense block grow sharply, the transition layer reduces the channel count with its convolution and down-samples with average pooling, which keeps the number of feature-map channels from becoming too large and so prevents the model from becoming overly complex as more dense blocks are stacked.
If a dense block produces m feature maps, the following transition layer produces θm feature maps, where θ is the compression factor and 0 < θ ≤ 1. When θ = 1, the number of feature maps is unchanged by the transition layer. In the embodiment of the invention, θ is set to 0.5, so the number of feature maps is halved after each transition layer.
In a specific embodiment, the feature maps output by the four dense blocks measure 56 × 56, 28 × 28, 14 × 14 and 7 × 7 pixels, respectively. After the last dense block, BN and a softmax classifier are applied, and the output size of the fully connected layer is set to the total number of dish categories.
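The effect of the transition layers on feature-map size and channel count can be traced with a short calculation. The block depths (6, 12, 32, 32), the growth rate of 32 and the 64 initial channels are assumptions taken from the standard DenseNet-169 configuration; the description only states that a modified DenseNet169 is used, so the embodiment's actual values may differ.

```python
def trace_densenet(block_layers, growth_rate, init_channels, theta, spatial):
    """Return (spatial size, output channels) after each dense block."""
    channels = init_channels
    trace = []
    for i, num_layers in enumerate(block_layers):
        channels += num_layers * growth_rate  # each layer adds growth_rate maps
        trace.append((spatial, channels))
        if i < len(block_layers) - 1:         # transition between blocks only
            channels = int(theta * channels)  # 1x1 convolution compresses channels
            spatial //= 2                     # 2x2 average pooling, stride 2
    return trace


trace = trace_densenet((6, 12, 32, 32), growth_rate=32,
                       init_channels=64, theta=0.5, spatial=56)
for size, channels in trace:
    print(f"{size} x {size}, {channels} channels")
```

Under these assumptions the spatial sizes come out as 56, 28, 14 and 7, matching the embodiment, and the channel counts after the four dense blocks are 256, 512, 1280 and 1664.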
Before the trained Chinese food image recognition model is used for recognizing the target Chinese food image, the Chinese food image recognition model needs to be trained.
Based on the content of the above embodiment, before the step of obtaining the target chinese food item image and performing the preprocessing operation on the target chinese food item image, the method further includes:
Step 200, constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a fully connected layer and a classifier which are sequentially connected;
Specifically, the DenseNet model in this example is a modified DenseNet169 model, having a network structure as shown in fig. 3.
Step 201, obtaining a Chinese food image sample, and preprocessing the Chinese food image sample;
The purpose of the preprocessing is to achieve image augmentation.
Step 202, inputting the preprocessed Chinese food image samples into the DenseNet model to obtain an output result;
step 203, calculating a loss function value by using a cross entropy loss function based on the output result and the Chinese food category label corresponding to the Chinese food image sample;
The loss function is the cross-entropy loss, which speeds up convergence and the updating of the weight matrix.
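For illustration, the cross-entropy loss for a single sample can be written as the negative log of the softmax probability assigned to the correct category. The sketch below uses plain Python lists for the classifier outputs; the function names are illustrative and not taken from the embodiment.

```python
import math


def softmax(logits):
    """Convert raw classifier outputs into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Cross-entropy loss for one sample whose true category index is `label`."""
    return -math.log(softmax(logits)[label])


def cross_entropy_grad(logits, label):
    """Gradient of the loss with respect to the logits: softmax minus one-hot."""
    probs = softmax(logits)
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]


print(cross_entropy([0.0, 0.0], 0))            # equals ln 2 for a uniform guess
print(cross_entropy_grad([0.0, 5.0], 0)[0])    # large magnitude: confident and wrong
print(cross_entropy_grad([5.0, 0.0], 0)[0])    # small magnitude: confident and right
```

The gradient is large when the prediction is far from the label and small when it is close, which is one way to read the statement that this loss speeds up weight updates.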
Step 204, based on Adam optimization algorithm, starting from the output layer of the DenseNet model, adjusting each parameter of the dense connection type convolutional neural network so as to move the loss function value towards the minimization direction;
The optimizer used to train the model is the Adam algorithm, which provides an adaptive learning rate, speeds up training and enhances the robustness of the network.
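A minimal sketch of one Adam update on a list of scalar parameters is given below to show where the adaptive learning rate comes from. The decay rates beta1 = 0.9 and beta2 = 0.999 and the constant eps = 1e-8 are the commonly used defaults and are assumptions here; the helper names are hypothetical.

```python
import math


def init_state(num_params):
    """Create the running first and second moment estimates."""
    return {"t": 0, "m": [0.0] * num_params, "v": [0.0] * num_params}


def adam_step(params, grads, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return updated parameters after one Adam step; `state` is updated in place."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # mean of gradients
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # mean of squared gradients
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params


# Toy usage: minimise f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, state = [0.0], init_state(1)
for _ in range(10):
    w = adam_step(w, [2 * (w[0] - 3)], state, lr=0.1)
print(w[0])
```

Because each step is scaled by the ratio of the two moment estimates, the first update moves every parameter by roughly the learning rate regardless of the gradient's magnitude.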
And step 205, judging whether a training end condition is met, if so, saving parameters of the currently iterated DenseNet model, and obtaining a Chinese food image recognition model after training.
Specifically, a camera at a fixed position collects a plurality of single-dish images, which are stored in a database, and a category label is added to each image. If the database does not yet contain the category of an image, a new category is created by adding its label. The database is then divided into a training set and a test set according to a set proportion. During training, to give the model better classification performance on the dataset, the network hyperparameters are set as follows: the number of epochs is 150; the batch size is 64; the optimizer is Adam, which provides an adaptive learning rate, with an initial learning rate of 1e-4, which greatly speeds up training and enhances the robustness of the network; and, because the invention addresses a classification problem, the loss function is the cross-entropy loss, under which the model learns faster when its predictions are poor and more slowly when they are good. After DenseNet169 has been trained for 150 epochs, the best-performing model is taken as the final trained Chinese food dish image recognition model. After training, the test set is input into this model to obtain the test result.
The Chinese food dish image identification method provided by the embodiment of the invention makes full use of the feature reuse realized by DenseNet's dense connections. Adjusting the network hyperparameters greatly reduces the number of training parameters and the redundancy of the network, makes maximal use of the dish image features, and helps capture the semantic information of dish images; a model with high identification accuracy and excellent performance can be obtained through repeated iterative training. Because DenseNet generalizes well, the invention is suitable not only for the difficult task of identifying Chinese food dishes but can in principle also be applied to other foods simply by training on other food datasets.
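The training procedure described above can be outlined as a dataset split, mini-batch iteration and best-model selection. This is a sketch under stated assumptions: the 0.8 split ratio is a placeholder because the description gives no proportion, `train_on_batch` and `evaluate` are hypothetical callables supplied by the caller, and choosing the best epoch by a score on held-out data is one possible reading of "the optimal model".

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle and split labelled samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def iter_batches(samples, batch_size=64):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


def train(train_set, held_out, train_on_batch, evaluate, epochs=150, batch_size=64):
    """Run `epochs` passes over the data and return (best_epoch, best_score)."""
    best_epoch, best_score = None, float("-inf")
    for epoch in range(1, epochs + 1):
        for batch in iter_batches(train_set, batch_size):
            train_on_batch(batch)          # forward pass, loss, parameter update
        score = evaluate(held_out)         # e.g. classification accuracy
        if score > best_score:             # keep track of the best epoch so far
            best_epoch, best_score = epoch, score
    return best_epoch, best_score
```

In a full implementation the model parameters would be saved whenever a new best score is reached, so that the best-performing model can be restored after the final epoch.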
Based on the content of the above embodiment, the preprocessing operation performed on the target Chinese food dish image is specifically:
randomly rotating the target Chinese food dish image about its center within a preset angle range;
randomly cropping the rotated target Chinese food dish image according to a preset aspect ratio;
horizontally flipping the randomly cropped target Chinese food dish image with a preset probability;
and normalizing the horizontally flipped target Chinese food dish image.
Specifically, the target Chinese food dish image is randomly rotated about its center within a preset angle range, such as -10 degrees to 10 degrees;
the rotated target Chinese food dish image is randomly cropped according to a preset aspect ratio, for example 224 × 224;
the randomly cropped target Chinese food dish image is horizontally flipped with a preset probability, such as 0.5;
and the horizontally flipped target Chinese food dish image is normalized to eliminate the influence of differing scales among data features.
The preprocessing operation steps provided by the embodiment of the invention are beneficial to obtaining accurate training models and Chinese food dish identification results.
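The four preprocessing steps can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: the image is a single-channel grid of 0 to 255 values stored as nested lists, rotation uses nearest-neighbour sampling, and the normalization mean and standard deviation of 0.5 are placeholders, since the embodiment does not give them. All function names are hypothetical.

```python
import math
import random


def rotate_center(img, angle_deg, fill=0):
    """Nearest-neighbour rotation of a 2-D grid about its centre."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source location.
            sx = cos_a * (x - cx) + sin_a * (y - cy) + cx
            sy = -sin_a * (x - cx) + cos_a * (y - cy) + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = img[iy][ix]
    return out


def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def normalize(img, mean=0.5, std=0.5):
    """Scale 0..255 values to 0..1, then standardize (mean and std are assumed)."""
    return [[(v / 255.0 - mean) / std for v in row] for row in img]


def preprocess(img, rng, max_angle=10.0, crop=224, flip_prob=0.5):
    """Rotate, crop, flip with probability flip_prob, then normalize."""
    out = rotate_center(img, rng.uniform(-max_angle, max_angle))
    out = random_crop(out, crop, rng)
    if rng.random() < flip_prob:
        out = hflip(out)
    return normalize(out)
```

Passing an explicit `random.Random` instance as `rng` keeps the random rotation, crop position and flip decision reproducible during testing.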
Fig. 5 is a schematic structural diagram of a Chinese food dish image recognition device according to an embodiment of the present invention, the device including: a preprocessing module 510 and a recognition module 520, wherein,
the preprocessing module 510 is configured to obtain a target Chinese food dish image and perform a preprocessing operation on the target Chinese food dish image;
the recognition module 520 is configured to input the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: n dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; n is a natural number greater than 1.
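The relationship between N dense connecting blocks and N-1 transition layers can be illustrated with a short channel-count calculation. The sketch below assumes the commonly published DenseNet-169 configuration (block sizes 6, 12, 32, 32, a growth rate of 32, 64 channels entering the first block, and a transition compression factor of 0.5); these values and the function name are assumptions for illustration and are not stated in this passage.

```python
def densenet_channel_plan(block_layers, growth_rate=32, init_channels=64, compression=0.5):
    """Channel count after every dense block and every transition layer.

    A transition layer follows each dense block except the last one,
    so N blocks give N - 1 transitions.
    """
    plan = []
    channels = init_channels
    last = len(block_layers) - 1
    for i, n_layers in enumerate(block_layers):
        # Each layer in the block appends `growth_rate` channels.
        channels += n_layers * growth_rate
        plan.append((f"dense_block_{i + 1}", channels))
        if i < last:
            # The transition layer compresses the channel count.
            channels = int(channels * compression)
            plan.append((f"transition_{i + 1}", channels))
    return plan


if __name__ == "__main__":
    for name, channels in densenet_channel_plan((6, 12, 32, 32)):
        print(name, channels)
```

Under these assumptions the dense blocks widen the feature map while each transition layer halves it again, which is how the transition layers keep the parameter count in check.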
The Chinese food image identification device provided by the embodiment of the invention is used for realizing the Chinese food image identification method embodiment, so that the understanding of each functional module in the embodiment of the invention can refer to the method embodiment, and the details are not repeated herein.
The Chinese food dish image recognition device provided by the embodiment of the invention can accurately detect and recognize various Chinese foods, and has the advantages of wide recognition types and high recognition accuracy.
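The division of the device into a preprocessing module and a recognition module can be sketched as a simple composition. The class name `DishRecognitionDevice`, its fields, and the placeholder labels are hypothetical and serve only to illustrate the data flow; the model is represented by any callable that returns one score per category.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class DishRecognitionDevice:
    """Hypothetical composition of the two modules described above."""
    preprocess: Callable[[object], object]       # role of the preprocessing module
    model: Callable[[object], Sequence[float]]   # trained model: one score per category
    labels: Sequence[str]                        # category label for each score index

    def recognize(self, image):
        """Preprocess the image, score it, and return the best-scoring label."""
        scores = self.model(self.preprocess(image))
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.labels[best]


if __name__ == "__main__":
    # Stub callables stand in for real preprocessing and a trained network.
    device = DishRecognitionDevice(
        preprocess=lambda image: image,
        model=lambda image: [0.1, 0.7, 0.2],
        labels=["class_0", "class_1", "class_2"],
    )
    print(device.recognize(object()))
```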
Fig. 6 is a schematic entity structure diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 6, the electronic device may include: a processor (processor)610, a communication Interface (Communications Interface)620, a memory (memory)630 and a communication bus 640, wherein the processor 610, the communication Interface 620 and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke a computer program stored on the memory 630 and operable on the processor 610 to perform the Chinese food image recognition methods provided by the above-described method embodiments, including, for example: acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image; inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: n dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; n is a natural number greater than 1.
In addition, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the Chinese food dish image identification method provided in the foregoing method embodiments, the method including, for example: acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image; inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: n dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; n is a natural number greater than 1.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A Chinese food dish image identification method is characterized by comprising the following steps:
acquiring a target Chinese food dish image, and executing preprocessing operation on the target Chinese food dish image;
inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;
the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: n dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; n is a natural number greater than 1.
2. The method for recognizing Chinese food dish images according to claim 1, wherein the step of inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a recognition result specifically comprises:
inputting the preprocessed target Chinese food image into a Chinese food image recognition model, and obtaining a first characteristic mapping chart through the operations of a first convolution layer, a first batch normalization layer and an excitation layer of the Chinese food image recognition model;
inputting the first feature mapping chart into a maximum pooling layer of the Chinese food image identification model to obtain a second feature mapping chart;
inputting the second feature mapping chart into a first dense connecting block of the Chinese food image identification model, and then obtaining a third feature mapping chart through the operation of a first transition layer;
inputting the third feature mapping chart into a second dense connecting block of the Chinese food image identification model, and then obtaining a fourth feature mapping chart through operation of a second transition layer;
inputting the fourth feature mapping chart into a third dense connecting block of the Chinese food image identification model, and then obtaining a fifth feature mapping chart through the operation of a third transition layer;
inputting the fifth feature mapping map into a fourth dense connecting block of the Chinese food image identification model, and then obtaining a sixth feature mapping map through the operation of a fourth transition layer;
and inputting the sixth feature mapping chart into a second batch normalization layer of the Chinese food image identification model, and then obtaining a Chinese food identification result through the operation of a full connection layer and a classifier.
3. The method of claim 2, wherein the first, second, third and fourth dense connecting blocks each comprise a plurality of densely connected bottleneck layers, each of the bottleneck layers having a composite function comprising a plurality of operations, the plurality of operations comprising: batch normalization (BN), a ReLU activation function, and a 3 × 3 convolution.
4. The method of claim 3, wherein the plurality of operations further comprise: 1 × 1 convolution.
5. The Chinese food dish image identification method according to claim 2, wherein the first transition layer, the second transition layer, the third transition layer and the fourth transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, a 1 × 1 convolution, and 2 × 2 average pooling with a stride of 2.
6. The method for identifying Chinese food dish image according to claim 1, wherein before the step of obtaining the target Chinese food dish image and performing preprocessing operation on the target Chinese food dish image, the method further comprises:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an excitation layer, a maximum pooling layer, a first dense connecting block, a first transition layer, a second dense connecting block, a second transition layer, a third dense connecting block, a third transition layer, a fourth dense connecting block, a second batch normalization layer, a full connecting layer and a classifier which are sequentially connected;
acquiring a Chinese food image sample, and preprocessing the Chinese food image sample;
inputting the preprocessed Chinese food image samples into the DenseNet model to obtain an output result;
calculating a loss function value by using a cross entropy loss function based on the output result and the Chinese food category label corresponding to the Chinese food image sample;
adjusting, based on an Adam optimization algorithm, respective parameters of the densely-connected convolutional neural network from an output layer of the DenseNet model so as to move the loss function value toward a minimization direction;
and judging whether the training end condition is met, if so, saving the parameters of the current iteration DenseNet model, and obtaining a Chinese food image recognition model after training.
7. The Chinese food dish image identification method according to claim 1, wherein the preprocessing operation is performed on the target Chinese food dish image, and specifically comprises:
carrying out random center rotation on the target Chinese food image according to a preset angle;
randomly cropping the target Chinese food dish image subjected to random center rotation according to a preset length-width ratio;
horizontally flipping the randomly cropped target Chinese food dish image according to a preset probability;
and normalizing the horizontally flipped target Chinese food dish image.
8. An image recognition device for Chinese food dishes, comprising:
the preprocessing module is used for acquiring a target Chinese food dish image and executing preprocessing operation on the target Chinese food dish image;
the identification module is used for inputting the preprocessed target Chinese food dish image into a Chinese food dish image identification model to obtain a Chinese food dish identification result;
the Chinese food image recognition model is obtained by training based on a preprocessed Chinese food image sample and a corresponding Chinese food category label, the Chinese food image recognition model is constructed based on a DenseNet model, and the network structure of the Chinese food image recognition model comprises: n dense connecting blocks for realizing feature multiplexing and N-1 transition layers for compressing the number of parameters; n is a natural number greater than 1.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the Chinese food dish image identification method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the Chinese food dish image identification method according to any one of claims 1 to 7.
CN202010334520.1A 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device Active CN111523483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010334520.1A CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010334520.1A CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Publications (2)

Publication Number Publication Date
CN111523483A true CN111523483A (en) 2020-08-11
CN111523483B CN111523483B (en) 2023-10-03

Family

ID=71904579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010334520.1A Active CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Country Status (1)

Country Link
CN (1) CN111523483B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115906A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Open dish identification method based on deep learning target detection and metric learning
CN112115903A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Method for improving dish identification system identification precision based on deep learning
CN113033706A (en) * 2021-04-23 2021-06-25 广西师范大学 Multi-source two-stage dish identification method based on visual target detection and re-identification
CN117975445A (en) * 2024-03-29 2024-05-03 江南大学 Food identification method, system, equipment and medium
CN117975445B (en) * 2024-03-29 2024-05-31 江南大学 Food identification method, system, equipment and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491765A (en) * 2018-03-05 2018-09-04 中国农业大学 A kind of classifying identification method and system of vegetables image
CN109620152A (en) * 2018-12-16 2019-04-16 北京工业大学 A kind of electrocardiosignal classification method based on MutiFacolLoss-Densenet
CN109949824A (en) * 2019-01-24 2019-06-28 江南大学 City sound event classification method based on N-DenseNet and higher-dimension mfcc feature
CN110097564A (en) * 2019-04-04 2019-08-06 平安科技(深圳)有限公司 Image labeling method, device, computer equipment and storage medium based on multi-model fusion
CN110176002A (en) * 2019-06-05 2019-08-27 深圳大学 A kind of the lesion detection method and terminal device of radioscopic image
CN110472668A (en) * 2019-07-22 2019-11-19 华北电力大学(保定) A kind of image classification method
CN110689085A (en) * 2019-09-30 2020-01-14 天津大学 Garbage classification method based on deep cross-connection network and loss function design
CN110766063A (en) * 2019-10-17 2020-02-07 南京信息工程大学 Image classification method based on compressed excitation and tightly-connected convolutional neural network
CN110942105A (en) * 2019-12-13 2020-03-31 东华大学 Mixed pooling method based on maximum pooling and average pooling
WO2020073951A1 (en) * 2018-10-10 2020-04-16 腾讯科技(深圳)有限公司 Method and apparatus for training image recognition model, network device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付杰: "基于密集型网络的人脸年龄估计", pages 3 *

Also Published As

Publication number Publication date
CN111523483B (en) 2023-10-03

Similar Documents

Publication Publication Date Title
CN110223292B (en) Image evaluation method, device and computer readable storage medium
CN106599883B (en) CNN-based multilayer image semantic face recognition method
CN110097554B (en) Retina blood vessel segmentation method based on dense convolution and depth separable convolution
CN109711426B (en) Pathological image classification device and method based on GAN and transfer learning
CN110599409A (en) Convolutional neural network image denoising method based on multi-scale convolutional groups and parallel
CN110059586B (en) Iris positioning and segmenting system based on cavity residual error attention structure
CN111523483A (en) Chinese food dish image identification method and device
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN110473142B (en) Single image super-resolution reconstruction method based on deep learning
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
CN110084266B (en) Dynamic emotion recognition method based on audio-visual feature deep fusion
CN110400288B (en) Sugar network disease identification method and device fusing binocular features
CN106503661B (en) Face gender identification method based on fireworks deepness belief network
CN111815562A (en) Retinal vessel segmentation method combining U-Net and self-adaptive PCNN
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN111832650A (en) Image classification method based on generation of confrontation network local aggregation coding semi-supervision
CN111694977A (en) Vehicle image retrieval method based on data enhancement
CN111160130A (en) Multi-dimensional collision recognition method for multi-platform virtual identity account
CN110991554B (en) Improved PCA (principal component analysis) -based deep network image classification method
CN112580502A (en) SICNN-based low-quality video face recognition method
CN113987236B (en) Unsupervised training method and unsupervised training device for visual retrieval model based on graph convolution network
CN113033345B (en) V2V video face recognition method based on public feature subspace
Lukic et al. Galaxy classifications with deep learning
CN113361346A (en) Scale parameter self-adaptive face recognition method for replacing adjustment parameters
CN114049675B (en) Facial expression recognition method based on light-weight two-channel neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant