CN111523483B - Chinese meal dish image recognition method and device - Google Patents

Info

Publication number
CN111523483B
Authority
CN
China
Prior art keywords
chinese
image recognition
image
layer
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010334520.1A
Other languages
Chinese (zh)
Other versions
CN111523483A (en
Inventor
高伟东
郝然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202010334520.1A
Publication of CN111523483A
Application granted
Publication of CN111523483B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/35 Categorising the entire scene, e.g. birthday party or wedding scene
    • G06V20/36 Indoor scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention provide a Chinese dish image recognition method and device. The method comprises: acquiring a target Chinese dish image and performing a preprocessing operation on it; and inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result. The Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model. Its network structure comprises N dense blocks (densely connected blocks) for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1. The embodiments of the invention can accurately detect and recognize a wide variety of Chinese dishes with high recognition accuracy.

Description

Chinese meal dish image recognition method and device
Technical Field
The invention relates to the technical field of computers, in particular to a Chinese meal dish image recognition method and device.
Background
With the rapid development of deep learning algorithms, computer vision has become one of the fastest-growing and most widely deployed fields of artificial intelligence and is now applied in many aspects of daily life. Food recognition is an emerging topic of interest in computer vision.
At present, there has been considerable research on recognition algorithms for Western and Japanese dishes, but comparatively little on mature methods for Chinese dish image recognition. One reason is that few large-scale Chinese dish classification datasets are publicly available. Another is that Chinese dishes are harder to recognize than Western or Japanese dishes: dishes of the same category may take various forms, Chinese dish images are affected by background noise such as plate color and lighting, and dishes of different categories can look very similar.
For these reasons, existing techniques that can accurately recognize Chinese dishes are very limited. A method for accurately detecting and recognizing Chinese dishes is therefore needed.
Disclosure of Invention
To solve or at least partially solve the above problems, embodiments of the present invention provide a Chinese dish image recognition method and device.
In a first aspect, an embodiment of the present invention provides a Chinese dish image recognition method, including:
acquiring a target Chinese dish image, and performing a preprocessing operation on the target Chinese dish image;
inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
wherein the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N dense blocks for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.
Optionally, the step of inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese dish image recognition model;
inputting the first feature map into a max pooling layer of the Chinese dish image recognition model to obtain a second feature map;
inputting the second feature map into a first dense block of the Chinese dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map into a second dense block of the Chinese dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map into a third dense block of the Chinese dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map into a fourth dense block of the Chinese dish image recognition model, and then obtaining a sixth feature map through the operation of a fourth transition layer;
and inputting the sixth feature map into a second batch normalization layer of the Chinese dish image recognition model, and then obtaining the Chinese dish recognition result through the operations of a fully connected layer and a classifier.
Optionally, each of the first, second, third and fourth dense blocks includes a plurality of densely connected bottleneck layers, each bottleneck layer having a composite function comprising a plurality of operations, the plurality of operations including: batch normalization (BN), a ReLU activation function and a 3×3 convolution.
Optionally, the plurality of operations further includes a 1×1 convolution.
Optionally, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), a ReLU activation function, a 1×1 convolution, and 2×2 average pooling with a stride of 2.
Optionally, before the step of acquiring the target Chinese dish image and performing the preprocessing operation on the target Chinese dish image, the method further includes:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block, a second batch normalization layer, a fully connected layer and a classifier, connected in sequence;
acquiring Chinese dish image samples, and preprocessing the Chinese dish image samples;
inputting the preprocessed Chinese dish image samples into the DenseNet model to obtain an output result;
calculating a loss function value with a cross-entropy loss function, based on the output result and the Chinese dish category labels corresponding to the Chinese dish image samples;
based on the Adam optimization algorithm, adjusting the parameters of the densely connected convolutional neural network, starting from the output layer of the DenseNet model, so that the loss function value moves toward its minimum;
determining whether the training end condition is reached, and if so, saving the parameters of the DenseNet model at the current iteration to obtain the trained Chinese dish image recognition model.
Optionally, performing the preprocessing operation on the target Chinese dish image specifically includes:
randomly rotating the target Chinese dish image about its center within a preset angle range;
randomly cropping the rotated target Chinese dish image according to a preset aspect ratio;
horizontally flipping the randomly cropped target Chinese dish image with a preset probability;
and normalizing the horizontally flipped target Chinese dish image.
In a second aspect, an embodiment of the present invention provides a Chinese dish image recognition device, including:
a preprocessing module, configured to acquire a target Chinese dish image and perform a preprocessing operation on the target Chinese dish image;
a recognition module, configured to input the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
wherein the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N dense blocks for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the Chinese dish image recognition method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the Chinese dish image recognition method provided in the first aspect.
The Chinese dish image recognition method and device provided by the embodiments of the invention can accurately detect and recognize a wide variety of Chinese dishes with high recognition accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a Chinese dish image recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network structure of a Chinese dish image recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the structure of a dense block;
FIG. 4 is a schematic diagram of the structure of a bottleneck layer;
FIG. 5 is a schematic structural diagram of a Chinese dish image recognition device according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a schematic flow chart of a Chinese dish image recognition method according to an embodiment of the present invention, which includes:
Step 100, acquiring a target Chinese dish image, and performing a preprocessing operation on the target Chinese dish image;
Specifically, in the embodiment of the invention, a camera at a fixed position is used to capture an image of a single target Chinese dish, and a preprocessing operation, which includes data augmentation, is then performed on the image. Common basic data augmentation operations include rotation, translation, scaling, random occlusion, horizontal flipping, color shift and noise perturbation; several of these methods may be selected to preprocess the target Chinese dish image.
Step 101, inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
Specifically, the embodiment of the invention inputs the preprocessed target Chinese dish image into a Chinese dish image recognition model trained in advance, which outputs the Chinese dish recognition result.
The Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels.
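As an illustration of how a recognition result might be read off the model output, the following minimal sketch converts raw class scores into probabilities with a softmax and returns the most likely category. The function names and the example dish names are hypothetical and are not taken from the patent; the scores stand in for the output of the trained model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def recognize(scores, class_names):
    """Return the most probable class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical category names and scores, for illustration only.
names = ["mapo tofu", "kung pao chicken", "fried rice"]
dish, confidence = recognize([0.1, 2.0, -1.0], names)
print(dish, round(confidence, 3))
```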
Compared with general food images, Chinese dish images usually do not exhibit the distinctive spatial layout and obvious semantic features that most Western food images do, so their semantic information is harder to extract. Therefore, in the embodiment of the invention, the Chinese dish image recognition model is constructed based on the DenseNet model. DenseNet does not obtain its representational power simply from a very deep or very wide network; instead, it repeatedly reuses lower-layer features in higher layers, concatenating the features of different layers, which increases the diversity of the inputs to later layers and makes the fullest use of image features. Compared with other networks, DenseNet has fewer parameters, mitigates vanishing gradients, reduces overfitting on small datasets, and is simple and efficient.
Further, based on the DenseNet model, the network structure of the Chinese dish image recognition model comprises: N dense blocks for feature reuse and N-1 transition layers for compressing the number of parameters.
Unlike other convolutional neural networks, the invention achieves feature reuse through dense connections and makes the fullest use of image features, so it can better extract the semantic information of an image and is more likely to achieve accurate recognition. The dense blocks alleviate vanishing gradients, reduce the number of training parameters, resist overfitting and enable feature reuse; the transition layers compress the number of parameters and limit the model complexity introduced by the dense blocks.
The Chinese dish image recognition method provided by the embodiment of the invention can accurately detect and recognize a wide variety of Chinese dishes with high recognition accuracy.
Based on the foregoing embodiments, the step of inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model to obtain a recognition result specifically includes:
inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese dish image recognition model;
inputting the first feature map into a max pooling layer of the Chinese dish image recognition model to obtain a second feature map;
inputting the second feature map into a first dense block of the Chinese dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map into a second dense block of the Chinese dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map into a third dense block of the Chinese dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map into a fourth dense block of the Chinese dish image recognition model, and then obtaining a sixth feature map through the operation of a fourth transition layer;
and inputting the sixth feature map into a second batch normalization layer of the Chinese dish image recognition model, and then obtaining the Chinese dish recognition result through the operations of a fully connected layer and a classifier.
FIG. 2 is a schematic diagram of the network structure of a Chinese dish image recognition model according to an embodiment of the present invention. The Chinese dish image recognition model includes a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block, a second batch normalization layer, a fully connected layer and a classifier, connected in sequence.
Specifically, after preprocessing, the target Chinese dish image is input into the Chinese dish image recognition model, and the first feature map is obtained through the convolution operation of the first convolution layer, the BN operation of the first batch normalization layer and the ReLU activation of the activation layer. The first feature map is then input into the max pooling layer, which downsamples the feature map and removes redundant information, yielding the second feature map. The second feature map then passes through the four dense blocks in sequence, with a transition layer between adjacent dense blocks.
In a specific embodiment, the convolution, BN and ReLU operations shown in FIG. 2 are performed in sequence on a target Chinese dish image of 224×224 pixels, reducing its spatial size and yielding a first feature map of 112×112 pixels. The first feature map is then input to the max pooling layer, which uses a 3×3 pooling window with a stride of 2, yielding a second feature map of 56×56 pixels as the input to the first dense block.
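The size reduction 224 → 112 → 56 can be checked with the usual output-size formula for convolution and pooling layers. In the sketch below, the 7×7 kernel, stride 2 and padding 3 of the first convolution, and the padding 1 of the pooling layer, are assumptions borrowed from the standard DenseNet stem; the text above states only the 3×3 window and stride 2 of the pooling layer and the resulting sizes.

```python
def out_size(n, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1

# First convolution: assumed 7x7 kernel, stride 2, padding 3.
after_conv = out_size(224, kernel=7, stride=2, padding=3)
# Max pooling: 3x3 window, stride 2 (padding 1 is an assumption).
after_pool = out_size(after_conv, kernel=3, stride=2, padding=1)

print(after_conv, after_pool)
```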
FIG. 3 is a schematic diagram of the structure of a dense block, in which each layer is called a bottleneck layer. Dense blocks are what make DenseNet superior to other convolutional neural networks: with them, DenseNet gains advantages such as alleviating vanishing gradients, reducing the number of parameters, resisting overfitting and reusing features.
Suppose a dense block has $L$ layers and $x_0$ is the input of the dense block. Each layer $l$ has a composite function $H_l(\cdot)$ comprising three operations: BN, ReLU and a 3×3 convolution. To improve the flow of information between the layers of a dense block, DenseNet uses a distinctive connection pattern, the dense connection: each layer in a dense block is connected to all subsequent layers, which achieves feature reuse, as shown in FIG. 3. Thus, layer $l$ takes the feature maps of all preceding layers, $x_0, \ldots, x_{l-1}$, as input:
$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$
where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps output by layers $0, \ldots, l-1$.
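The dense connection pattern can be sketched in a few lines of plain Python. This is a toy illustration, not the patented model: each feature map is represented as a flat list of channel values, and `toy_layer` is a hypothetical stand-in for the composite function H_l (BN, ReLU, convolution) that simply emits a fixed number of new channels. The number of new channels per layer (the "growth rate") is an assumption; the text does not give a value.

```python
def dense_block(x0, num_layers, layer_fn):
    """Apply num_layers densely connected layers to the input channels x0.

    Layer l receives the concatenation [x_0, ..., x_{l-1}] and its output
    x_l is appended to the list of features seen by all later layers.
    """
    features = [x0]
    for l in range(1, num_layers + 1):
        concatenated = [c for f in features for c in f]
        features.append(layer_fn(l, concatenated))
    return [c for f in features for c in f]

def toy_layer(growth):
    """Hypothetical stand-in for H_l: emits `growth` new channels."""
    def h(l, channels):
        mean = sum(channels) / len(channels)
        return [mean + l] * growth
    return h

out = dense_block([1.0, 1.0, 1.0, 1.0], num_layers=3, layer_fn=toy_layer(2))
print(len(out))  # input channels plus num_layers * growth
```

Because every layer output is kept and concatenated, the channel count grows linearly with depth, which is why the bottleneck and transition layers described next are used to keep the model compact.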
Optionally, each bottleneck layer has a composite function comprising a plurality of operations, including: batch normalization (BN), a ReLU activation function and a 3×3 convolution.
FIG. 4 is a schematic diagram of the bottleneck layer structure. Because the number of feature maps becomes large once dense connections are used, a 1×1 convolution is added before the 3×3 convolution of the bottleneck layer to reduce the number of feature maps fed to the 3×3 convolution and thereby reduce the amount of computation.
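A rough weight count shows why the 1×1 convolution helps once many feature maps have accumulated. In this sketch the 1×1 convolution is assumed to output four times the growth rate, as in the original DenseNet design; that factor, the growth rate of 32 and the input widths are illustrative assumptions rather than values stated in the patent. Biases and BN parameters are ignored.

```python
def conv_weights(c_in, c_out, kernel):
    """Number of weights in a kernel x kernel convolution, ignoring bias."""
    return c_in * c_out * kernel * kernel

def plain_layer_weights(c_in, growth):
    """3x3 convolution applied directly to all c_in input channels."""
    return conv_weights(c_in, growth, 3)

def bottleneck_layer_weights(c_in, growth, width_factor=4):
    """1x1 convolution down to width_factor * growth channels, then 3x3."""
    mid = width_factor * growth
    return conv_weights(c_in, mid, 1) + conv_weights(mid, growth, 3)

for c_in in (64, 1024):
    print(c_in, plain_layer_weights(c_in, 32), bottleneck_layer_weights(c_in, 32))
```

Under these assumptions the bottleneck is cheaper only when the input is wide (1024 channels here); for a narrow input such as 64 channels it costs more, so the saving comes from the later layers of a dense block.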
Further, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), a ReLU activation function, a 1×1 convolution, and 2×2 average pooling with a stride of 2. The purpose of the transition layer is to further compress the number of parameters. The number of channels of the feature map output by each dense block grows rapidly; the 1×1 convolution of the transition layer reduces the number of channels and the average pooling reduces the spatial size of the feature map, which addresses the problem of an excessive number of feature map channels and keeps the model from becoming overly complex after several dense blocks.
If a dense block produces m feature maps, the following transition layer produces θm feature maps, where θ is the compression factor and 0 < θ ≤ 1. When θ = 1, the number of feature maps is unchanged by the transition layer. In the embodiment of the invention, θ = 0.5, so the number of feature maps is halved after each transition layer.
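The two effects of a transition layer, fewer channels and a smaller spatial size, can be illustrated as follows. This is a minimal sketch on plain nested lists; rounding θm down to an integer is an assumption for the case where θm is not a whole number.

```python
import math

def compress_channels(m, theta=0.5):
    """Number of feature maps after the 1x1 convolution of a transition layer."""
    return int(math.floor(theta * m))

def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2 on one feature map (list of rows)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[y][x] + fmap[y][x + 1] + fmap[y + 1][x] + fmap[y + 1][x + 1]) / 4.0
         for x in range(0, w - 1, 2)]
        for y in range(0, h - 1, 2)
    ]

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(compress_channels(256))   # channels halved with theta = 0.5
print(avg_pool_2x2(fmap))       # 4x4 map reduced to 2x2
```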
In one specific embodiment, the feature maps output by the four dense blocks have sizes of 56×56, 28×28, 14×14 and 7×7, respectively. After the last dense block, BN and a softmax classifier are applied, and the output dimension of the fully connected layer is set to the total number of Chinese dish categories.
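The sizes listed above can be reproduced by tracing the spatial size and channel count through the network. The block depths (6, 12, 32, 32), the growth rate of 32 and the 64 initial channels in this sketch are the settings of the standard DenseNet-169 and are assumptions here, since the embodiment uses a modified DenseNet169 whose exact settings are not given. The trace places a transition layer between adjacent dense blocks only, with θ = 0.5.

```python
def trace(block_layers=(6, 12, 32, 32), growth=32, init_channels=64,
          size=56, theta=0.5):
    """Return (spatial size, channels) at the output of each dense block."""
    rows = []
    channels = init_channels
    for i, num_layers in enumerate(block_layers):
        channels += num_layers * growth       # each layer adds `growth` maps
        rows.append((size, channels))
        if i < len(block_layers) - 1:         # transition between blocks
            channels = int(channels * theta)  # 1x1 convolution compresses
            size //= 2                        # 2x2 average pooling, stride 2
    return rows

for size, channels in trace():
    print(f"{size}x{size}, {channels} channels")
```

Under these assumptions the final 7×7 feature map has 1664 channels, which would be pooled and fed to a fully connected layer with one output per dish category.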
Before the Chinese dish image recognition model can be used to recognize a target Chinese dish image, it must first be trained.
Based on the foregoing embodiment, before the step of acquiring the target Chinese dish image and performing the preprocessing operation on the target Chinese dish image, the method further includes:
Step 200, constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first dense block, a first transition layer, a second dense block, a second transition layer, a third dense block, a third transition layer, a fourth dense block, a second batch normalization layer, a fully connected layer and a classifier, connected in sequence;
Specifically, the DenseNet model in this embodiment is a modified DenseNet169 model with the network structure shown in FIG. 2.
Step 201, acquiring Chinese dish image samples, and preprocessing the Chinese dish image samples;
The purpose of the preprocessing is data augmentation.
Step 202, inputting the preprocessed Chinese dish image samples into the DenseNet model to obtain an output result;
Step 203, calculating a loss function value with a cross-entropy loss function, based on the output result and the Chinese dish category labels corresponding to the Chinese dish image samples;
The loss function is the cross-entropy loss, which speeds up convergence and the updating of the weight matrices.
Step 204, based on the Adam optimization algorithm, adjusting the parameters of the densely connected convolutional neural network, starting from the output layer of the DenseNet model, so that the loss function value moves toward its minimum;
The optimizer used in training is the Adam algorithm, which provides an adaptive learning rate, speeding up training and improving the robustness of the network.
Step 205, determining whether the training end condition is reached, and if so, saving the parameters of the DenseNet model at the current iteration to obtain the trained Chinese dish image recognition model.
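Steps 203 and 204 can be illustrated with a small pure-Python sketch of the cross-entropy loss and a single Adam update. It is a toy under stated assumptions, not the training code of the embodiment: the values β1 = 0.9, β2 = 0.999 and ε = 1e-8 are the commonly used Adam defaults and are assumed here, and the gradients are supplied by hand rather than obtained by backpropagation through a network.

```python
import math

def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample, computed via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def init_adam_state(n):
    """Zero-initialised first and second moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(params, grads, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameters and mutates `state`."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        m_hat = state["m"][i] / (1 - beta1 ** t)   # bias-corrected mean
        v_hat = state["v"][i] / (1 - beta2 ** t)   # bias-corrected variance
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

print(cross_entropy([0.0, 0.0, 0.0, 0.0], target=2))  # uniform guess over 4 classes
state = init_adam_state(2)
print(adam_step([1.0, -1.0], [2.0, -0.5], state))
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so each parameter moves by roughly the learning rate regardless of the gradient's magnitude; this per-parameter scaling is what is meant by an adaptive learning rate.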
Specifically, a number of single-dish images are captured with a camera at a fixed position and stored in a database, and a category label is added to each image; if an image's category is not yet in the database, a new category is created for it. The database is then divided proportionally into a training set and a test set. During training, to give the model better classification performance on the dataset, the following hyperparameters are used: the number of epochs is 150; the batch size is 64; and the optimizer is Adam with an initial learning rate of 1e-4, which provides an adaptive learning rate, greatly speeds up training and improves the robustness of the network. Because the invention addresses a classification problem, the loss function is the cross-entropy loss, with which learning is faster when the model's predictions are poor and slower when they are good. After 150 epochs, the best-performing DenseNet169 model is taken as the final trained Chinese dish image recognition model. After training, the test set is input into this best model to obtain the test results.
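The data handling and bookkeeping described above might be organised as in the following sketch. Everything other than the three hyperparameter values (150 epochs, batch size 64, learning rate 1e-4) is an assumption made for illustration: the split ratio is not stated in the text, the helper names are hypothetical, and selecting the best model by per-epoch accuracy is one plausible reading of "best-performing".

```python
import math
import random

CONFIG = {"epochs": 150, "batch_size": 64, "learning_rate": 1e-4}

def label_id(registry, category):
    """Return the id of a category, registering it if it is new."""
    return registry.setdefault(category, len(registry))

def split_dataset(samples, train_fraction, seed=0):
    """Shuffle and split samples into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def steps_per_epoch(num_train, batch_size):
    """Number of mini-batches needed to see every training sample once."""
    return math.ceil(num_train / batch_size)

def select_best_epoch(accuracy_per_epoch):
    """Return (1-based epoch, accuracy) of the best-scoring epoch."""
    best = max(range(len(accuracy_per_epoch)), key=accuracy_per_epoch.__getitem__)
    return best + 1, accuracy_per_epoch[best]

# Illustrative usage with an assumed 80/20 split of 1000 images.
train, test = split_dataset(range(1000), train_fraction=0.8)
print(len(train), len(test), steps_per_epoch(len(train), CONFIG["batch_size"]))
```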
The Chinese dish image recognition method provided by the embodiment of the invention takes full advantage of the feature reuse achieved by DenseNet's dense connections. Adjusting the network hyperparameters not only greatly reduces the number of training parameters and the redundancy of the network, but also makes the fullest use of dish image features, which helps capture the semantic information of dish images; after repeated training iterations, a model with high recognition accuracy and excellent performance is obtained. Because DenseNet generalizes well, the invention is suitable not only for Chinese dishes, which are difficult to recognize, but can in principle also be applied to other foods simply by training on datasets of those food types.
Based on the above embodiment, the preprocessing operation performed on the target Chinese dish image specifically comprises:
randomly rotating the target Chinese dish image about its center according to a preset angle;
randomly cropping the randomly rotated target Chinese dish image according to a preset length-width ratio;
horizontally flipping the randomly cropped target Chinese dish image with a preset probability;
and normalizing the horizontally flipped target Chinese dish image.
Specifically, the target Chinese dish image is randomly rotated about its center by a preset angle, for example an angle between -10 degrees and 10 degrees;
the randomly rotated target Chinese dish image is randomly cropped according to a preset length-width ratio, for example 224×224;
the randomly cropped target Chinese dish image is horizontally flipped with a preset probability, for example a probability of 0.5;
the horizontally flipped target Chinese dish image is normalized to eliminate the influence of differing scales among the data features.
The preprocessing steps provided by the embodiment of the invention help obtain an accurate trained model and accurate Chinese dish recognition results.
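The four preprocessing steps can be sketched with the standard library alone, using a grayscale image stored as a list of rows. This is only an illustration under stated assumptions: the function names are hypothetical, rotation uses nearest-neighbor sampling with zero fill, and the normalization constants (mean 0.5, standard deviation 0.5 after scaling pixels to [0, 1]) are not given in the text. The input image must be at least as large as the crop size.

```python
import math
import random


def rotate_about_center(image, angle_deg):
    """Rotate a 2-D image about its center (nearest neighbor, zero fill)."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    cos_a = math.cos(math.radians(angle_deg))
    sin_a = math.sin(math.radians(angle_deg))
    rotated = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Map each output pixel back to its source location.
            sx = round(cos_a * (x - cx) + sin_a * (y - cy) + cx)
            sy = round(-sin_a * (x - cx) + cos_a * (y - cy) + cy)
            if 0 <= sx < width and 0 <= sy < height:
                rotated[y][x] = image[sy][sx]
    return rotated


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(image) - crop_h)
    left = rng.randint(0, len(image[0]) - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def horizontal_flip(image):
    """Mirror every row left to right."""
    return [row[::-1] for row in image]


def normalize(image, mean, std):
    """Scale 0..255 pixels to [0, 1], then standardize with mean and std."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def preprocess(image, rng, max_angle=10.0, crop_size=(224, 224),
               flip_prob=0.5, mean=0.5, std=0.5):
    """Rotate, crop, flip and normalize, in the order listed in the text."""
    out = rotate_about_center(image, rng.uniform(-max_angle, max_angle))
    out = random_crop(out, crop_size[0], crop_size[1], rng)
    if rng.random() < flip_prob:
        out = horizontal_flip(out)
    return normalize(out, mean, std)
```

For example, `preprocess(image, random.Random(0))` returns a 224×224 grid of values in [-1, 1] for any grayscale `image` of at least that size. In practice an image library would usually replace these loops; the sketch only fixes the order and meaning of the steps.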
Fig. 5 is a schematic structural diagram of a Chinese dish image recognition device according to an embodiment of the present invention, comprising a preprocessing module 510 and a recognition module 520, wherein:
the preprocessing module 510 is configured to acquire a target Chinese dish image and perform a preprocessing operation on the target Chinese dish image;
the recognition module 520 is configured to input the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N densely connected blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
The Chinese dish image recognition device provided by the embodiment of the invention is used to implement the embodiments of the Chinese dish image recognition method described above, so each functional module can be understood by reference to the method embodiments, and the description is not repeated here.
The Chinese dish image recognition device provided by the embodiment of the invention can accurately detect and recognize a wide variety of Chinese dishes with high recognition accuracy.
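The two-module structure of Fig. 5 can be sketched as plain Python classes. All class, field and dish names below are hypothetical, and the loader and model are stand-in callables rather than a real camera interface or a trained DenseNet; the sketch only shows how module 510 feeds module 520.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class PreprocessingModule:
    """Counterpart of module 510: acquires an image and preprocesses it."""
    load: Callable[[str], Any]
    transform: Callable[[Any], Any]

    def run(self, source: str) -> Any:
        return self.transform(self.load(source))


@dataclass
class RecognitionModule:
    """Counterpart of module 520: scores the image and picks a category."""
    model: Callable[[Any], List[float]]
    class_names: List[str]

    def run(self, image: Any) -> str:
        scores = self.model(image)
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.class_names[best]


@dataclass
class DishRecognitionDevice:
    preprocessing: PreprocessingModule
    recognition: RecognitionModule

    def recognize(self, source: str) -> str:
        return self.recognition.run(self.preprocessing.run(source))


# Stand-ins for illustration only: a fake loader and a fake model.
device = DishRecognitionDevice(
    PreprocessingModule(load=lambda path: [0.2, 0.9], transform=lambda img: img),
    RecognitionModule(model=lambda img: img,
                      class_names=["mapo tofu", "kung pao chicken"]),
)
result = device.recognize("dish.jpg")  # "kung pao chicken" with these stand-ins
```

Keeping the two modules behind small callable interfaces means the preprocessing steps or the trained model can be swapped without touching the device wiring.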
Fig. 6 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 6, the electronic device may include: a processor 610, a communication interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke a computer program stored in the memory 630 and executable on the processor 610 to perform the Chinese dish image recognition method provided by the above method embodiments, for example including: acquiring a target Chinese dish image, and performing a preprocessing operation on the target Chinese dish image; inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result; the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N densely connected blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The embodiment of the invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the Chinese dish image recognition method provided by the above method embodiments, for example including: acquiring a target Chinese dish image, and performing a preprocessing operation on the target Chinese dish image; inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result; the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N densely connected blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.
The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A Chinese dish image recognition method, characterized by comprising the following steps:
acquiring a target Chinese dish image, and performing a preprocessing operation on the target Chinese dish image;
inputting the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N densely connected blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1;
the step of inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model to obtain a recognition result specifically comprises the following steps:
inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese dish image recognition model;
inputting the first feature map to a max pooling layer of the Chinese dish image recognition model to obtain a second feature map;
inputting the second feature map to a first densely connected block of the Chinese dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map to a second densely connected block of the Chinese dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map to a third densely connected block of the Chinese dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map to a fourth densely connected block of the Chinese dish image recognition model to obtain a sixth feature map;
inputting the sixth feature map to a second batch normalization layer of the Chinese dish image recognition model, and then obtaining the Chinese dish recognition result through the operations of a fully connected layer and a classifier;
the preprocessing operation performed on the target Chinese dish image is specifically:
randomly rotating the target Chinese dish image about its center according to a preset angle;
randomly cropping the randomly rotated target Chinese dish image according to a preset length-width ratio;
horizontally flipping the randomly cropped target Chinese dish image with a preset probability;
and normalizing the horizontally flipped target Chinese dish image.
2. The method of claim 1, wherein the first, second, third, and fourth densely connected blocks each comprise a plurality of densely connected bottleneck layers, each bottleneck layer having a composite function comprising a plurality of operations, the plurality of operations comprising: batch normalization (BN), a ReLU activation function and a 3×3 convolution.
3. The method of claim 2, wherein the plurality of operations further comprises: a 1×1 convolution.
4. The method of claim 1, wherein the first transition layer, the second transition layer, and the third transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, a 1×1 convolution and 2×2 average pooling with a stride of 2.
5. The method of claim 1, further comprising, prior to the step of acquiring the target Chinese dish image and performing a preprocessing operation on the target Chinese dish image:
constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first densely connected block, a first transition layer, a second densely connected block, a second transition layer, a third densely connected block, a third transition layer, a fourth densely connected block, a second batch normalization layer, a fully connected layer and a classifier which are sequentially connected;
acquiring Chinese dish image samples, and preprocessing the Chinese dish image samples;
inputting the preprocessed Chinese dish image samples into the DenseNet model to obtain an output result;
calculating a loss function value by using a cross-entropy loss function based on the output result and the Chinese dish category labels corresponding to the Chinese dish image samples;
based on the Adam optimization algorithm, adjusting the parameters of the densely connected convolutional neural network, starting from the output layer of the DenseNet model, so that the loss function value moves toward its minimum;
determining whether the training end condition is reached and, if so, saving the parameters of the DenseNet model at the current iteration to obtain the trained Chinese dish image recognition model.
6. A Chinese dish image recognition device, comprising:
a preprocessing module, configured to acquire a target Chinese dish image and perform a preprocessing operation on the target Chinese dish image;
a recognition module, configured to input the preprocessed target Chinese dish image into a Chinese dish image recognition model to obtain a Chinese dish recognition result;
the Chinese dish image recognition model is obtained by training on preprocessed Chinese dish image samples and the corresponding Chinese dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese dish image recognition model comprises: N densely connected blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1;
the step of inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model to obtain a recognition result specifically comprises the following steps:
inputting the preprocessed target Chinese dish image into the Chinese dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese dish image recognition model;
inputting the first feature map to a max pooling layer of the Chinese dish image recognition model to obtain a second feature map;
inputting the second feature map to a first densely connected block of the Chinese dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;
inputting the third feature map to a second densely connected block of the Chinese dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;
inputting the fourth feature map to a third densely connected block of the Chinese dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;
inputting the fifth feature map to a fourth densely connected block of the Chinese dish image recognition model to obtain a sixth feature map;
inputting the sixth feature map to a second batch normalization layer of the Chinese dish image recognition model, and then obtaining the Chinese dish recognition result through the operations of a fully connected layer and a classifier;
the preprocessing operation performed on the target Chinese dish image is specifically:
randomly rotating the target Chinese dish image about its center according to a preset angle;
randomly cropping the randomly rotated target Chinese dish image according to a preset length-width ratio;
horizontally flipping the randomly cropped target Chinese dish image with a preset probability;
and normalizing the horizontally flipped target Chinese dish image.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the Chinese dish image recognition method according to any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the Chinese dish image recognition method according to any one of claims 1 to 5.
CN202010334520.1A 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device Active CN111523483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010334520.1A CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010334520.1A CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Publications (2)

Publication Number Publication Date
CN111523483A CN111523483A (en) 2020-08-11
CN111523483B true CN111523483B (en) 2023-10-03

Family

ID=71904579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010334520.1A Active CN111523483B (en) 2020-04-24 2020-04-24 Chinese meal dish image recognition method and device

Country Status (1)

Country Link
CN (1) CN111523483B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115903A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Method for improving dish identification system identification precision based on deep learning
CN112115906A (en) * 2020-09-25 2020-12-22 广州市派客朴食信息科技有限责任公司 Open dish identification method based on deep learning target detection and metric learning
CN113033706B (en) * 2021-04-23 2022-04-29 广西师范大学 Multi-source two-stage dish identification method based on visual detection and re-identification
CN117975445B (en) * 2024-03-29 2024-05-31 江南大学 Food identification method, system, equipment and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491765A (en) * 2018-03-05 2018-09-04 中国农业大学 A kind of classifying identification method and system of vegetables image
CN109620152A (en) * 2018-12-16 2019-04-16 北京工业大学 A kind of electrocardiosignal classification method based on MutiFacolLoss-Densenet
CN109949824A (en) * 2019-01-24 2019-06-28 江南大学 City sound event classification method based on N-DenseNet and higher-dimension mfcc feature
CN110097564A (en) * 2019-04-04 2019-08-06 平安科技(深圳)有限公司 Image labeling method, device, computer equipment and storage medium based on multi-model fusion
CN110176002A (en) * 2019-06-05 2019-08-27 深圳大学 A kind of the lesion detection method and terminal device of radioscopic image
CN110472668A (en) * 2019-07-22 2019-11-19 华北电力大学(保定) A kind of image classification method
CN110689085A (en) * 2019-09-30 2020-01-14 天津大学 Garbage classification method based on deep cross-connection network and loss function design
CN110766063A (en) * 2019-10-17 2020-02-07 南京信息工程大学 Image classification method based on compressed excitation and tightly-connected convolutional neural network
CN110942105A (en) * 2019-12-13 2020-03-31 东华大学 Mixed pooling method based on maximum pooling and average pooling
WO2020073951A1 (en) * 2018-10-10 2020-04-16 腾讯科技(深圳)有限公司 Method and apparatus for training image recognition model, network device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付杰.基于密集型网络的人脸年龄估计.《中国优秀硕士学位论文全文数据库》.2020,正文第3,4章. *

Also Published As

Publication number Publication date
CN111523483A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111523483B (en) Chinese meal dish image recognition method and device
CN111369563B (en) Semantic segmentation method based on pyramid void convolutional network
CN110097554B (en) Retina blood vessel segmentation method based on dense convolution and depth separable convolution
CN110223292B (en) Image evaluation method, device and computer readable storage medium
US11804074B2 (en) Method for recognizing facial expressions based on adversarial elimination
CN109948692B (en) Computer-generated picture detection method based on multi-color space convolutional neural network and random forest
CN109344856B (en) Offline signature identification method based on multilayer discriminant feature learning
CN112580502B (en) SICNN-based low-quality video face recognition method
CN111695513B (en) Facial expression recognition method based on depth residual error network
CN113361623B (en) Medical image classification method combining lightweight CNN with transfer learning
CN110598552A (en) Expression recognition method based on improved particle swarm optimization convolutional neural network optimization
CN112950480A (en) Super-resolution reconstruction method integrating multiple receptive fields and dense residual attention
CN112465842B (en) Multichannel retinal blood vessel image segmentation method based on U-net network
CN111224905A (en) Multi-user detection method based on convolution residual error network in large-scale Internet of things
CN111160130A (en) Multi-dimensional collision recognition method for multi-platform virtual identity account
CN114241564A (en) Facial expression recognition method based on inter-class difference strengthening network
CN117132849A (en) Cerebral apoplexy hemorrhage transformation prediction method based on CT flat-scan image and graph neural network
CN113361346A (en) Scale parameter self-adaptive face recognition method for replacing adjustment parameters
CN113033345B (en) V2V video face recognition method based on public feature subspace
CN112070009B (en) Convolutional neural network expression recognition method based on improved LBP operator
CN115035377A (en) Significance detection network system based on double-stream coding and interactive decoding
Pal et al. Face detection using artificial neural network and wavelet neural network
CN115527253A (en) Attention mechanism-based lightweight facial expression recognition method and system
CN115423828A (en) Retina blood vessel image segmentation method based on MRNet
CN114332491A (en) Saliency target detection algorithm based on feature reconstruction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant