CN111523483B

CN111523483B - Chinese food dish image recognition method and device

Info

Publication number: CN111523483B
Application number: CN202010334520.1A
Authority: CN
Inventors: 高伟东; 郝然
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2023-10-03
Anticipated expiration: 2040-04-24
Also published as: CN111523483A

Abstract

Embodiments of the present invention provide a Chinese food dish image recognition method and device. The method includes: acquiring a target Chinese food dish image and performing a preprocessing operation on it; and inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result. The Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and their corresponding Chinese food dish category labels, and is built on the DenseNet model. The network structure of the model includes N densely connected blocks for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1. Embodiments of the present invention can accurately detect and recognize a variety of Chinese food dishes, covering a wide range of dish types with high recognition accuracy.

Description

Chinese food dish image recognition method and device

Technical field

The present invention relates to the field of computer technology, and more specifically to a Chinese food dish image recognition method and device.

Background

With the rapid development of deep learning algorithms, computer vision has become the fastest-growing and most widely deployed field of artificial intelligence, and it is now used in many aspects of daily life. Within computer vision, food recognition is an emerging topic that has attracted considerable attention.

Many recognition algorithms have been studied for Western and Japanese dishes, but mature methods for Chinese food dish image recognition remain scarce. One reason is that few large public datasets for Chinese dish classification exist. Another is that Chinese dishes are harder to recognize than Western or Japanese dishes, because dishes of the same category can appear in many different forms. In addition, Chinese food dish images are affected by background noise such as plate color and lighting, and different Chinese dishes can look very similar to one another.

For these reasons, existing technology that can accurately recognize Chinese food dishes is very limited, and all of these factors make accurate recognition of Chinese food dish images more difficult. A method that can accurately detect and recognize Chinese food dishes is therefore needed.

Summary of the invention

To solve, or at least partially solve, the above problems, embodiments of the present invention provide a Chinese food dish image recognition method and device.

In a first aspect, an embodiment of the present invention provides a Chinese food dish image recognition method, including:

acquiring a target Chinese food dish image, and performing a preprocessing operation on the target Chinese food dish image;

inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels, the Chinese food dish image recognition model is built on the DenseNet model, and the network structure of the Chinese food dish image recognition model includes: N densely connected blocks for feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.

Optionally, the step of inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model to obtain the recognition result specifically includes:

inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model, and obtaining a first feature map through the operations of the first convolution layer, the first batch normalization layer and the activation layer of the model;

inputting the first feature map into the max pooling layer of the model to obtain a second feature map;

inputting the second feature map into the first densely connected block of the model, followed by the operation of the first transition layer, to obtain a third feature map;

inputting the third feature map into the second densely connected block of the model, followed by the operation of the second transition layer, to obtain a fourth feature map;

inputting the fourth feature map into the third densely connected block of the model, followed by the operation of the third transition layer, to obtain a fifth feature map;

inputting the fifth feature map into the fourth densely connected block of the model, followed by the operation of the fourth transition layer, to obtain a sixth feature map;

inputting the sixth feature map into the second batch normalization layer of the model, followed by the operations of the fully connected layer and the classifier, to obtain the Chinese food dish recognition result.

Optionally, the first, second, third and fourth densely connected blocks each include a plurality of densely connected bottleneck layers. Each bottleneck layer has a composite function comprising multiple operations, the operations including: batch normalization (BN), the ReLU activation function, and 3×3 convolution.

Optionally, the operations further include: 1×1 convolution.

Optionally, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), the ReLU activation function, 1×1 convolution, and 2×2 average pooling with a stride of 2.

Optionally, before the step of acquiring the target Chinese food dish image and performing the preprocessing operation on it, the method further includes:

constructing a DenseNet model, the DenseNet model including, connected in sequence, a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first densely connected block, a first transition layer, a second densely connected block, a second transition layer, a third densely connected block, a third transition layer, a fourth densely connected block, a second batch normalization layer, a fully connected layer and a classifier;

acquiring Chinese food dish image samples, and preprocessing the Chinese food dish image samples;

inputting the preprocessed Chinese food dish image samples into the DenseNet model to obtain an output result;

calculating a loss function value with the cross-entropy loss function, based on the output result and the Chinese food dish category labels corresponding to the Chinese food dish image samples;

adjusting, based on the Adam optimization algorithm, the parameters of the densely connected convolutional neural network starting from the output layer of the DenseNet model, so that the loss function value moves toward its minimum;

determining whether a training end condition has been reached, and if so, saving the parameters of the DenseNet model at the current iteration to obtain the trained Chinese food dish image recognition model.

Optionally, performing the preprocessing operation on the target Chinese food dish image specifically includes:

applying a random rotation about the image center to the target Chinese food dish image, according to a preset angle;

randomly cropping the rotated target Chinese food dish image according to a preset aspect ratio;

horizontally flipping the randomly cropped target Chinese food dish image with a preset probability;

normalizing the horizontally flipped target Chinese food dish image.

In a second aspect, an embodiment of the present invention provides a Chinese food dish image recognition device, including:

a preprocessing module, configured to acquire a target Chinese food dish image and perform a preprocessing operation on the target Chinese food dish image;

a recognition module, configured to input the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

wherein the Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples, the Chinese food dish image recognition model is built on the DenseNet model, and the network structure of the Chinese food dish image recognition model includes: N densely connected blocks for feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1.

In a third aspect, an embodiment of the present invention provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the steps of the Chinese food dish image recognition method provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, it implements the steps of the Chinese food dish image recognition method provided in the first aspect.

The Chinese food dish image recognition method and device provided by embodiments of the present invention can accurately detect and recognize a variety of Chinese food dishes, covering a wide range of dish types with high recognition accuracy.

Description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Figure 1 is a schematic flowchart of the Chinese food dish image recognition method provided by an embodiment of the present invention;

Figure 2 is a schematic diagram of the network structure of the Chinese food dish image recognition model provided by an embodiment of the present invention;

Figure 3 is a schematic structural diagram of a densely connected block (dense block);

Figure 4 is a schematic structural diagram of a bottleneck layer;

Figure 5 is a schematic structural diagram of the Chinese food dish image recognition device provided by an embodiment of the present invention;

Figure 6 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention.

Detailed description

To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained from them by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.

Figure 1 is a schematic flowchart of the Chinese food dish image recognition method provided by an embodiment of the present invention, which includes:

Step 100: acquire a target Chinese food dish image, and perform a preprocessing operation on the target Chinese food dish image;

Specifically, in this embodiment a camera at a fixed position captures a single target Chinese food dish image, and a preprocessing operation that includes data augmentation is then performed on the image. Common basic data augmentation operations include rotation, translation, scaling, random occlusion, horizontal flipping, color shifting and noise perturbation. Several of these methods can be selected to preprocess the target Chinese food dish image.

Step 101: input the preprocessed target Chinese food dish image into the Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

Specifically, this embodiment inputs the target Chinese food dish image produced by the above preprocessing operation into a pre-trained Chinese food dish image recognition model, which yields the Chinese food dish recognition result.

The Chinese food dish image recognition model is obtained by training on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels.

Compared with food images in general, Chinese food dish images usually do not show the distinctive spatial layout and obvious semantic features that most Western dishes do, so their semantic information is harder to extract. In this embodiment the Chinese food dish image recognition model is therefore built on the DenseNet model. DenseNet does not gain representational power simply from a very deep or very wide network. Instead it reuses features from low levels through high levels, concatenating the features of different layers, which increases the diversity of the inputs to later layers and makes very thorough use of the image features. Compared with other networks, DenseNet also has fewer parameters, mitigates vanishing gradients, and reduces overfitting on small datasets, which makes it simpler and more efficient.

Further, based on the DenseNet network model, the network structure of the Chinese food dish image recognition model includes: N densely connected blocks for feature reuse and N-1 transition layers for compressing the number of parameters.

Unlike other convolutional neural networks, the present invention uses dense connections to reuse features and exploit the image features as fully as possible, so that the semantic information of the image is extracted better and accurate recognition becomes more likely. The densely connected blocks serve to alleviate vanishing gradients, reduce the number of trainable parameters, resist overfitting and reuse features. The transition layers serve to compress the number of parameters and limit the model complexity introduced by the densely connected blocks.

The Chinese food dish image recognition method provided by this embodiment can accurately detect and recognize a variety of Chinese food dishes, covering a wide range of dish types with high recognition accuracy.

Based on the above embodiment, the step of inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model to obtain the recognition result is described in detail as follows.

Figure 2 is a schematic diagram of the network structure of the Chinese food dish image recognition model provided by an embodiment of the present invention. The model includes, connected in sequence, a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first densely connected block, a first transition layer, a second densely connected block, a second transition layer, a third densely connected block, a third transition layer, a fourth densely connected block, a second batch normalization layer, a fully connected layer and a classifier.

Specifically, the preprocessed target Chinese food dish image is input into the Chinese food dish image recognition model. The convolution of the first convolution layer, the BN operation of the first batch normalization layer and the ReLU activation of the activation layer reduce its dimensionality and produce the first feature map. The first feature map is then fed to the max pooling layer, which downsamples it to remove unnecessary redundant information and produces the second feature map. The second feature map then passes through four densely connected blocks (dense blocks) in sequence, with a transition layer between each pair of consecutive dense blocks.

In one specific embodiment, a 224×224-pixel target Chinese food dish image first goes through the convolution, BN and ReLU operations of Figure 2 in order, which reduces its dimensionality and yields a 112×112-pixel first feature map. The first feature map is then fed to the max pooling layer, which uses a 3×3 kernel with a stride of 2. The resulting 56×56-pixel second feature map is the input of the first dense block.
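The sizes quoted in this embodiment follow from the usual output-size formula for convolution and pooling layers. The sketch below reproduces them under assumed stem hyperparameters that the description does not state: a 7×7 convolution with stride 2 and padding 3 (the standard DenseNet stem), and padding 1 for the 3×3 max pooling.

```python
def out_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a convolution or pooling layer (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumed hyperparameters (not given in the text): 7x7 convolution with
# stride 2 and padding 3, then 3x3 max pooling with stride 2 and padding 1.
image_side = 224
first_map_side = out_size(image_side, kernel=7, stride=2, padding=3)
second_map_side = out_size(first_map_side, kernel=3, stride=2, padding=1)

print(first_map_side, second_map_side)
```

With these assumptions the first feature map is 112×112 and the second is 56×56, matching the embodiment. Other kernel and padding choices can give the same sizes, so the values above are one consistent possibility rather than the patent's stated configuration.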

Figure 3 is a schematic structural diagram of a densely connected block (dense block). Each layer in a dense block is called a bottleneck layer. What makes DenseNet better than other convolutional neural networks is the dense block: with it, DenseNet gains the advantages of alleviating vanishing gradients, reducing parameters, resisting overfitting and reusing features.

Suppose a dense block has $l$ layers and $x_0$ is the input of the dense block. Each layer has a composite function $H_l(\cdot)$ made up of three operations: BN, ReLU and a 3×3 convolution. To improve the flow of information within a dense block, DenseNet introduces a distinctive connection pattern, the dense connection, which connects each layer in a dense block to all subsequent layers and thereby reuses features, as shown in Figure 3. The $l$-th layer therefore takes the feature maps $x_0, \ldots, x_{l-1}$ of all preceding layers as input:

$$x_l = H_l([x_0, x_1, \ldots, x_{l-1}])$$

where $[x_0, x_1, \ldots, x_{l-1}]$ denotes the concatenation of the feature maps output by layers $0, \ldots, l-1$, which serves as the input of the $l$-th layer.
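The dense connection rule can be illustrated without any deep learning library by tracking only which channels each layer sees. In this minimal sketch every feature-map channel is a string label, and `make_layer` is a hypothetical stand-in for $H_l$ that emits a fixed number of new channels (the growth rate, an assumed name and value borrowed from common DenseNet usage, not from the text).

```python
from typing import Callable, List, Tuple

FeatureMaps = List[str]  # each entry stands for one feature-map channel


def make_layer(index: int, growth_rate: int) -> Callable[[FeatureMaps], FeatureMaps]:
    """Stand-in for H_l: reads all inputs, emits `growth_rate` new channels."""
    def layer(inputs: FeatureMaps) -> FeatureMaps:
        return [f"x{index}_c{c}" for c in range(growth_rate)]
    return layer


def dense_block(x0: FeatureMaps, num_layers: int,
                growth_rate: int) -> Tuple[FeatureMaps, List[int]]:
    """Apply x_l = H_l([x_0, ..., x_{l-1}]) for l = 1..num_layers."""
    outputs = [x0]                      # x_0, x_1, ... collected as the block runs
    input_widths = []                   # channels seen by each layer
    for l in range(1, num_layers + 1):
        concatenated = [ch for x in outputs for ch in x]   # [x_0, ..., x_{l-1}]
        input_widths.append(len(concatenated))
        outputs.append(make_layer(l, growth_rate)(concatenated))
    block_output = [ch for x in outputs for ch in x]
    return block_output, input_widths


# Assumed example values: 64 input channels, 6 layers, growth rate 32.
x0 = [f"x0_c{c}" for c in range(64)]
block_output, input_widths = dense_block(x0, num_layers=6, growth_rate=32)
print(input_widths, len(block_output))
```

Each layer's input grows by one growth rate per preceding layer, which is why the number of feature maps rises quickly inside a block and why the bottleneck and transition layers described next are useful.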

Optionally, each bottleneck layer has a composite function comprising multiple operations, the operations including: batch normalization (BN), the ReLU activation function, and 3×3 convolution.

Figure 4 is a schematic structural diagram of a bottleneck layer. Because dense connections produce a large number of feature maps, a 1×1 convolution is added before the 3×3 convolution of the bottleneck layer. It reduces the number of feature maps and the dimensionality of each one, and so reduces the amount of computation.
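A rough weight count shows when the 1×1 convolution pays off. The sketch assumes that the 1×1 convolution narrows its input to four times the growth rate before the 3×3 convolution; that factor is the usual DenseNet convention and is not stated in the description. Biases and BN parameters are ignored.

```python
def conv_weights(c_in: int, c_out: int, kernel: int) -> int:
    """Number of weights in a kernel x kernel convolution (no bias)."""
    return c_in * c_out * kernel * kernel


def plain_layer_weights(c_in: int, growth_rate: int) -> int:
    """A 3x3 convolution applied directly to all concatenated inputs."""
    return conv_weights(c_in, growth_rate, 3)


def bottleneck_layer_weights(c_in: int, growth_rate: int, width_factor: int = 4) -> int:
    """1x1 convolution down to width_factor * growth_rate, then a 3x3 convolution.

    width_factor = 4 is an assumption taken from common DenseNet practice.
    """
    mid = width_factor * growth_rate
    return conv_weights(c_in, mid, 1) + conv_weights(mid, growth_rate, 3)


for c_in in (64, 512, 1024):
    print(c_in, plain_layer_weights(c_in, 32), bottleneck_layer_weights(c_in, 32))
```

Under these assumptions the bottleneck costs more when a layer has few inputs (64 channels) and less once the concatenated input is wide (512 or 1024 channels), which is the situation deep inside a dense block.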

Further, the first, second, third and fourth transition layers each perform the following operations: batch normalization (BN), the ReLU activation function, 1×1 convolution, and 2×2 average pooling with a stride of 2. Their purpose is to compress the number of parameters further. The dimensionality and the number of channels of the feature maps output by each dense block increase sharply. The convolution in the transition layer reduces the dimensionality of the feature maps, and average pooling addresses the excessive number of channels, which keeps the model from becoming overly complex after many dense blocks.

If a dense block generates m feature maps, the transition layer that follows generates θm feature maps, where θ is the compression factor and 0 < θ ≤ 1. When θ = 1, the number of feature maps is unchanged by the transition layer. In this embodiment θ = 0.5, so the number of feature maps is halved after each transition layer.
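The effect of a transition layer on tensor shape can be written as a two-line calculation. This is a minimal sketch: the function name is hypothetical, and rounding θm down to an integer is an assumption for the case where θm is not whole.

```python
import math
from typing import Tuple


def transition_shape(channels: int, side: int, theta: float = 0.5) -> Tuple[int, int]:
    """Shape after a transition layer: theta * m feature maps, half the side length."""
    if not 0 < theta <= 1:
        raise ValueError("theta must satisfy 0 < theta <= 1")
    out_channels = math.floor(theta * channels)   # compression to theta * m maps
    out_side = side // 2                          # 2x2 average pooling, stride 2
    return out_channels, out_side


print(transition_shape(256, 56))             # theta = 0.5, as in the embodiment
print(transition_shape(256, 56, theta=1.0))  # theta = 1 keeps the channel count
```

For example, 256 feature maps of size 56×56 become 128 feature maps of size 28×28 when θ = 0.5.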

In one specific embodiment, the feature maps after the four dense blocks are 56×56, 28×28, 14×14 and 7×7 pixels respectively. BN and a softmax classifier follow the last dense block, and the output size of the fully connected layer is set to the total number of Chinese food dish categories.
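The four sizes listed here can be reproduced by alternating dense blocks and transition layers. The sketch assumes the standard DenseNet-169 configuration (6, 12, 32 and 32 layers per block, growth rate 32, 64 channels entering the first block); the description calls its model an improved DenseNet 169 without listing these numbers, so the channel counts are illustrative only.

```python
import math
from typing import List, Tuple

# Assumed configuration (standard DenseNet-169, not stated in the text).
BLOCK_LAYERS = (6, 12, 32, 32)
GROWTH_RATE = 32
STEM_CHANNELS = 64
THETA = 0.5  # compression factor given in the embodiment


def trace_shapes(side: int = 56, channels: int = STEM_CHANNELS) -> List[Tuple[int, int]]:
    """Return (channels, side) at the output of each dense block."""
    shapes = []
    for i, layers in enumerate(BLOCK_LAYERS):
        channels += layers * GROWTH_RATE            # each layer adds GROWTH_RATE maps
        shapes.append((channels, side))
        if i < len(BLOCK_LAYERS) - 1:               # N blocks, N-1 transition layers
            channels = math.floor(THETA * channels)
            side //= 2                              # 2x2 average pooling, stride 2
    return shapes


shapes = trace_shapes()
print(shapes)
```

Under these assumptions the side lengths are 56, 28, 14 and 7, as in the embodiment, and the fully connected layer would map the final feature vector to one score per dish category.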

Before the trained Chinese food dish image recognition model can be used to recognize a target Chinese food dish image, the model must first be trained.

Based on the above embodiments, before the step of acquiring the target Chinese food dish image and performing the preprocessing operation on it, the method further includes:

Step 200: construct a DenseNet model, the DenseNet model including, connected in sequence, a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first densely connected block, a first transition layer, a second densely connected block, a second transition layer, a third densely connected block, a third transition layer, a fourth densely connected block, a second batch normalization layer, a fully connected layer and a classifier;

Specifically, the DenseNet model in this example is an improved DenseNet 169 model with the network structure shown in Figure 3.

Step 201: acquire Chinese food dish image samples, and preprocess the Chinese food dish image samples;

The purpose of the preprocessing is image augmentation.

Step 202: input the preprocessed Chinese food dish image samples into the DenseNet model to obtain an output result;

Step 203: calculate a loss function value with the cross-entropy loss function, based on the output result and the Chinese food dish category labels corresponding to the Chinese food dish image samples;

The cross-entropy loss is used because it speeds up convergence and the updating of the weight matrices.

Step 204: based on the Adam optimization algorithm, adjust the parameters of the densely connected convolutional neural network starting from the output layer of the DenseNet model, so that the loss function value moves toward its minimum;

The optimizer used in training is the Adam algorithm, which provides an adaptive learning rate, speeds up training and improves the robustness of the network.

Step 205: determine whether the training end condition has been reached, and if so, save the parameters of the DenseNet model at the current iteration to obtain the trained Chinese food dish image recognition model.
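Steps 203 and 204 can be sketched on a toy problem in which the trainable parameters are simply the class scores (logits) of one sample. The code shows softmax cross-entropy, its gradient with respect to the logits, and the Adam update rule. The `beta1`, `beta2` and `eps` defaults are the commonly used values and are assumptions; the demo uses a learning rate of 0.05, far larger than a real training run would, only so the toy converges in a few steps.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)                              # subtract max for stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: List[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def grad_wrt_logits(logits: List[float], label: int) -> List[float]:
    """Gradient of the loss: predicted probability minus one-hot target."""
    return [p - (1.0 if i == label else 0.0) for i, p in enumerate(softmax(logits))]


class Adam:
    """Adam update rule for a flat list of parameters."""

    def __init__(self, size: int, lr: float = 1e-4, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * size                       # first-moment estimates
        self.v = [0.0] * size                       # second-moment estimates
        self.t = 0                                  # step counter

    def step(self, params: List[float], grads: List[float]) -> List[float]:
        self.t += 1
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)   # bias correction
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            updated.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return updated


logits, label = [0.0, 0.0, 0.0], 1
optimizer = Adam(size=3, lr=0.05)                   # toy learning rate
initial_loss = cross_entropy(logits, label)
for _ in range(200):
    logits = optimizer.step(logits, grad_wrt_logits(logits, label))
final_loss = cross_entropy(logits, label)
print(initial_loss, final_loss)
```

The gradient is the difference between the predicted probabilities and the one-hot label, so a badly wrong prediction produces a large gradient. In a real network the same gradient would be propagated backward from the output layer through every layer, as step 204 describes.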

Specifically, a camera at a fixed position captures multiple single-dish images, which are saved to a database, and a category label is added to each image. If the database has no images of that category yet, a new category is created for the label. The database is then divided proportionally into a training set and a test set. To give the model better classification performance on the dataset, the training settings are as follows: the number of epochs is 150; the batch size is 64; the optimizer is Adam, which provides an adaptive learning rate, with an initial learning rate of 1e-4, which greatly speeds up training and improves the robustness of the network. Because the present invention addresses a classification problem, the loss function is cross-entropy, so that learning speeds up when the model fits poorly and slows down when the model fits well. After DenseNet169 has trained for 150 epochs, the best model is taken as the final trained Chinese food dish image recognition model. After training, the model can also be tested by feeding the test set to the best model to obtain the test results.
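The training settings above can be collected in one place together with the dataset split and the choice of the best model. In this sketch the 80/20 split ratio, the random seed, the helper names and the use of held-out accuracy as the selection criterion are all assumptions; the description only says the database is divided proportionally and that the best model is kept.

```python
import math
import random
from typing import List, Sequence, Tuple

# Values stated in the description.
CONFIG = {
    "epochs": 150,
    "batch_size": 64,
    "optimizer": "Adam",
    "initial_lr": 1e-4,
    "loss": "cross_entropy",
}


def split_dataset(samples: Sequence, train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into training and test sets (ratio is an assumption)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def best_epoch(accuracies: List[float]) -> Tuple[int, float]:
    """Return the 1-based epoch with the highest held-out accuracy."""
    index = max(range(len(accuracies)), key=lambda i: accuracies[i])
    return index + 1, accuracies[index]


# Hypothetical database of 1000 labelled image identifiers.
database = [(f"img_{i:04d}.jpg", i % 10) for i in range(1000)]
train_set, test_set = split_dataset(database)
steps_per_epoch = math.ceil(len(train_set) / CONFIG["batch_size"])
print(len(train_set), len(test_set), steps_per_epoch)
```

With 800 training images and a batch size of 64, one epoch takes 13 optimizer steps. In practice `best_epoch` would be fed the accuracy measured after each of the 150 epochs, and the parameters saved at that epoch would be kept.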

The Chinese food dish image recognition method provided by this embodiment takes full advantage of the feature reuse made possible by the dense connections of the DenseNet network. Combined with tuning of the network hyperparameters, it greatly reduces the number of trainable parameters and the redundancy of the network, and it makes thorough use of the dish image features, which helps capture the semantic information of the dish images. After many training iterations it yields a trained model with high recognition accuracy and excellent performance. Because DenseNet generalizes well, the present invention is not limited to Chinese food dishes, which are hard to recognize. In principle, once it is trained on datasets of other food categories, it can be applied to recognizing many more kinds of food.

Based on the content of the above embodiment, the preprocessing operation performed on the target Chinese food dish image is specifically as follows:

performing random center rotation on the target Chinese food dish image by a preset angle, for example an angle between -10 degrees and 10 degrees;

randomly cropping the randomly center-rotated target Chinese food dish image according to a preset aspect ratio, for example 224×224;

horizontally flipping the randomly cropped target Chinese food dish image with a preset probability, for example 0.5;

normalizing the horizontally flipped target Chinese food dish image to eliminate the influence of differing scales among data features.

The preprocessing steps provided by the embodiment of the present invention help to obtain an accurate trained model and accurate Chinese food dish recognition results.
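The four preprocessing steps can be sketched in pure Python as follows. This is an illustrative sketch under several assumptions, not the patent's implementation: the image is a single-channel list of rows with pixel values from 0 to 255, rotation uses nearest-neighbor sampling with zero fill, and normalization scales to [0, 1] and then applies a hypothetical mean of 0.5 and standard deviation of 0.5 (the text does not specify the normalization constants). All function names are hypothetical.

```python
import math
import random


def rotate_center(img, angle_deg, fill=0.0):
    """Rotate a 2-D image about its center using nearest-neighbor sampling."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: locate the source pixel for each output pixel.
            dx, dy = x - cx, y - cy
            sx = round(cos_a * dx + sin_a * dy + cx)
            sy = round(-sin_a * dx + cos_a * dy + cy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def random_crop(img, size, rng):
    """Cut a size x size window at a random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]


def horizontal_flip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def normalize(img, mean=0.5, std=0.5):
    """Scale pixels to [0, 1], then standardize (mean and std are assumed)."""
    return [[(p / 255.0 - mean) / std for p in row] for row in img]


def preprocess(img, rng, max_angle=10.0, crop=224, flip_prob=0.5):
    """Apply rotation, crop, flip and normalization in the order described."""
    img = rotate_center(img, rng.uniform(-max_angle, max_angle))
    img = random_crop(img, crop, rng)
    if rng.random() < flip_prob:
        img = horizontal_flip(img)
    return normalize(img)
```

For example, `preprocess(image, random.Random(0))` on a 256×256 input returns a 224×224 image whose values lie in [-1, 1] under the assumed normalization constants. Passing a seeded `random.Random` instance keeps the augmentation reproducible.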

Figure 5 is a schematic structural diagram of the Chinese food dish image recognition device provided by an embodiment of the present invention. The device includes a preprocessing module 510 and a recognition module 520, wherein:

the preprocessing module 510 is used to acquire a target Chinese food dish image and perform a preprocessing operation on the target Chinese food dish image;

the recognition module 520 is used to input the preprocessed target Chinese food dish image into the Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

The Chinese food dish image recognition device provided by the embodiment of the present invention is used to implement the foregoing embodiments of the Chinese food dish image recognition method. Therefore, for an understanding of each functional module in this embodiment, reference may be made to the foregoing method embodiments, and the details are not repeated here.

The Chinese food dish image recognition device provided by the embodiment of the present invention can accurately detect and recognize a variety of Chinese dishes, covering a wide range of categories with high recognition accuracy.

Figure 6 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Figure 6, the electronic device may include a processor 610, a communications interface 620, a memory 630, and a communication bus 640, wherein the processor 610, the communications interface 620, and the memory 630 communicate with one another through the communication bus 640. The processor 610 can call a computer program stored in the memory 630 and executable on the processor 610 to execute the Chinese food dish image recognition method provided by the above method embodiments, for example including: acquiring a target Chinese food dish image and performing a preprocessing operation on the target Chinese food dish image; and inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; wherein the Chinese food dish image recognition model is obtained by training based on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels, the Chinese food dish image recognition model is built based on the DenseNet model, and the network structure of the Chinese food dish image recognition model includes N dense connection blocks for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.

In addition, the above logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Embodiments of the present invention also provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the Chinese food dish image recognition method provided by the above method embodiments, for example including: acquiring a target Chinese food dish image and performing a preprocessing operation on the target Chinese food dish image; and inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result; wherein the Chinese food dish image recognition model is obtained by training based on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels, the Chinese food dish image recognition model is built based on the DenseNet model, and the network structure of the Chinese food dish image recognition model includes N dense connection blocks for feature reuse and N-1 transition layers for compressing the number of parameters, where N is a natural number greater than 1.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement it without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware. Based on this understanding, the above technical solution, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A Chinese food dish image recognition method, characterized by comprising the following steps:

acquiring a target Chinese food dish image, and performing a preprocessing operation on the target Chinese food dish image;

inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

wherein the Chinese food dish image recognition model is obtained by training based on preprocessed Chinese food dish image samples and corresponding Chinese food dish category labels, and is constructed based on a DenseNet model, and the network structure of the Chinese food dish image recognition model comprises: N dense connection blocks for realizing feature reuse and N-1 transition layers for compressing the number of parameters; N is a natural number greater than 1;

the step of inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result specifically comprises the following steps:

inputting the preprocessed target Chinese food dish image into the Chinese food dish image recognition model, and obtaining a first feature map through the operations of a first convolution layer, a first batch normalization layer and an activation layer of the Chinese food dish image recognition model;

inputting the first feature map to a max pooling layer of the Chinese food dish image recognition model to obtain a second feature map;

inputting the second feature map to a first dense connection block of the Chinese food dish image recognition model, and then obtaining a third feature map through the operation of a first transition layer;

inputting the third feature map to a second dense connection block of the Chinese food dish image recognition model, and then obtaining a fourth feature map through the operation of a second transition layer;

inputting the fourth feature map to a third dense connection block of the Chinese food dish image recognition model, and then obtaining a fifth feature map through the operation of a third transition layer;

inputting the fifth feature map to a fourth dense connection block of the Chinese food dish image recognition model to obtain a sixth feature map;

inputting the sixth feature map to a second batch normalization layer of the Chinese food dish image recognition model, and then obtaining the Chinese food dish recognition result through the operations of a fully connected layer and a classifier;

the preprocessing operation performed on the target Chinese food dish image is specifically:

performing random center rotation on the target Chinese food dish image according to a preset angle;

randomly cropping the randomly center-rotated target Chinese food dish image according to a preset aspect ratio;

horizontally flipping the randomly cropped target Chinese food dish image according to a preset probability;

and normalizing the horizontally flipped target Chinese food dish image.

2. The method of claim 1, wherein the first, second, third, and fourth dense connection blocks each comprise a plurality of densely connected bottleneck layers, each bottleneck layer having a composite function comprising a plurality of operations, the plurality of operations comprising: batch normalization (BN), a ReLU activation function and 3×3 convolution.

3. The method of claim 2, wherein the plurality of operations further comprises: 1×1 convolution.

4. The method of claim 1, wherein the first transition layer, the second transition layer, and the third transition layer each perform the following operations: batch normalization (BN), a ReLU activation function, 1×1 convolution and 2×2 average pooling with a stride of 2.

5. The method of claim 1, further comprising, prior to the step of acquiring the target Chinese food dish image and performing a preprocessing operation on the target Chinese food dish image:

constructing a DenseNet model, wherein the DenseNet model comprises a first convolution layer, a first batch normalization layer, an activation layer, a max pooling layer, a first dense connection block, a first transition layer, a second dense connection block, a second transition layer, a third dense connection block, a third transition layer, a fourth dense connection block, a second batch normalization layer, a fully connected layer and a classifier which are sequentially connected;

acquiring Chinese food dish image samples, and preprocessing the Chinese food dish image samples;

inputting the preprocessed Chinese food dish image samples into the DenseNet model to obtain an output result;

calculating a loss function value by using a cross-entropy loss function based on the output result and the Chinese food dish category labels corresponding to the Chinese food dish image samples;

based on the Adam optimization algorithm, starting from the output layer of the DenseNet model, adjusting the parameters of the densely connected convolutional neural network so that the loss function value moves toward its minimum;

judging whether the training end condition is reached, and if so, saving the parameters of the DenseNet model of the current iteration to obtain the trained Chinese food dish image recognition model.

6. A Chinese food dish image recognition device, comprising:

a preprocessing module, used for acquiring a target Chinese food dish image and performing a preprocessing operation on the target Chinese food dish image;

a recognition module, used for inputting the preprocessed target Chinese food dish image into a Chinese food dish image recognition model to obtain a Chinese food dish recognition result;

7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the steps of the Chinese food dish image recognition method according to any one of claims 1 to 5.

8. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the Chinese food dish image recognition method according to any one of claims 1 to 5.