CN111539470A - Image processing method, device, computer equipment and storage medium - Google Patents

Image processing method, device, computer equipment and storage medium

Info

Publication number
CN111539470A
Authority
CN
China
Prior art keywords
image
processed
food
target detection
detection algorithm
Prior art date
Legal status
Pending
Application number
CN202010313007.4A
Other languages
Chinese (zh)
Inventor
韦鹏程
黄思行
颜蓓
Current Assignee
Chongqing University of Education
Original Assignee
Chongqing University of Education
Priority date
Filing date
Publication date
Application filed by Chongqing University of Education
Priority to CN202010313007.4A
Publication of CN111539470A
Legal status: Pending



Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 - Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This disclosure relates to an image processing method. The method includes: acquiring an image to be processed, the image to be processed containing a food region; finding an initial target detection algorithm, and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm; extracting the food region from the image to be processed according to the target detection algorithm, and extracting visual features from the food region through a deep neural network algorithm; and classifying the image to be processed according to the visual features. Adjusting the parameters of the initial target detection algorithm improves the accuracy of extracting the food region from the image to be processed, and the deep neural network algorithm, with its strong visual-feature recognition ability, performs especially well on food images, improving the overall accuracy of image processing.

Description

Image processing method, device, computer equipment and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to an image processing method, an apparatus, a computer device and a storage medium.

Background

The development of computer technology has brought extensive exchange and use of image information, and image retrieval and classification techniques have become ever more widely applied. Retrieval and classification have traditionally been text-based: an image is described by text such as its title, time and setting, and retrieval or classification is performed by text matching. Because text-based image retrieval and classification can no longer meet everyday needs, image processing methods based on image content have been developed. Food image processing is widely used in fields such as food and health: food images can further help people estimate the calories in food, analyze eating habits and support personalized services.

Because a photographed food image contains not only the visual information of the food itself but also assorted background information, traditional image processing methods often suffer from low accuracy when classifying or retrieving images.

Summary of the Invention

On this basis, to solve the above technical problems, an image processing method, device, computer equipment and storage medium are provided that can improve the accuracy of image processing.

An image processing method, the method comprising:

acquiring an image to be processed, the image to be processed containing a food region;

finding an initial target detection algorithm, and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm;

extracting the food region from the image to be processed according to the target detection algorithm, and extracting visual features from the food region through a deep neural network algorithm; and

classifying the image to be processed according to the visual features.

In one embodiment, adjusting the parameters of the initial target detection algorithm to obtain the target detection algorithm includes:

acquiring an image database, and looking up food images in the image database;

generating a food image data set from the food images; and

adjusting the parameters of the initial target detection algorithm using the food image data set to obtain the target detection algorithm.

In one embodiment, extracting the food region from the image to be processed according to the target detection algorithm includes:

obtaining candidate frames of the image to be processed according to the target detection algorithm;

calculating the candidate-frame weight of each candidate frame; and

extracting the food region from the image to be processed according to the candidate-frame weights.

In one embodiment, extracting the visual features from the food region through the deep neural network algorithm includes:

obtaining a weight threshold, and taking the food regions whose candidate-frame weights are higher than the weight threshold as candidate regions;

obtaining the candidate-frame coordinates of each candidate region according to its candidate-frame weight;

extracting sub-features of the candidate regions according to the candidate-frame coordinates; and

obtaining the visual features of the food region from the sub-features.

In one embodiment, classifying the image to be processed according to the visual features includes:

extracting color features and shape features from the visual features; and

classifying the image to be processed according to the color features and the shape features to obtain the category of the image to be processed.

In one embodiment, classifying the image to be processed according to the color features and the shape features to obtain the category of the image to be processed includes:

obtaining an edge feature map of the image to be processed according to the color features and the shape features;

calculating the number of edge points from the edge feature map, and obtaining an image histogram from the edge-point count; and

feeding the image histogram into an image classifier to obtain the category of the image to be processed.

In one embodiment, the method further includes:

acquiring an image to be retrieved, and extracting image features of the image to be retrieved;

performing image retrieval in an image library according to the image features to obtain initial retrieval images;

calculating the similarity between each initial retrieval image and the image to be retrieved; and

obtaining the target retrieval images according to the similarities.

An image processing device, the device comprising:

an image acquisition module for acquiring an image to be processed, the image to be processed containing a food region;

a parameter adjustment module for finding an initial target detection algorithm and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm;

a feature extraction module for extracting the food region from the image to be processed according to the target detection algorithm and extracting visual features from the food region through a deep neural network algorithm; and

an image classification module for classifying the image to be processed according to the visual features.

A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps:

acquiring an image to be processed, the image to be processed containing a food region;

finding an initial target detection algorithm, and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm;

extracting the food region from the image to be processed according to the target detection algorithm, and extracting visual features from the food region through a deep neural network algorithm; and

classifying the image to be processed according to the visual features.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:

acquiring an image to be processed, the image to be processed containing a food region;

finding an initial target detection algorithm, and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm;

extracting the food region from the image to be processed according to the target detection algorithm, and extracting visual features from the food region through a deep neural network algorithm; and

classifying the image to be processed according to the visual features.

With the above image processing method, device, computer equipment and storage medium, an image to be processed containing a food region is acquired; an initial target detection algorithm is found and its parameters are adjusted to obtain a target detection algorithm; the food region is extracted from the image to be processed with the target detection algorithm, and visual features are extracted from the food region with a deep neural network algorithm; and the image to be processed is classified according to the visual features. Adjusting the parameters of the initial target detection algorithm improves the accuracy of extracting the food region, and the deep neural network algorithm, with its strong visual-feature recognition ability, performs especially well on food images, improving the overall accuracy of image processing.

Brief Description of the Drawings

FIG. 1 is a diagram of the application environment of the image processing method in one embodiment;

FIG. 2 is a schematic flowchart of the image processing method in one embodiment;

FIG. 3 is a schematic flowchart of extracting the visual features from the food region in one embodiment;

FIG. 4 is a schematic diagram of the accuracy comparison of the different methods in the experiments;

FIG. 5 is a schematic diagram of the classification results of 15 different methods on randomly sampled dishes in the experiments;

FIG. 6 is a schematic diagram comparing the retrieval performance of the different methods using Precision@K in the experiments;

FIG. 7 is a schematic diagram comparing the retrieval performance of the different methods using MPA@K in the experiments;

FIG. 8 is a structural block diagram of the image processing device in one embodiment;

FIG. 9 is a diagram of the internal structure of the computer device in one embodiment.

Detailed Description

To make the objectives, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present application, not to limit it.

The image processing method provided in the embodiments of the present application can be applied in the application environment shown in FIG. 1. As shown in FIG. 1, the application environment includes a computer device 110. The computer device 110 can acquire an image to be processed, the image containing a food region; find an initial target detection algorithm and adjust its parameters to obtain a target detection algorithm; extract the food region from the image to be processed with the target detection algorithm and extract visual features from the food region with a deep neural network algorithm; and classify the image to be processed according to the visual features. The computer device 110 can be, but is not limited to, a personal computer, notebook computer, smartphone, robot, unmanned aerial vehicle, tablet computer or portable wearable device.

In one embodiment, as shown in FIG. 2, an image processing method is provided, comprising the following steps:

Step 202: acquire an image to be processed; the image to be processed contains a food region.

The image to be processed may be an image that needs to be classified or retrieved. The computer device may present a display interface, through which a user can input the image to be processed. The image to be processed may contain a food region.

Step 204: find an initial target detection algorithm, and adjust parameters of the initial target detection algorithm to obtain a target detection algorithm.

The initial target detection algorithm may be used for object detection; specifically, it may be a convolutional neural network (CNN) algorithm.

A neural network consists of a set of neural nodes divided into many layers, with adjacent layers fully connected: each node is connected to all nodes in the previous layer. The input of each node is the output of all nodes in the previous layer, and the node outputs the weighted sum of its inputs. The weight parameters are obtained through training and represent how much each input node of the previous layer contributes to the task; a larger weight raises the output value of the corresponding node and influences the deeper layers of the network.

Convolutional neural networks rest on three core ideas: local receptive fields, weight sharing and pooling. With local receptive fields, each neural node is related only to a local image region rather than the global image; a node can learn basic features of its local neighborhood, such as edges and corners, and because each node connects only to a small local region of the image, the number of weight parameters to train is reduced. Weight sharing means that, within a convolutional neural network, the same convolution kernel uses the same weights over different local regions of the input image: if a feature detector can detect a feature in one region of the image, it is also effective elsewhere in the image, which makes the network robust to variations such as deformation or shift. Pooling reduces the spatial dimensions of the image: the exact position of a learned feature is unimportant, and only its relative position needs to be recorded. Moreover, since different images may be shifted or deformed, overly precise feature localization would affect the next stage of learning.
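The local-receptive-field and weight-sharing ideas above can be illustrated by a minimal sliding-window convolution. This is a sketch for exposition only (the function name and sizes are not from the patent):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation form): one shared
    kernel is slid over every local receptive field of the image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # the same kernel weights are reused at every location
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```

Because the kernel is shared across positions, a k by k kernel needs only k*k parameters regardless of image size, which is exactly the parameter saving the text describes.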

As a feed-forward neural network, a convolutional neural network mainly comprises convolutional layers, pooling layers, fully connected layers and a loss function layer. A convolutional layer performs convolution on the input data, and each convolutional layer contains multiple convolution kernels. A kernel is essentially a filter matrix with fixed weights, and the convolution operation computes the inner product of the kernel with the input data through a sliding window. The convolutional layer embodies the network's ideas of local receptive fields and weight sharing. Each kernel, also called a "filter", learns one class of features of the input image; different kernels capture different features, such as color depth or contours. In general, shallower convolutional layers extract only low-level visual features such as shape, edge and texture, while deeper layers build on these low-level features to extract high-level visual semantic features for the target task.

The pooling layer downsamples the input data. Nonlinear pooling functions take many forms, such as max pooling, mean pooling and Gaussian pooling, of which max pooling is the most common: it divides the input image into several rectangular regions and outputs the maximum value of each sub-region. Pooling steadily shrinks the spatial size of the data, so the number of parameters and the amount of computation decrease, which also suppresses overfitting to some extent.
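The max pooling just described can be sketched directly: partition the input into non-overlapping tiles and keep each tile's maximum (an illustrative sketch, not the patent's implementation):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Non-overlapping max pooling: partition x into size-by-size
    rectangular regions and output the maximum of each region."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    tiles = x[:h2 * size, :w2 * size].reshape(h2, size, w2, size)
    return tiles.max(axis=(1, 3))
```

A 4 by 4 input becomes 2 by 2, so the data passed to the next layer, and hence the computation there, is quartered.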

The fully connected layer and the loss function layer correspond to the classical neural network. The fully connected layers can be viewed as the hidden layers of the network, while the loss function layer sets the training objective of the whole network: it computes the relationship between the network's predictions and the actual results, and is usually the last layer of the network.

Convolutional network parameters are trained, as in ordinary neural networks, with the back-propagation algorithm. The algorithm first feeds the training data through the network to obtain the network output, computes the error between the network output and the target output with the loss function, and then propagates the error backwards through the layers to compute each layer's residual and update the parameters.
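The forward, loss, backward, update cycle described above can be sketched for the simplest case of a single linear node; the toy data, learning rate and iteration count here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))            # toy training inputs
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                          # target outputs

w = np.zeros(3)                         # parameters to train
for _ in range(200):
    pred = X @ w                        # forward pass: network output
    err = pred - y                      # error against the target output
    grad = X.T @ err / len(X)           # gradient propagated back to w
    w -= 0.5 * grad                     # gradient-descent parameter update
```

After enough iterations the learned parameters recover the weights that generated the targets, which is the fixed point the back-propagation procedure converges to here.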

The computer device can adjust the parameters of the initial target detection algorithm to obtain the target detection algorithm. The target detection algorithm may be an R-CNN network, which implements object detection based on algorithms such as convolutional neural networks (CNN), linear regression and support vector machines (SVM).

Step 206: extract the food region from the image to be processed according to the target detection algorithm, and extract visual features from the food region through the deep neural network algorithm.

Specifically, the computer device can use the target detection algorithm to detect the food region in the image to be processed, and thereby extract it. The deep neural network algorithm may be a deep CNN, through which the computer device extracts the visual features of the food region.

Step 208: classify the image to be processed according to the visual features.

In this embodiment, the computer device acquires an image to be processed that contains a food region; finds an initial target detection algorithm and adjusts its parameters to obtain a target detection algorithm; extracts the food region from the image to be processed with the target detection algorithm and extracts visual features from the food region with a deep neural network algorithm; and classifies the image according to the visual features. Adjusting the parameters of the initial target detection algorithm improves the accuracy of food-region extraction, while the deep neural network algorithm's strong visual-feature recognition ability works especially well on food images, improving the overall accuracy of image processing.

In one embodiment, the image processing method may further include a process of obtaining the target detection algorithm: acquiring an image database and looking up food images in it; generating a food image data set from the food images; and adjusting the parameters of the initial target detection algorithm with the food image data set to obtain the target detection algorithm.

The image database may store many images of different types; the computer device can recognize the images in the database and mark those containing food. After acquiring the image database, the computer device can look up the marked images, i.e., the food images, collect them into a food image data set, and then use this data set to adjust the parameters of the initial target detection algorithm to obtain the target detection algorithm.

Since most food images contain irrelevant background information, extracting features only from the food region reduces the influence of that background; the food region in the image must therefore be detected first, which the R-CNN can do. Specifically, the computer device can first select images of food-related categories from the visual gene bank, and then fine-tune the R-CNN with the selected food image data set. The loss of the target detection algorithm is:

L({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_reg(t_i, t_i*)

where i is the index of an anchor in a mini-batch and p_i is the predicted probability that anchor i is an object; p_i* is 1 if the anchor is positive and 0 otherwise; t_i is the vector of four parameterized coordinates of the predicted bounding box, and t_i* is the coordinate vector of the ground-truth box associated with a positive anchor. The classification loss L_cls is the log loss over the two classes (object and non-object).
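The two-term anchor loss can be sketched numerically as follows. The smooth-L1 box-regression term and the normalization choices are assumptions borrowed from common region-proposal formulations, since the text only specifies the log-loss classification term:

```python
import numpy as np

def rpn_loss(p, p_star, t, t_star, lam=1.0):
    """Log loss over anchor object/non-object labels plus a box
    regression term active only for positive anchors (p_star == 1)."""
    p = np.clip(p, 1e-7, 1 - 1e-7)                  # numerical safety
    l_cls = -(p_star * np.log(p) + (1 - p_star) * np.log(1 - p)).mean()
    diff = np.abs(t - t_star)                       # per-coordinate box error
    smooth_l1 = np.where(diff < 1, 0.5 * diff ** 2, diff - 0.5).sum(axis=1)
    l_reg = (p_star * smooth_l1).sum() / max(p_star.sum(), 1.0)
    return l_cls + lam * l_reg
```

With perfect box predictions the regression term vanishes and only the classification log loss remains, matching the role of L_cls in the formula.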

In one embodiment, the image processing method may further include a process of extracting the food region: obtaining candidate frames of the image to be processed according to the target detection algorithm; calculating the candidate-frame weight of each candidate frame; and extracting the food region from the image to be processed according to the candidate-frame weights.

After fine-tuning the initial detection algorithm, the computer device can use the fine-tuned target detection algorithm to compute the candidate frames of the image to be processed and the weight of each candidate frame, and then extract the food region from the image to be processed.

As shown in FIG. 3, in one embodiment, the image processing method may further include a process of obtaining the visual features, with the following steps:

Step 302: obtain a weight threshold, and take the food regions whose candidate-frame weights are higher than the weight threshold as candidate regions.

The weight threshold may be a preset value used to judge whether a candidate-frame weight is high or low. The computer device can obtain the weight threshold and compare each candidate-frame weight against it to obtain a comparison result. When the comparison result shows that a candidate-frame weight is higher than the weight threshold, the computer device can take the food region corresponding to that candidate frame as a candidate region.
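Selecting candidate regions by comparing each candidate-frame weight against the threshold amounts to a simple filter; the box and score values below are invented for illustration:

```python
def filter_candidates(boxes, scores, threshold=0.7):
    """Keep only the candidate frames whose weight (score) is higher
    than the preset weight threshold."""
    return [(box, s) for box, s in zip(boxes, scores) if s > threshold]
```

For example, with scores [0.9, 0.3] and the default threshold of 0.7, only the first candidate frame survives as a candidate region.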

Step 304: obtain the candidate frame coordinates of each candidate region according to the candidate frame weight of the candidate region.

Step 306: extract the sub-features of each candidate region according to its candidate frame coordinates.

Since the candidate frame weights of the candidate regions exceed the weight threshold, the computer device can use an AlexNet network, driven by the candidate frame coordinates, to extract the FC7-layer features, and label the extracted FC7-layer features as sub-features.

Step 308: obtain the visual features of the food region according to the sub-features.

The computer device can concatenate the sub-features to obtain the visual features of the food region.
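Steps 302–308 can be sketched end to end. The FC7 extractor here is a hypothetical callable standing in for the fine-tuned AlexNet, but the threshold filtering and concatenation mirror the steps above:

```python
def food_visual_feature(boxes, weights, threshold, extract_fc7):
    """Filter candidate frames by weight, extract an FC7 sub-feature per
    surviving region, and concatenate the sub-features.

    boxes:       candidate frame coordinates, one (x1, y1, x2, y2) per frame
    weights:     candidate frame weight of each frame
    threshold:   the weight threshold of step 302
    extract_fc7: callable mapping a box to its FC7 feature vector
                 (stands in for the fine-tuned AlexNet of step 306)
    """
    # Step 302: keep only regions whose weight exceeds the threshold.
    candidates = [b for b, w in zip(boxes, weights) if w > threshold]
    # Steps 304-306: use each candidate's coordinates to pull a sub-feature.
    sub_features = [extract_fc7(box) for box in candidates]
    # Step 308: concatenate the sub-features into one visual feature vector.
    return [v for feat in sub_features for v in feat]
```

A toy extractor that just sums the box coordinates illustrates the data flow without any network.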

In one embodiment, the image processing method provided may further include a process of classifying the image to be processed. The specific process includes: extracting the color features and the shape features from the visual features; and classifying the image to be processed according to the color features and the shape features to obtain the category of the image to be processed.

Because color features are relatively intuitive, they are among the most widely used features in image classification and image retrieval. The RGB color space is based on the principle of mixing the three colors R, G, and B in a color display: according to the three-primary-color principle, any color can be formed by adding and mixing the primaries, F = r[R] + g[G] + b[B]. The RGB model can be represented in a three-dimensional Cartesian coordinate system whose three axes R, G, and B correspond to the primary colors red, green, and blue, with the origin representing black; any color can be expressed as a linear combination of the three primaries. Although the RGB color space can represent any color, two perceptually similar colors may have very different RGB coordinates in this Cartesian system. Therefore, when matching images by color-feature similarity, the RGB color space is first converted into another color space. Since images are generally represented in RGB, the conversion step transforms RGB into, for example, the HSV space, using the following formula:

Figure BDA0002458556170000091
where r, g, and b are the coordinate values in the RGB Cartesian space, max and min are the maximum and minimum of the three, and H, S, and V denote the hue, saturation, and value (brightness) in the HSV space model.
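The conversion can be written out directly. This sketch implements the textbook RGB-to-HSV formula described above (the figure itself is not recoverable), with r, g, b in [0, 1] and H in degrees:

```python
def rgb_to_hsv(r, g, b):
    """Convert RGB coordinates in [0, 1] to (H, S, V):
    H = hue in degrees [0, 360), S = saturation, V = value (brightness)."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = 0.0 if mx == 0 else (mx - mn) / mx
    if mx == mn:                      # achromatic: hue undefined, use 0
        h = 0.0
    elif mx == r:
        h = (60 * (g - b) / (mx - mn)) % 360
    elif mx == g:
        h = 60 * (b - r) / (mx - mn) + 120
    else:
        h = 60 * (r - g) / (mx - mn) + 240
    return h, s, v
```

For example, pure red, green, and blue map to hues 0, 120, and 240 degrees respectively, each with full saturation and value.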

Shape features can be used to represent the contours of objects in an image. The computer device can extract the shape features from the visual features, classify the image to be processed according to the color features and the shape features, and obtain the category of the image to be processed.

In one embodiment, the image processing method provided may further include a process of obtaining the category of the image to be processed. The specific process includes: obtaining an edge feature map of the image to be processed according to the color features and the shape features; counting edge points according to the edge feature map and building an image histogram from the edge-point counts; and inputting the image histogram into an image classifier to obtain the category of the image to be processed.

When performing image classification, similarity matching on shape features needs to be added. Shape features are mainly reflected in the contour features and edge features of an image. Many edge detection algorithms exist, such as the Sobel, Canny, and Laplacian algorithms, all of which can be used to extract the edge features of an image.

Edge detection is the first step in extracting the shape features of an image, and the computer device can extract shape features from the edge feature map. Commonly used methods for extracting shape features include: counting edge points as a function of angle and drawing a histogram of the edge-detected image; and using area, eccentricity, shape parameters, and the shape-feature histogram as the description of the image, then inputting that histogram into a trained classifier to obtain the category information of the image. Here, the computer device can extract visual features from the food images of all training and test sets, then train the classifier on the training set to obtain classification results from the trained classification model.
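The "edge points counted per angle" histogram described above can be sketched as a simple orientation histogram over the gradients of detected edge points; the bin count of 8 is an illustrative assumption:

```python
import math

def edge_orientation_histogram(edge_gradients, bins=8):
    """Count edge points per gradient direction and normalize.

    edge_gradients: (gx, gy) gradient pairs, one per detected edge point
                    (e.g. taken from a Sobel or Canny edge map)
    Returns a normalized histogram over `bins` equal angle ranges in [0, 2*pi).
    """
    counts = [0] * bins
    width = 2 * math.pi / bins
    for gx, gy in edge_gradients:
        angle = math.atan2(gy, gx) % (2 * math.pi)   # direction in [0, 2pi)
        counts[min(int(angle / width), bins - 1)] += 1
    total = sum(counts) or 1                          # avoid division by zero
    return [c / total for c in counts]
```

The resulting histogram is the kind of shape descriptor that, together with area and eccentricity, would be fed to the classifier.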

In one embodiment, the image processing method provided may further include a process of retrieving images. The specific process includes: acquiring an image to be retrieved and extracting the image features of the image to be retrieved; performing image retrieval in an image library according to the image features to obtain initial retrieval images; calculating the similarity between the initial retrieval images and the image to be retrieved; and obtaining the target retrieval images according to the similarity.

The computer device may acquire the image to be retrieved, which may be an image input by a user to search for. The computer device can use a CNN to extract the image features of the image to be retrieved, convert them into a 1×4096 vector, and perform retrieval in the image library according to these features to obtain the initial retrieval images. Next, the computer device can calculate the similarity between the initial retrieval images and the image to be retrieved from the extracted features. In this embodiment, either the Euclidean distance or the included angle between feature vectors can characterize the similarity between images; because the feature-vector matrix is normalized, the larger the product of a pair of vectors, the more similar the two images. The calculation formula is a·b = |a|×|b|×cos θ, where a and b are two feature vectors, |a| and |b| are their moduli, and θ is the angle between them; the larger the dot product, the more similar the two feature vectors. The computer device can sort the images by dot product and select the top N images from the image database as target retrieval images.
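The ranking step reduces to sorting by dot product against the query feature, per the formula a·b = |a|×|b|×cos θ above. A minimal sketch (feature vectors are plain lists here; in the method they would be the 1×4096 CNN features):

```python
def top_n_by_dot_product(query, database_features, n):
    """Rank database feature vectors by dot product with the query
    (a larger a.b means a smaller angle, hence higher similarity)
    and return the indices of the top-n most similar images."""
    scores = [sum(q * v for q, v in zip(query, feat))
              for feat in database_features]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:n]
```

The returned indices are the initial retrieval images that would then be re-ranked by the BOW model.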

Next, the computer device can classify the above N images with the BOW model. The accuracy of the BOW classification model is above 50% while it uses only a single image feature, the SIFT feature, so a multi-feature, multi-kernel approach can greatly improve the classification accuracy. Because the accuracy of the CNN is at least 70%, most of the N images belong to the same category and only a few belong to other categories; the other categories are eliminated and only the images of the dominant category are retained. The computer device can feed back to the user the final search results obtained after processing the preliminary results with the BOW model.

In one embodiment, the essence of image retrieval is to find, in a database, the one or more points closest to the index term; these points may be called neighbors. Take an image retrieval algorithm based on spectral hashing as an example: given m samples of n variables, an m×n matrix is constructed. The value of n is generally large, so the data usually contain redundancy that is hard for humans to identify. In the vector matrix formed by the features of all samples, some elements are indistinguishable; for example, if an element is 0, or very close to 0, in all samples, that element contributes nothing to discrimination and can be ignored. What is needed are the elements that vary strongly across samples; this variation can be characterized by the variance, and only elements with large variance are retained, which reduces data redundancy and effectively simplifies the computation. Assuming m samples of n dimensions each, the mean of each feature is computed as:

Figure BDA0002458556170000101

where i is the index of a sample, j is the dimension of the sample vector, and

Figure BDA0002458556170000102

is the j-th dimension feature value of the i-th sample.
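The per-feature mean above, followed by the variance screening the paragraph describes, can be sketched as follows (the variance threshold is an illustrative assumption; population variance is used):

```python
def select_high_variance_dims(samples, threshold):
    """Given m samples of n dimensions, compute each dimension's mean
    and variance, and keep the indices of dimensions whose variance
    exceeds `threshold`. Low-variance dimensions carry little
    discriminative information and are dropped."""
    m = len(samples)
    n = len(samples[0])
    means = [sum(s[j] for s in samples) / m for j in range(n)]
    variances = [sum((s[j] - means[j]) ** 2 for s in samples) / m
                 for j in range(n)]
    return [j for j in range(n) if variances[j] > threshold]
```

A dimension that is constant across all samples has zero variance and is discarded, exactly the indistinguishable-element case described above.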

In one embodiment, the BOW model, a model commonly used in text classification, represents a document by a feature vector over its vocabulary while ignoring grammar and word order; because the vocabulary of a document is distinctive, the document's features can be represented this way. The BOW model classifies images based on SIFT features. SIFT features are invariant not only to displacement, scale, and deformation but also to illumination, that is, they still detect well on similar images with large brightness differences, which compensates for the CNN's weakness under varying illumination. The steps for extracting the SIFT features of an image may include the following. First, build the scale space, whose main purpose is to describe the scale invariance of the image; the image scale space is defined as L(x, y, σ) = G(x, y, σ) × I(x, y), where x and y are the spatial coordinates of the image pixels and the size of σ determines the smoothness of the image. Next, detect the extreme points in the DoG scale space. Then select feature points: the position and scale of each key point are determined by fitting a three-dimensional quadratic function, and pixels with asymmetric local curvature are removed, improving matching stability and noise resistance. Finally, a 128-dimensional feature vector is assigned to each feature point. The gradient modulus and direction of each point are computed, respectively, as:

Figure BDA0002458556170000111

θ(x, y) = arctan((L(x, y+1) − L(x, y−1)) / (L(x+1, y) − L(x−1, y)))

Because the number of SIFT feature points is huge, using the BOW model directly as a codebook entails a large amount of data and high computational complexity. Therefore, a clustering algorithm such as k-means, or a support vector machine, is needed to cluster the SIFT features.
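The per-pixel gradient modulus and direction formulas can be written out directly; `L` is the smoothed image indexed as `L[y][x]` (the `x-11` in the extracted text is read as `x-1`, and `atan2` is used so the direction is well defined in all quadrants):

```python
import math

def gradient_mag_dir(L, x, y):
    """Gradient modulus m(x, y) and direction theta(x, y) of the
    Gaussian-smoothed image L at pixel (x, y), from central differences:
      m     = sqrt((L(x+1,y)-L(x-1,y))^2 + (L(x,y+1)-L(x,y-1))^2)
      theta = atan2(L(x,y+1)-L(x,y-1), L(x+1,y)-L(x-1,y))
    """
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)
```

These (magnitude, direction) pairs are the inputs from which each key point's 128-dimensional SIFT descriptor is accumulated.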

After the SIFT features are clustered into k classes, the centroid of each class is put into a bag of words; the codebook size of the bag-of-words model is k. An empty array of size k is constructed, and the feature points of each image in the test data set are matched against the codebook by distance. The resulting word-frequency histogram can be normalized according to the formula:

Figure BDA0002458556170000112

where i is the index of a codeword, x i is the count of SIFT features assigned to the i-th codeword, and Y i is the normalized value.
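Codebook assignment and the normalization above can be sketched as follows; nearest-centroid assignment stands in for the trained k-means codebook, and Y_i = x_i / Σ_j x_j is the assumed form of the normalization:

```python
def bow_histogram(descriptors, centroids):
    """Assign each SIFT descriptor to its nearest codeword (centroid),
    count hits per codeword, and normalize so the histogram sums to 1."""
    k = len(centroids)
    counts = [0] * k
    for d in descriptors:
        nearest = min(range(k),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(d, centroids[i])))
        counts[nearest] += 1
    total = sum(counts) or 1          # guard against an empty image
    return [c / total for c in counts]   # Y_i = x_i / sum_j x_j
```

The normalized histogram is the fixed-length vector that the BOW classifier consumes, regardless of how many SIFT points the image produced.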

In one embodiment, this scheme was tested experimentally. The experimental process and results are as follows. The original dish data set contains 117,504 images in 11,611 dish categories; the dish categories with at least 15 images were selected, finally yielding 233 dishes and 49,168 images. This data set is called Dish-233.

To use Faster R-CNN to detect food image regions, the Faster R-CNN must be fine-tuned, which requires a food image data set with region boundary labels. The visual database contains 108,077 calibrated images, including a large number of food images. The category names of the Dish-233 data set were therefore translated into English as query words to construct a query dictionary list, and the food images were picked out of the visual data set. To obtain more food pictures, the category information of other food databases was further used to select food pictures, followed by further manual filtering to remove non-food pictures, finally yielding 10,641 food pictures with corresponding calibration regions. This set is called VisGenome-11K.

Regarding the parameter settings of the model: in the Faster R-CNN training process, each mini-batch consists of 256 anchors extracted from an image, and the number of iterations is 80,000. The learning rate is set to 0.001 for the first 60,000 iterations and to 0.0001 for the remaining 20,000 iterations; the momentum parameter is set to 0.9 and the weight decay parameter to 0.0005. When fine-tuning the AlexNet model, the initial learning rate is set to 0.001 and scaled by 0.1 every 20 epochs, with a maximum of 60 epochs. The VisGenome-11K image set is split into two parts, 80% for the training set and 20% for the validation set. VisGenome-11K is used to fine-tune the Faster R-CNN; the fine-tuned Faster R-CNN model is then used for region detection on the food images of the Dish data set to obtain the food regions in each image, and the AlexNet model pre-trained on ImageNet is applied to extract the visual features of the food regions. After region detection and visual-feature extraction on all images, the features are used for food image retrieval and classification.
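The iteration-dependent schedule above can be expressed as a small helper, reading it as 0.001 for the first 60,000 iterations and 0.0001 for the remaining 20,000 (the function name and constant names are illustrative):

```python
def rpn_learning_rate(iteration):
    """Learning rate for the 80000-iteration Faster R-CNN schedule above:
    0.001 for the first 60000 iterations, 0.0001 afterwards."""
    return 0.001 if iteration < 60000 else 0.0001

# Fixed optimizer hyperparameters from the text.
MOMENTUM = 0.9
WEIGHT_DECAY = 0.0005
ANCHORS_PER_MINI_BATCH = 256
TOTAL_ITERATIONS = 80000
```

Encoding the schedule as a pure function of the iteration counter makes it easy to resume training from a checkpoint without replaying the decay steps.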

Accuracy and MAP are commonly used evaluation metrics in information retrieval. To verify the effectiveness of the method described here, CNN-G, CNN-G-F, and Faster R-CNN-G were compared. CNN-G uses the seven-layer AlexNet to extract visual features directly from the global image. Compared with CNN-G, CNN-G-F first fine-tunes the AlexNet on the training set and then uses the fine-tuned network to extract the visual features of the entire image. Faster R-CNN-G uses the Faster R-CNN network directly to detect candidate food regions in the image and then uses the fine-tuned AlexNet to extract visual features from the highest-weighted candidate regions. CNN-G and CNN-G-F do not use Faster R-CNN for region detection, and the Faster R-CNN-G method further illustrates the impact of fine-tuning the Faster R-CNN on VisGenome-11K.

For the image classification task, 75% of the data set of each class is used as the training set and 25% as the test set. Because the classification task is single-label, accuracy is used as the evaluation metric. The accuracy comparison of the different methods is shown in Table 1, and an analysis of the comparison is shown in FIG. 4.

Table 1:

Method            Accuracy rate
CNN-G             0.356
CNN-G-F           0.704
Faster-R-CNN-G    0.748
Article method    0.754

As shown in FIG. 4, Faster R-CNN-G improves performance by 5% compared with CNN-G-F.

In one embodiment, the classification results of 15 randomly selected dishes under the different methods were analyzed; the results are shown in Table 2 and FIG. 5.

Table 2:

Figure BDA0002458556170000131
Figure BDA0002458556170000141

Combining Table 2 and FIG. 5 gives the classification results of the 15 randomly selected dishes under the different methods. The best classification accuracy of the CNN-G method is 0.41, that of the CNN-G-F method is 0.73, and that of the Faster R-CNN-G method is 0.78. In most of the experiments, the Faster R-CNN-G method achieves the best classification performance.

Retrieval experiments were conducted on the Dish-233 data set using Precision@K and MAP@K (K denotes the number of candidate images returned during retrieval), with K = {1, 20, 40, 60, 80, 100}. The retrieval results for the two metrics are shown in FIG. 6 and FIG. 7.
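Precision@K and the per-query average precision behind MAP@K can be sketched as follows, assuming binary relevance labels over the ranked result list:

```python
def precision_at_k(relevant, k):
    """relevant: 1/0 relevance of each returned image, in rank order."""
    return sum(relevant[:k]) / k

def average_precision_at_k(relevant, k):
    """Mean of precision@i over the relevant positions in the top k;
    averaging this value over all queries gives MAP@K."""
    hits, score = 0, 0.0
    for i, rel in enumerate(relevant[:k], start=1):
        if rel:
            hits += 1
            score += hits / i
    return score / hits if hits else 0.0
```

Unlike plain precision, average precision rewards ranking the relevant images earlier in the returned list.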

From FIG. 6 and FIG. 7 it can be concluded that: CNN-G-F has better retrieval performance than CNN-G, indicating that fine-tuning the AlexNet network yields visual features better suited to the Dish data set; Faster R-CNN-G outperforms CNN-G-F, indicating that detecting food image regions with Faster R-CNN can effectively reduce the interference of background information in food images and thereby improve retrieval performance; and fine-tuning with VisGenome-11K improves the accuracy of food image detection.

It should be understood that although the steps in the above flowcharts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and these steps may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their order of execution is also not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 8, an image processing apparatus is provided, including an image acquisition module 810, a parameter adjustment module 820, a feature extraction module 830, and an image classification module 840, wherein:

the image acquisition module 810 is configured to acquire an image to be processed, the image to be processed containing a food region;

the parameter adjustment module 820 is configured to look up an initial target detection algorithm and adjust the parameters of the initial target detection algorithm to obtain the target detection algorithm;

the feature extraction module 830 is configured to extract the food region in the image to be processed according to the target detection algorithm, and to extract visual features of the food region through a deep neural network algorithm; and

the image classification module 840 is configured to classify the image to be processed according to the visual features.

In one embodiment, the parameter adjustment module 820 is further configured to acquire an image database and look up food images in the image database; generate a food image data set from the food images; and adjust the parameters of the initial target detection algorithm using the food image data set to obtain the target detection algorithm.

In one embodiment, the feature extraction module 830 is further configured to obtain candidate frames of the image to be processed according to the target detection algorithm; calculate the candidate frame weight of each candidate frame; and extract the food region in the image to be processed according to the candidate frame weights.

In one embodiment, the feature extraction module 830 is further configured to obtain a weight threshold and take the food regions whose candidate frame weights exceed the weight threshold as candidate regions; obtain the candidate frame coordinates of each candidate region according to its candidate frame weight; extract the sub-features of the candidate regions according to the candidate frame coordinates; and obtain the visual features of the food region according to the sub-features.

In one embodiment, the image classification module 840 is further configured to extract the color features and the shape features from the visual features, and to classify the image to be processed according to the color features and the shape features to obtain the category of the image to be processed.

In one embodiment, the image classification module 840 is further configured to obtain an edge feature map of the image to be processed according to the color features and the shape features; count edge points according to the edge feature map and build an image histogram from the edge-point counts; and input the image histogram into an image classifier to obtain the category of the image to be processed.

In one embodiment, the image processing apparatus provided may further include an image retrieval module configured to acquire an image to be retrieved and extract the image features of the image to be retrieved; perform image retrieval in an image library according to the image features to obtain initial retrieval images; calculate the similarity between the initial retrieval images and the image to be retrieved; and obtain the target retrieval images according to the similarity.

In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements an image processing method. The display screen of the computer device may be a liquid-crystal display or an electronic-ink display, and the input device may be a touch layer covering the display screen, a key, trackball, or touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.

Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program. The processor, when executing the computer program, implements the following steps:

acquiring an image to be processed, the image to be processed containing a food region;

looking up an initial target detection algorithm and adjusting the parameters of the initial target detection algorithm to obtain the target detection algorithm;

extracting the food region in the image to be processed according to the target detection algorithm, and extracting visual features of the food region through a deep neural network algorithm; and

classifying the image to be processed according to the visual features.

In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring an image database and looking up food images in the image database; generating a food image data set from the food images; and adjusting the parameters of the initial target detection algorithm using the food image data set to obtain the target detection algorithm.

In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining candidate frames of the image to be processed according to the target detection algorithm; calculating the candidate frame weight of each candidate frame; and extracting the food region in the image to be processed according to the candidate frame weights.

In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining a weight threshold and taking the food regions whose candidate frame weights exceed the weight threshold as candidate regions; obtaining the candidate frame coordinates of each candidate region according to its candidate frame weight; extracting the sub-features of the candidate regions according to the candidate frame coordinates; and obtaining the visual features of the food region according to the sub-features.

In one embodiment, the processor, when executing the computer program, further implements the following steps: extracting the color features and the shape features from the visual features; and classifying the image to be processed according to the color features and the shape features to obtain the category of the image to be processed.

In one embodiment, the processor, when executing the computer program, further implements the following steps: obtaining an edge feature map of the image to be processed according to the color features and the shape features; counting edge points according to the edge feature map and building an image histogram from the edge-point counts; and inputting the image histogram into an image classifier to obtain the category of the image to be processed.

In one embodiment, the processor, when executing the computer program, further implements the following steps: acquiring an image to be retrieved, and extracting image features of the image to be retrieved; performing image retrieval in an image library according to the image features to obtain initial retrieval images; calculating the similarity between each initial retrieval image and the image to be retrieved; and obtaining the target retrieval image according to the similarity.
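The retrieval step can be sketched with cosine similarity over feature vectors; the similarity measure and the toy feature library are illustrative assumptions, not choices the patent specifies:

```python
import numpy as np

# Illustrative sketch: rank library entries by cosine similarity to the
# query features and return the top matches as target retrieval images.

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_feat, library, top_k=2):
    scored = [(name, cosine_sim(query_feat, feat)) for name, feat in library]
    return sorted(scored, key=lambda t: -t[1])[:top_k]

library = [("noodles", np.array([1.0, 0.0])),
           ("salad",   np.array([0.9, 0.1])),
           ("cake",    np.array([0.0, 1.0]))]
results = retrieve(np.array([1.0, 0.05]), library)
print(results)
```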

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented:

acquiring an image to be processed, the image to be processed containing a food region;

finding an initial target detection algorithm, and adjusting the parameters of the initial target detection algorithm to obtain a target detection algorithm;

extracting the food region in the image to be processed according to the target detection algorithm, and extracting visual features of the food region through a deep neural network algorithm; and

classifying the image to be processed according to the visual features.
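Taken together, the four steps compose roughly as in this sketch; `detect_food_region`, `cnn_features`, and `classify` are hypothetical stand-ins for the tuned detector, the deep-neural-network feature extractor, and the classifier, and only show how the stages chain:

```python
import numpy as np

# Illustrative pipeline sketch; all three stage functions are stand-ins,
# not implementations defined by the patent.

def detect_food_region(image):
    h, w = image.shape[:2]                            # stand-in "detector":
    return (w // 4, h // 4, 3 * w // 4, 3 * h // 4)   # central crop as food region

def cnn_features(crop):
    return [float(crop.mean()), float(crop.std())]    # stand-in feature vector

def classify(features):
    return "food" if features[0] > 0 else "non-food"  # stand-in classifier

image = np.full((8, 8), 42.0)
x1, y1, x2, y2 = detect_food_region(image)
label = classify(cnn_features(image[y1:y2, x1:x2]))
print(label)  # food
```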

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining an image database, and finding food images in the image database; generating a food image dataset from the food images; and adjusting the parameters of the initial target detection algorithm using the food image dataset to obtain the target detection algorithm.
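As an illustration of "adjusting the parameters with a food image dataset", the toy example below tunes a single score threshold on labelled samples; a real system would instead fine-tune the weights of a detector such as Faster R-CNN (mentioned in the non-patent citations), so this is a sketch of the idea only:

```python
# Illustrative sketch: a one-parameter "detector" whose score threshold
# is tuned on labelled food samples to maximise accuracy.

def tune_threshold(samples):
    """samples: list of (detector_score, is_food) pairs."""
    candidates = sorted({s for s, _ in samples})
    # Pick the threshold whose predictions match the labels most often.
    best = max(candidates,
               key=lambda t: sum((s >= t) == y for s, y in samples))
    return best

dataset = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
print(tune_threshold(dataset))  # 0.8
```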

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining candidate frames of the image to be processed according to the target detection algorithm; calculating the candidate-frame proportion of each candidate frame; and extracting the food region in the image to be processed according to the candidate-frame proportions.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining a proportion threshold, and taking the food areas whose candidate-frame proportions exceed the threshold as candidate regions; obtaining the candidate-frame coordinates of each candidate region from its candidate-frame proportion; extracting sub-features of the candidate region according to the candidate-frame coordinates; and obtaining the visual features of the food region from the sub-features.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: extracting color features and shape features from the visual features; and classifying the image to be processed according to the color features and shape features to obtain the category of the image to be processed.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: obtaining an edge feature map of the image to be processed according to the color features and shape features; calculating the number of edge points from the edge feature map, and obtaining an image histogram from the edge-point count; and inputting the image histogram into an image classifier to obtain the category of the image to be processed.

In one embodiment, the computer program, when executed by the processor, further implements the following steps: acquiring an image to be retrieved, and extracting image features of the image to be retrieved; performing image retrieval in an image library according to the image features to obtain initial retrieval images; calculating the similarity between each initial retrieval image and the image to be retrieved; and obtaining the target retrieval image according to the similarity.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be processed, the image to be processed containing a food region;
finding an initial target detection algorithm, and adjusting parameters of the initial target detection algorithm to obtain a target detection algorithm;
extracting the food region in the image to be processed according to the target detection algorithm, and extracting visual features of the food region through a deep neural network algorithm; and
classifying the image to be processed according to the visual features.

2. The method according to claim 1, wherein adjusting the parameters of the initial target detection algorithm to obtain the target detection algorithm comprises:
obtaining an image database, and finding food images in the image database;
generating a food image dataset from the food images; and
adjusting the parameters of the initial target detection algorithm using the food image dataset to obtain the target detection algorithm.

3. The method according to claim 1, wherein extracting the food region in the image to be processed according to the target detection algorithm comprises:
obtaining candidate frames of the image to be processed according to the target detection algorithm;
calculating the candidate-frame proportion of each candidate frame; and
extracting the food region in the image to be processed according to the candidate-frame proportions.

4. The method according to claim 3, wherein extracting visual features of the food region through a deep neural network algorithm comprises:
obtaining a proportion threshold, and taking the food areas whose candidate-frame proportions exceed the threshold as candidate regions;
obtaining the candidate-frame coordinates of each candidate region from its candidate-frame proportion;
extracting sub-features of the candidate region according to the candidate-frame coordinates; and
obtaining the visual features of the food region from the sub-features.

5. The method according to claim 1, wherein classifying the image to be processed according to the visual features comprises:
extracting color features and shape features from the visual features; and
classifying the image to be processed according to the color features and shape features to obtain the category of the image to be processed.

6. The method according to claim 5, wherein classifying the image to be processed according to the color features and shape features to obtain the category of the image to be processed comprises:
obtaining an edge feature map of the image to be processed according to the color features and shape features;
calculating the number of edge points from the edge feature map, and obtaining an image histogram from the edge-point count; and
inputting the image histogram into an image classifier to obtain the category of the image to be processed.

7. The method according to claim 1, further comprising:
acquiring an image to be retrieved, and extracting image features of the image to be retrieved;
performing image retrieval in an image library according to the image features to obtain initial retrieval images;
calculating the similarity between each initial retrieval image and the image to be retrieved; and
obtaining the target retrieval image according to the similarity.

8. An image processing apparatus, characterized in that the apparatus comprises:
an image acquisition module, configured to acquire an image to be processed, the image to be processed containing a food region;
a parameter adjustment module, configured to find an initial target detection algorithm and adjust parameters of the initial target detection algorithm to obtain a target detection algorithm;
a feature extraction module, configured to extract the food region in the image to be processed according to the target detection algorithm, and to extract visual features of the food region through a deep neural network algorithm; and
an image classification module, configured to classify the image to be processed according to the visual features.

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202010313007.4A 2020-04-20 2020-04-20 Image processing method, device, computer equipment and storage medium Pending CN111539470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010313007.4A CN111539470A (en) 2020-04-20 2020-04-20 Image processing method, device, computer equipment and storage medium


Publications (1)

Publication Number Publication Date
CN111539470A true CN111539470A (en) 2020-08-14

Family

ID=71979153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010313007.4A Pending CN111539470A (en) 2020-04-20 2020-04-20 Image processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111539470A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809121A (en) * 2016-03-03 2016-07-27 电子科技大学 Multi-characteristic synergic traffic sign detection and identification method
US20170206646A1 (en) * 2016-01-14 2017-07-20 Electronics And Telecommunications Research Institute Apparatus and method for food search service
CN109002851A (en) * 2018-07-06 2018-12-14 东北大学 It is a kind of based on the fruit classification method of image multiple features fusion and application
CN109903836A (en) * 2019-03-31 2019-06-18 山西慧虎健康科技有限公司 A kind of diet intelligent recommendation and matching system and method based on constitution and big data
CN110705621A (en) * 2019-09-25 2020-01-17 北京影谱科技股份有限公司 Food image identification method and system based on DCNN and food calorie calculation method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOYANG LIU et al.: "A Detection Method for Apple Fruits Based on Color and Shape Features", IEEE Access *
SUN Haorong: "Research on SVM-Based Food Image Classification Algorithms", China Master's Theses Full-text Database, Information Science and Technology Series *
MEI Shuhuan et al.: "Food Image Retrieval and Classification Based on Faster R-CNN", Journal of Nanjing University of Information Science & Technology (Natural Science Edition) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287193A (en) * 2020-10-30 2021-01-29 腾讯科技(深圳)有限公司 Data clustering method and device, computer equipment and storage medium
CN112287193B (en) * 2020-10-30 2022-10-04 腾讯科技(深圳)有限公司 Image segmentation method and device, computer equipment and storage medium
CN118691913A (en) * 2024-08-29 2024-09-24 杭州老板电器股份有限公司 Method, device, equipment and medium for determining food classification topological structure
CN118691913B (en) * 2024-08-29 2024-11-22 杭州老板电器股份有限公司 Method, device, equipment and medium for determining food material classification topological structure

Similar Documents

Publication Publication Date Title
US10353948B2 (en) Content based image retrieval
WO2019100724A1 (en) Method and device for training multi-label classification model
Kao et al. Visual aesthetic quality assessment with a regression model
Ghamisi et al. Land-cover classification using both hyperspectral and LiDAR data
Cevikalp et al. Semi-supervised dimensionality reduction using pairwise equivalence constraints
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
McLaughlin et al. Person reidentification using deep convnets with multitask learning
CN105243139B (en) A kind of method for searching three-dimension model and its retrieval device based on deep learning
Keivani et al. Automated analysis of leaf shape, texture, and color features for plant classification.
Shen et al. Shape recognition by bag of skeleton-associated contour parts
Niu et al. A novel image retrieval method based on multi-features fusion
CN109934258A (en) Image retrieval method based on feature weighting and region integration
Ding et al. Single sample per person face recognition with KPCANet and a weighted voting scheme
Karaoglu et al. Detect2rank: Combining object detectors using learning to rank
Mehmood et al. Effect of complementary visual words versus complementary features on clustering for effective content-based image search
Wei et al. An automated detection model of threat objects for X-ray baggage inspection based on depthwise separable convolution
Huo et al. Semisupervised learning based on a novel iterative optimization model for saliency detection
CN111539470A (en) Image processing method, device, computer equipment and storage medium
Song et al. Using dual-channel CNN to classify hyperspectral image based on spatial-spectral information
Kader et al. Effective workflow for high-performance recognition of fruits using machine learning approaches
Huang Cn-lbp: complex networks-based local binary patterns for texture classification
JP4302799B2 (en) Document search apparatus, method, and recording medium
Johnson et al. A study on eye fixation prediction and salient object detection in supervised saliency
Huang et al. Landmark-based large-scale sparse subspace clustering method for hyperspectral images
Zweng et al. Improved relational feature model for people detection using histogram similarity functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200814