CN112699822B - Restaurant dish identification method based on deep convolutional neural network - Google Patents


Info

Publication number
CN112699822B
CN112699822B (application CN202110006146.7A)
Authority
CN
China
Prior art keywords
channel
convolution
characteristic spectrum
dish
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110006146.7A
Other languages
Chinese (zh)
Other versions
CN112699822A (en)
Inventor
翟盛龙
尹旭
王东伟
张金波
张睿智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd filed Critical Inspur Cloud Information Technology Co Ltd
Priority to CN202110006146.7A priority Critical patent/CN112699822B/en
Publication of CN112699822A publication Critical patent/CN112699822A/en
Application granted granted Critical
Publication of CN112699822B publication Critical patent/CN112699822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a restaurant dish identification method based on a deep convolutional neural network, relating to the technical field of deep learning. It addresses the problem that current dish identification and classification methods find it difficult to distinguish dishes finely when the dishes are highly similar. The technical scheme is: capture and crop a dish image and extract sample blocks from it; perform a 3D convolution operation and a downsampling operation on each sample block, extract feature maps through an attention module, then perform a further 3D convolution operation to obtain a one-dimensional intermediate feature map, and input the intermediate feature map into a deep convolutional neural network with a softmax function; the classification result of the original dish image is obtained from the probability values produced by the softmax mapping. The invention improves recognition accuracy, reduces the workload of manual auxiliary operation, and overcomes the shortcoming of current dish identification and classification methods that highly similar dishes are difficult to distinguish finely.

Description

Restaurant dish identification method based on deep convolutional neural network
Technical Field
The invention relates to the technical field of deep learning, in particular to a restaurant dish identification method based on a deep convolutional neural network.
Background
Food is an essential part of human life and an important precondition for human survival and healthy development. With the development of society, people's requirements for food quality keep rising, which has in turn greatly promoted the catering industry: enterprise canteens prepare a wide variety of dishes, and new dishes appear continuously, so accurately identifying many kinds of dishes is an increasingly pressing requirement. At the same time, many enterprise canteens offer numerous dishes, and manual settlement in the dining hall is very inefficient. With the rapid development of the mobile internet, a large number of dish images can be quickly obtained from the massive amount of image information on the network and used as a data source for analysis and modeling, yielding a general model for classifying, segmenting and identifying dish images. This is of great significance for saving labor cost and improving restaurant settlement efficiency.
Image classification, in short, means distinguishing images by finding common features shared by images of the same class within a large collection. A model can correctly distinguish images only if such features can be found.
Before deep learning technology matured, classifying dish images was already an active research direction. Deep learning is a machine learning method based on representation learning of data; it aims to build multi-layer neural networks that simulate the analysis and learning mechanisms of the human brain in order to interpret data such as images, sounds and texts, and it has been widely applied in the field of image recognition. Because deep learning can extract more abstract and deeper features from an image, it has stronger classification capability than traditional classification methods. Convolutional neural networks have been applied to image classification with good results; however, the amount of input information and the classification performance of a convolutional neural network are not strictly positively correlated. For a given model, an overly complicated input not only prolongs training and classification time but may even cause accuracy to stagnate or decline. It is therefore necessary to study the feature extraction process before convolutional-neural-network classification in depth, so that adaptive refinement of the features can be achieved at low cost.
Disclosure of Invention
In current dish identification and classification methods, some dishes are highly similar and their ingredients are unevenly matched. Addressing these characteristics, the invention studies the feature extraction process before convolutional-neural-network classification in depth so that adaptive refinement of the features can be achieved at low cost; at the same time, the loss function suitable for restaurant dish classification is optimized and improved, the robustness of the algorithm is enhanced, and the risk of overfitting is reduced. To further improve recognition accuracy and reduce the workload of manual auxiliary operation, the invention provides a restaurant dish identification method based on a deep convolutional neural network.
To solve the above technical problems, the restaurant dish identification method based on a deep convolutional neural network disclosed by the invention adopts the following technical scheme:
the restaurant dish identification method based on the deep convolutional neural network is characterized by comprising the following steps of:
step S1, collecting a dish image R1, performing a cropping preprocessing operation on R1 to obtain a dish image R2, and extracting sample blocks from R2 to obtain dish sample blocks T1, a dish sample block T1 being the feature information of the dish sample;
step S2, performing a 3D convolution operation on the dish sample block T1 to obtain the intermediate feature map T2 of T1;
step S3, performing a pooling operation on the intermediate feature map T2 to obtain an intermediate feature map T3;
step S4, pooling the intermediate feature map T3 along the spatial dimension to obtain a channel attention module A3, pooling T3 along the channel dimension to obtain a planar attention module A'3, and multiplying each channel vector of T3 with the channel attention module and each spatial feature of T3 with the planar attention module, element-wise, to obtain an intermediate feature map T4;
step S5, performing a 3D convolution operation and a pooling operation on the intermediate feature map T4 in sequence to obtain an intermediate feature map T6, pooling T6 along the spatial dimension to obtain a channel attention module A6, pooling T6 along the channel dimension to obtain a planar attention module A'6, and multiplying each channel vector of T6 with the channel attention module and each spatial feature of T6 with the planar attention module, element-wise, to obtain an intermediate feature map T7;
step S6, performing a 3D convolution operation on the intermediate feature map T7 to obtain a one-dimensional intermediate feature map T8;
step S7, inputting the intermediate feature map T8 into a deep convolutional neural network to obtain the classification result of the dish image R1.
Optionally, when step S1 is performed, the specific operations of extracting sample blocks from the dish image R2 include:
step S1.1, in the plane dimension of the dish image R2, taking the surrounding a×a pixel points of each pixel as the neighborhood block of the sample, where a is the number of pixel points of the image block in the plane length and width directions;
step S1.2, retaining all channel information of the a×a pixel points, i.e. forming a three-dimensional sample block of size P×a×a that represents the sample features of the center pixel, the feature transformation of the block-extraction process being expressed by formula (1):
T1^(P×a×a) = D_samp(R2^(P×L×H)), Q = L×H    formula (1),
where Q is the number of pixel points in a single channel, i.e. the number of block samples, D_samp denotes the block-extraction process, and L and H denote the preset plane length and width of the cropping operation;
in the block-extraction operation, zero padding is applied wherever an edge pixel has no spatial neighborhood information.
Further optionally, when step S2 is executed, the specific operations of performing the 3D convolution operation on the dish sample block T1 include:
step S2.1, based on the deep convolutional neural network, selecting h different convolution kernels in each convolution layer, and performing the convolution operation on the P channels of information contained in the dish sample block T1 with 3D convolution kernels of size e×f, where e is the number of operation layers in the channel dimension, i.e. e channels are selected for one group of convolution each time, and f denotes the number of pixel points of the image block in the length and width directions of the spatial dimension;
step S2.2, after h different convolution kernels are selected in each convolution layer, obtaining the intermediate feature map T2 of the dish sample block T1 with formulas (2), (3) and (4):
p = [(P - e) + 1] × h    formula (2),
m = [(a - e) + 1]    formula (3),
T2^(p×m×m) = Con3D(T1^(P×a×a))    formula (4),
where p denotes the number of channels contained in the intermediate feature map T2 of the dish sample block T1, e is the number of operation layers in the channel dimension, a is the number of pixel points of the image block in the plane length and width directions, m is the number of pixel points of T2 in the spatial length and width directions, and Con3D denotes the 3D convolution operation.
Further optionally, in the 3D convolution operation on the dish sample block T1, the mapping of each feature in a convolution layer is connected to several adjacent continuous channels of the previous layer, and the value at a given position of one convolution map is obtained by convolving local receptive fields at the same position of three continuous channels of the previous layer; one convolution layer has several convolution kernels, a single convolution kernel can extract only one type of feature information from the three-dimensional data, and h convolution kernels can extract h types of feature information, where h is a positive integer and h > 1.
Further optionally, the specific operations of executing step S3 to obtain the intermediate feature map T3 include:
step S3.1, performing the pooling operation on the intermediate feature map T2 of the dish sample block T1, the pooling being a downsampling or feature-discarding process that yields the intermediate feature map T3, at which point the number of channels of T3 equals the number of channels of T2 while the size of a single channel in the spatial dimension changes;
step S3.2, after pooling, denoting the intermediate feature map T3 as T3^(p×r×r), i.e. each channel of T3 has r pixel points in the spatial length and width directions, the number r being calculated with formula (5):
r = m / 2    formula (5),
where m is the number of pixel points of the intermediate feature map T2 in the spatial length and width directions.
Further optionally, the specific operations of executing step S4 to obtain the intermediate feature map T4 are:
transforming the intermediate feature map T3 with formulas (6) and (7), so that T3 is multiplied point by point, channel by channel, with the channel attention module A3 in the channel direction, and point by point with the planar attention module A'3 in the spatial direction, to obtain the intermediate feature map T4:
[formula (6): Aten_spe, the channel-direction attention enhancement of T3]
[formula (7): Aten_spa, the spatial-direction attention enhancement of T3]
where Aten_spe denotes the attention enhancement of the intermediate feature map T3 in the channel direction, Aten_spa denotes the attention enhancement of T3 in the spatial direction, u is the u-th pixel point contained in a single channel of T3, r is the number of pixel points of a single channel of T3 in the spatial length and width directions, p is the number of channels of T3, V is the V-th channel of T3, and the multiplication symbol in formulas (6) and (7) denotes element-wise multiplication of matrices of the same type at corresponding positions.
Further optionally, the specific operations of executing step S5 to obtain the intermediate feature map T7 are:
step S5.1, performing a 3D convolution operation on the intermediate feature map T4 with formula (8) to obtain the intermediate feature map T5, i.e.
T5^(x×y×y) = Con3D(T4^(p×r×r))    formula (8),
where Con3D denotes the 3D convolution operation, x denotes the number of pixel points of T5 in the spatial height direction, y denotes the number of pixel points of T5 in the spatial length and width directions, r is the number of pixel points of a single channel of T4 in the spatial length and width directions, and p is the number of channels of T4;
step S5.2, downsampling the intermediate feature map T5 to obtain the intermediate feature map T6, at which point the number of channels of T6 equals the number of channels of T5 while the size of a single channel in the spatial dimension changes, the size of a single channel of T6 in the spatial length and width directions being
z × z = [(y ÷ 2) × (y ÷ 2)],
where z is the number of pixel points of T6 in the spatial length and width directions and y is the number of pixel points of T5 in the spatial length and width directions;
step S5.3, performing a feature transformation on the intermediate feature map T6 with formulas (9) and (10) to obtain the intermediate feature map T7, i.e. T7^(x×z×z):
[formula (9): Aten_spe, the channel-direction attention enhancement of T6]
[formula (10): Aten_spa, the spatial-direction attention enhancement of T6]
where Aten_spe denotes the attention enhancement of the intermediate feature map T6 in the channel direction, Aten_spa denotes the attention enhancement of T6 in the spatial direction, u is the u-th pixel point contained in a single channel of T6, z is the number of pixel points of a single channel of T6 in the spatial length and width directions, x is the number of channels of T6, V is the V-th channel of T6, and the multiplication symbol in formulas (9) and (10) denotes element-wise multiplication of matrices of the same type at corresponding positions.
Further optionally, when step S4 or S5 is performed, the specific operations of the channel attention module and the planar attention module are:
(I) obtaining the channel attention module:
(1.1) first, performing max pooling and average pooling on the intermediate feature map Ti in the spatial dimension, generating two pooled vectors, where i takes the value 3 or 6,
(1.2) then, inputting the two pooled vectors into a shared multi-layer mapping neural network for training, generating two new vectors,
(1.3) finally, adding the two new vectors bit by bit and applying a Sigmoid activation function for nonlinear mapping, obtaining the channel attention module Ai(Ti) with formulas (11) and (12),
[formula (11) is given as an image in the original]
Ai(Ti) = σ{MLP[AvePool(Ti)] + MLP[MaxPool(Ti)]}    formula (12),
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, MLP denotes nonlinear mapping through a multi-layer neural network, AvePool denotes average pooling, and MaxPool denotes max pooling;
(II) obtaining the planar attention module:
(2.1) first, performing max pooling and average pooling on the intermediate feature map Ti in the channel dimension, generating two pooled vectors,
(2.2) then, mapping the two pooled vectors to a single-channel model of the same size by a convolution operation,
(2.3) finally, applying a Sigmoid activation function for nonlinear mapping, obtaining the planar attention module A'i(Ti) with formulas (13) and (14),
[formulas (13) and (14) are given as images in the original]
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, the convolution operator in formula (14) denotes feature transformation with a 1×1 convolutional neural network, AvePool denotes average pooling, and MaxPool denotes max pooling.
Further optionally, the specific operations of executing step S6 to obtain the intermediate feature map T8 are:
step S6.1, performing the 3D convolution operation on the intermediate feature map T7 with convolution kernels of size ρ×z×z to obtain the one-dimensional intermediate feature map T8, i.e. each channel of T8 contains only one pixel point, where ρ is the side length of the convolution kernel along the channel dimension and z×z is the size of the convolution window;
step S6.2, when the 3D convolution operation is performed on the intermediate feature map T7, the number of convolution kernels is η and the vector length of the input to the convolution is α; the vector length α after convolution is obtained with formula (15):
α = [(α - ρ) + 1] × η    formula (15).
Further optionally, the specific operations of executing step S7 to obtain the classification result of the dish image R1 are:
step S7.1, selecting a deep convolutional neural network whose activation function is the softmax function shown in formula (16), with one layer of neural network placed before the softmax function, the softmax function being
softmax(Y)_i = e^(Y_i) / Σ_j e^(Y_j)    formula (16),
where Y_i denotes the i-th element of the vector T;
step S7.2, after the intermediate feature map T8 is input into the deep convolutional neural network, obtaining a vector T through one layer of the network; the vector T then enters the softmax function, which maps the elements of T into the interval (0, 1) and gives the probability vector of T; the name of the dish image R1 is the name corresponding to the maximum probability value in the probability vector produced by the softmax mapping.
Compared with the prior art, the restaurant dish identification method based on the deep convolutional neural network has the following beneficial effects:
the invention extracts the feature maps of the dish sample blocks through a deep convolutional neural network and attention modules and obtains the name of the original dish image through the softmax mapping, so it has the advantage of high recognition accuracy, can reduce the workload of manual auxiliary operation, reduces the risk of overfitting, and overcomes the shortcoming of current dish identification and classification methods that dishes are difficult to distinguish finely when their similarity is high.
Drawings
FIG. 1 is a simplified flow chart of a method according to a first embodiment of the invention;
FIG. 2 is a simplified flow chart of obtaining the intermediate feature map T4 in the first embodiment of the invention.
Detailed Description
In order to make the technical scheme, the technical problems to be solved and the technical effects of the invention more clear, the technical scheme of the invention is clearly and completely described below by combining specific embodiments.
Embodiment one:
Referring to fig. 1 and fig. 2, this embodiment provides a restaurant dish identification method based on a deep convolutional neural network, the implementation of which includes the following contents.
Step S1, collect a dish image R1, perform a cropping preprocessing operation on R1 to obtain a dish image R2, and extract sample blocks from R2 to obtain dish sample blocks T1; a dish sample block T1 is the feature information of the dish sample.
In this step, the specific operations of extracting sample blocks from the dish image R2 include:
step S1.1, in the plane dimension of the dish image R2, taking the surrounding a×a pixel points of each pixel as the neighborhood block of the sample, where a is the number of pixel points of the image block in the plane length and width directions;
step S1.2, retaining all channel information of the a×a pixel points, i.e. forming a three-dimensional sample block of size P×a×a that represents the sample features of the center pixel, the feature transformation of the block-extraction process being expressed by formula (1):
T1^(P×a×a) = D_samp(R2^(P×L×H)), Q = L×H    formula (1),
where Q is the number of pixel points in a single channel, i.e. the number of block samples, D_samp denotes the block-extraction process, and L and H denote the preset plane length and width of the cropping operation;
in the block-extraction operation, zero padding is applied wherever an edge pixel has no spatial neighborhood information.
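For concreteness, the following sketch shows one way step S1 could be realized in code; the patent gives no implementation, so the library, the tensor layout, the assumption that a is odd and the example sizes are all assumptions of this sketch. Every pixel of the cropped image R2 contributes one P×a×a neighborhood block, with zero padding at the borders, so that Q = L×H sample blocks are produced.

```python
import torch
import torch.nn.functional as F

def extract_sample_blocks(r2: torch.Tensor, a: int) -> torch.Tensor:
    """Hypothetical sketch of step S1: r2 has shape (P, L, H); returns (L*H, P, a, a).

    Each pixel contributes one P x a x a neighborhood block; border pixels are
    handled by zero padding, as in step S1.2. Assumes a is odd.
    """
    p, l, h = r2.shape
    pad = a // 2
    padded = F.pad(r2.unsqueeze(0), (pad, pad, pad, pad))  # zero padding at the edges
    patches = F.unfold(padded, kernel_size=a)              # (1, P*a*a, L*H) sliding windows
    return patches.transpose(1, 2).reshape(l * h, p, a, a)

# Illustrative usage: a 3-channel 32x32 crop with 5x5 neighborhoods gives Q = 1024 blocks.
blocks = extract_sample_blocks(torch.rand(3, 32, 32), a=5)
print(blocks.shape)  # torch.Size([1024, 3, 5, 5])
```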
Step S2, perform a 3D convolution operation on the dish sample block T1 to obtain the intermediate feature map T2 of T1; the specific operations include:
step S2.1, based on the deep convolutional neural network, selecting h different convolution kernels in each convolution layer, and performing the convolution operation on the P channels of information contained in the dish sample block T1 with 3D convolution kernels of size e×f, where e is the number of operation layers in the channel dimension, i.e. e channels are selected for one group of convolution each time, and f denotes the number of pixel points of the image block in the length and width directions of the spatial dimension;
step S2.2, after h different convolution kernels are selected in each convolution layer, obtaining the intermediate feature map T2 of the dish sample block T1 with formulas (2), (3) and (4):
p = [(P - e) + 1] × h    formula (2),
m = [(a - e) + 1]    formula (3),
T2^(p×m×m) = Con3D(T1^(P×a×a))    formula (4),
where p denotes the number of channels contained in the intermediate feature map T2 of the dish sample block T1, e is the number of operation layers in the channel dimension, a is the number of pixel points of the image block in the plane length and width directions, m is the number of pixel points of T2 in the spatial length and width directions, and Con3D denotes the 3D convolution operation.
In the 3D convolution operation on the dish sample block T1, the mapping of each feature in a convolution layer is connected to several adjacent continuous channels of the previous layer, and the value at a given position of one convolution map is obtained by convolving local receptive fields at the same position of three continuous channels of the previous layer; one convolution layer has several convolution kernels, a single convolution kernel can extract only one type of feature information from the three-dimensional data, and h convolution kernels can extract h types of feature information, where h is a positive integer and h > 1.
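As an illustration of step S2, the sketch below treats the P channels of a sample block as the depth axis of a 3D convolution with h kernels of depth e and spatial size f×f; the concrete values of P, a, e, f and h are assumptions chosen only to show how the channel count p = [(P - e) + 1] × h of formula (2) arises.

```python
import torch
import torch.nn as nn

# Illustrative sizes (not taken from the patent).
P, a = 3, 5          # channels and plane size of a dish sample block T1
e, f, h = 2, 3, 8    # kernel depth over channels, spatial kernel size, number of kernels

conv3d = nn.Conv3d(in_channels=1, out_channels=h, kernel_size=(e, f, f))

t1 = torch.rand(1, 1, P, a, a)      # (batch, 1, depth = P channels, a, a)
t2 = conv3d(t1)                     # (1, h, P - e + 1, a - f + 1, a - f + 1)

# Flattening the kernel and depth axes gives p = [(P - e) + 1] * h channels, as in formula (2).
p = (P - e + 1) * h
t2_flat = t2.reshape(1, p, a - f + 1, a - f + 1)
print(t2.shape, t2_flat.shape)
```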
Step S3, perform a pooling operation on the intermediate feature map T2 of the dish sample block T1 to obtain the intermediate feature map T3; the specific operations include:
step S3.1, performing the pooling operation on T2, the pooling being a downsampling or feature-discarding process that yields the intermediate feature map T3, at which point the number of channels of T3 equals the number of channels of T2 while the size of a single channel in the spatial dimension changes;
step S3.2, after pooling, denoting the intermediate feature map T3 as T3^(p×r×r), i.e. each channel of T3 has r pixel points in the spatial length and width directions, the number r being calculated with formula (5):
r = m / 2    formula (5),
where m is the number of pixel points of the intermediate feature map T2 in the spatial length and width directions.
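A minimal sketch of step S3, assuming the pooling is a 2×2 max pooling over the spatial plane (the patent only states downsampling or feature discarding): the channel count is unchanged and the side length is halved, matching r = m / 2 of formula (5).

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2)   # assumed pooling; keeps channels, halves the side length

p, m = 16, 6                         # illustrative channel count and spatial size of T2
t2 = torch.rand(1, p, m, m)
t3 = pool(t2)                        # r = m / 2, as in formula (5)
print(t3.shape)                      # torch.Size([1, 16, 3, 3])
```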
Step S4, middle characteristic spectrum T in space dimension 3 And carrying out pooling operation to obtain a channel attention module A 3 Intermediate feature pattern T in the channel dimension 3 Carrying out pooling operation to obtain a plane attention module A' 3 Intermediate characteristic spectrum T 3 Each channel vector and channel attention module, intermediate feature pattern T 3 Each spatial feature and the plane attention module are respectively subjected to phase-based multiplication to obtain an intermediate feature map T 4
Step S4 includes two aspects, namely obtaining a channel attention modelBlock A 3 And a planar attention module A' 3 On the other hand, an intermediate characteristic spectrum T is obtained 4
First, the get channel attention module A will be described 3 And a planar attention module A' 3 Is a specific procedure of (a).
Obtaining a channel attention module:
(1.1) first, the intermediate feature map T is in the spatial dimension 3 Respectively carrying out maximum pooling and average pooling operation to generate two pooling vectors;
(1.2) subsequently inputting the two pooled vectors into a shared multi-layer mapped neural network for training, respectively generating two new vectors;
(1.3) finally, adding the two new vectors bit by bit, and performing nonlinear mapping by using a Sigmoid activation function to obtain a channel attention module A by using formulas (11) and (12) 3 (T 3 );
Figure BDA0002883292150000111
A 3 (T 3 )=σ{MLP[AvePool(T 3 )]+MLP[MaxPool(T 3 )]Equation (12),
wherein σ represents a Sigmoid activation function, e is the operation layer number of channel dimension, MLP represents nonlinear mapping through a multi-layer neural network, avePool represents average pooling, and MaxPool represents maximum pooling.
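The following sketch is one reading of formulas (11) and (12); the hidden width of the shared MLP (the reduction factor) is an assumption, since the patent only states that the two pooled vectors pass through a shared multi-layer mapping network.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of the channel attention module A_i of formula (12)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Shared multi-layer mapping network; the reduction factor is assumed.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, channels, r, r); pooling is over the spatial dimension.
        avg = t.mean(dim=(2, 3))                            # AvePool -> (batch, channels)
        mx = t.amax(dim=(2, 3))                             # MaxPool -> (batch, channels)
        return torch.sigmoid(self.mlp(avg) + self.mlp(mx))  # formula (12): one weight per channel
```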
(II) obtaining the planar attention module:
(2.1) first, performing max pooling and average pooling on the intermediate feature map T3 in the channel dimension, generating two pooled vectors,
(2.2) then, mapping the two pooled vectors to a single-channel model of the same size by a convolution operation,
(2.3) finally, applying a Sigmoid activation function for nonlinear mapping, obtaining the planar attention module A'3(T3) with formulas (13) and (14),
[formulas (13) and (14) are given as images in the original]
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, the convolution operator in formula (14) denotes feature transformation with a 1×1 convolutional neural network, AvePool denotes average pooling, and MaxPool denotes max pooling.
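Similarly, a sketch of one plausible reading of the planar attention module of formulas (13) and (14): the two channel-wise pooled maps are combined by a 1×1 convolution followed by a Sigmoid; stacking the two maps as two input channels is an assumption about how the convolution "maps the two pooled vectors to a single channel".

```python
import torch
import torch.nn as nn

class PlanarAttention(nn.Module):
    """Sketch of the planar (spatial) attention module A'_i of formulas (13)/(14)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=1)  # the 1x1 convolutional feature transform

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, channels, r, r); pooling is over the channel dimension.
        avg = t.mean(dim=1, keepdim=True)           # AvePool -> (batch, 1, r, r)
        mx = t.amax(dim=1, keepdim=True)            # MaxPool -> (batch, 1, r, r)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return attn                                 # one weight per spatial position
```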
Subsequently, the intermediate feature map T is obtained as follows 4 Is characterized by comprising the following specific operations:
intermediate characteristic map T using formulas (6), (7) 3 Transforming to make the intermediate characteristic spectrum T 3 In turn in the channel direction and channel attention module A 3 Dot multiplication channel by channel, and module A 'of attention in space direction and plane' 3 Performing channel-by-channel dot multiplication to obtain an intermediate characteristic map T 4
Figure BDA0002883292150000115
Figure BDA0002883292150000116
Wherein Aten spe Representing the intermediate characteristic spectrum T 3 Attention enhancement in channel direction, aten spa Representing the intermediate characteristic spectrum T 3 The attention is enhanced in the space direction, u is the middle characteristic spectrum T 3 The (u) th pixel point contained in a single channel, r is an intermediate characteristic spectrum T 3 The number of pixels of a single channel in the space length and width directions, p is the middle characteristic map T 3 V is the intermediate characteristic spectrum T 3 V-th channel of (a), symbol
Figure BDA0002883292150000121
Representing the multiplication of elements of the same type of matrix corresponding to the same position.
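Putting the two modules together, a sketch of how T4 can be formed from T3 as described in step S4; applying the channel weighting and the planar weighting one after the other is an assumption about how the two element-wise products of formulas (6) and (7) combine, and the sizes are illustrative.

```python
import torch

p, r = 16, 3
t3 = torch.rand(1, p, r, r)

ch_attn = ChannelAttention(p)(t3)       # sketch class above: (1, p)
sp_attn = PlanarAttention()(t3)         # sketch class above: (1, 1, r, r)

t4 = t3 * ch_attn.view(1, p, 1, 1)      # channel-direction enhancement (Aten_spe)
t4 = t4 * sp_attn                       # spatial-direction enhancement (Aten_spa)
print(t4.shape)                         # torch.Size([1, 16, 3, 3])
```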
Step S5, intermediate characteristic spectrum T 4 Sequentially performing 3D convolution operation and pooling operation to obtain an intermediate characteristic spectrum T 6 Intermediate feature pattern T in the spatial dimension 6 And carrying out pooling operation to obtain a channel attention module A 6 Intermediate feature pattern T in the channel dimension 6 Carrying out pooling operation to obtain a plane attention module A' 6 Intermediate characteristic spectrum T 6 Each channel vector and channel attention module, intermediate feature pattern T 6 Each spatial feature and the plane attention module are respectively subjected to phase-based multiplication to obtain an intermediate feature map T 7
The specific operation of implementing step S5 is:
step S5.1, utilizing the formula (8) to obtain the intermediate characteristic spectrum T 4 Performing 3D convolution operation to obtain an intermediate characteristic spectrum T 5 Intermediate feature map T 5 Namely, is
Figure BDA0002883292150000122
Figure BDA0002883292150000123
Wherein Con 3D Representing the 3D convolution operation, x representing the intermediate feature map T 5 The number of pixel points in the space height direction, y represents the intermediate characteristic map T 5 The number of pixels in the space length and width directions, r is the middle characteristic spectrum T 4 The number of pixels of a single channel in the space length and width directions, p is the middle characteristic map T 4 The number of channels;
step S5.2, intermediate characteristic spectrum T 5 Downsampling to obtain intermediate characteristic spectrum T 6 At this time, the intermediate feature map T 6 Channel number and intermediate profile T 5 The number of channels is the same, the size of a single channel in the space dimension is changed, and the middle characteristic spectrum T 6 The dimensions of the individual channels in the spatial length and width directions are:
z×z=[(y÷2)×(y÷2)],
wherein z is an intermediate characteristic spectrum T 6 The number of pixels in the space length and width directions, y is the middle characteristic map T 5 The number of pixels in the space length and width directions;
step S5.3, intermediate characteristic spectrum T using formulas (9), (10) 6 Performing feature transformation to obtain an intermediate feature map T 7 ,T 7 Namely, is
Figure BDA0002883292150000131
Figure BDA0002883292150000132
Figure BDA0002883292150000133
Wherein Aten spe Representing the intermediate characteristic spectrum T 6 Attention enhancement in channel direction, aten spa Representing the intermediate characteristic spectrum T 6 The attention is enhanced in the space direction, u is the middle characteristic spectrum T 6 The (u) th pixel point contained in a single channel, and z is an intermediate characteristic spectrum T 6 The number of pixels of a single channel in the space length and width directions, x is the middle characteristic spectrum T 6 V is the intermediate characteristic spectrum T 6 V-th channel of (a), symbol
Figure BDA0002883292150000135
Representing the multiplication of elements of the same type of matrix corresponding to the same position.
Obtaining channel attention from equation (10)Module A 6 And a planar attention module A' 6 The specific operation is as follows:
obtaining channel attention module A 6
(1.1) first, the intermediate feature map T is in the spatial dimension 6 Respectively carrying out maximum pooling and average pooling operation to generate two pooling vectors,
(1.2) subsequently, inputting the two pooled vectors into a shared multi-layer mapped neural network for training, respectively generating two new vectors,
and (1.3) finally, adding the two new vectors bit by bit, and performing nonlinear mapping through a Sigmoid activation function to obtain a channel attention module A by using formulas (11 ')and (12') 6 (T 6 ),
Figure BDA0002883292150000134
A 6 (T 6 )=σ{MLP[AvePool(T 6 )]+MLP[MaxPool(T 6 )]Equation (12'),
wherein sigma represents a Sigmoid activation function, e is the operation layer number of channel dimension, MLP represents nonlinear mapping through a multi-layer neural network, avePool represents average pooling, and MaxPool represents maximum pooling;
(II) obtaining a planar attention module:
(2.1) first, the intermediate feature pattern T is in the channel dimension 6 Respectively carrying out maximum pooling and average pooling operation to generate two pooling vectors,
(2.2) subsequently, mapping the two pooled vectors to a single-channel, same-size model by convolution operation,
and (2.3) finally, performing nonlinear mapping through a Sigmoid activation function, and obtaining a plane attention module A ' by using formulas (13 ')and (14 '). 6 (T 6 ),
Figure BDA0002883292150000141
Figure BDA0002883292150000142
Where σ represents the Sigmoid activation function, e is the number of operation layers in the channel dimension,
Figure BDA0002883292150000143
representing feature transformation with a 1 x 1 convolutional neural network, aveboost represents mean pooling and MaxPool represents maximum pooling.
Step S6, intermediate characteristic spectrum T 7 Performing 3D convolution operation to obtain a one-dimensional intermediate characteristic spectrum T 8 The specific operation is as follows:
s6.1, checking the vegetable intermediate characteristic map T by adopting the convolution of rho x z 7 Performing 3D convolution operation to obtain a one-dimensional intermediate characteristic spectrum T 8 I.e. intermediate profile T 8 Each channel only contains one pixel point, wherein ρ is the side length of the convolution size on the channel, and z×z is the size of the convolution window;
step S6.2, the vegetable intermediate characteristic map T 7 When 3D convolution operation is carried out, the number of convolution kernels is eta, the vector length of input convolution is alpha, and the vector length alpha after convolution is obtained by utilizing a formula (15):
α= [ (α - ρ) +1] ×η formula (15).
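A sketch of step S6 under assumed sizes: a 3D convolution with η kernels of size ρ×z×z collapses each spatial map of T7 to a single value, and flattening the result gives a vector whose length follows formula (15).

```python
import torch
import torch.nn as nn

alpha, z = 16, 3      # assumed channel (vector) length of T7 and its spatial side
rho, eta = 2, 4       # assumed kernel depth over channels and number of kernels

conv = nn.Conv3d(in_channels=1, out_channels=eta, kernel_size=(rho, z, z))

t7 = torch.rand(1, 1, alpha, z, z)
t8 = conv(t7)                    # (1, eta, alpha - rho + 1, 1, 1): one pixel per channel
t8 = t8.flatten(start_dim=1)     # vector length [(alpha - rho) + 1] * eta, cf. formula (15)
print(t8.shape)                  # torch.Size([1, 60])
```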
Step S7, the intermediate characteristic spectrum T 8 Inputting into a deep convolutional neural network to obtain a dish image R 1 The specific operation is as follows:
step S7.1, selecting a deep convolutional neural network whose activation function is a softmax function shown in formula (16), the softmax function being preceded by a layer of neural network, the softmax function being,
Figure BDA0002883292150000151
wherein Y is i Representing the Y-th in vector T i An element;
step S7.2, intermediate characteristic map T 8 After the deep convolutional neural network is input, a vector T is obtained through a layer of neural network, then enters a softmax function, elements in the vector T are mapped into a (0, 1) interval by the softmax function, probability vectors of the vector T are obtained, and a dish image R is obtained 1 The name of (2) is the name corresponding to the maximum probability value in the probability vector mapped by the softmax function.
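Finally, a sketch of step S7: one fully connected layer produces the vector T, the softmax of formula (16) maps it to probabilities, and the dish name with the largest probability is returned; the class count, the dish names and the input length here are illustrative assumptions.

```python
import torch
import torch.nn as nn

dish_names = ["mapo tofu", "kung pao chicken", "fried rice", "tomato and egg"]  # assumed classes

head = nn.Linear(60, len(dish_names))   # the single layer of neural network before softmax

t8 = torch.rand(1, 60)                  # one-dimensional intermediate feature map T8
logits = head(t8)                       # vector T
probs = torch.softmax(logits, dim=1)    # formula (16): elements mapped into (0, 1)
pred = dish_names[probs.argmax(dim=1).item()]
print(probs, pred)
```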
In summary, the restaurant dish identification method based on the deep convolutional neural network can improve the identification precision, reduce the workload of manual auxiliary operation, and solve the defect that the current dish identification and classification method is difficult to finely divide dishes when the similarity of the dishes is high.
The foregoing has outlined rather broadly the principles and embodiments of the present invention in order that the detailed description of the invention may be better understood. Based on the above-mentioned embodiments of the present invention, any improvements and modifications made by those skilled in the art without departing from the principles of the present invention should fall within the scope of the present invention.

Claims (7)

1. A restaurant dish identification method based on a deep convolutional neural network, characterized by comprising the following steps:
step S1, collecting a dish image R1, performing a cropping preprocessing operation on R1 to obtain a dish image R2, and extracting sample blocks from R2 to obtain dish sample blocks T1, a dish sample block T1 being the feature information of the dish sample;
step S2, performing a 3D convolution operation on the dish sample block T1 to obtain the intermediate feature map T2 of T1;
step S3, performing a pooling operation on the intermediate feature map T2 to obtain an intermediate feature map T3;
step S4, (1) pooling the intermediate feature map T3 along the spatial dimension to obtain a channel attention module A3, the specific operations being:
step S4.1.1, first, performing max pooling and average pooling on the intermediate feature map T3 in the spatial dimension, generating two pooled vectors,
step S4.1.2, then, inputting the two pooled vectors into a shared multi-layer mapping neural network for training, generating two new vectors,
step S4.1.3, finally, adding the two new vectors bit by bit and applying a Sigmoid activation function for nonlinear mapping, obtaining the channel attention module A3(T3) with formulas (11) and (12),
[formula (11) is given as an image in the original]
A3(T3) = σ{MLP[AvePool(T3)] + MLP[MaxPool(T3)]}    formula (12),
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, MLP denotes nonlinear mapping through a multi-layer neural network, AvePool denotes average pooling, and MaxPool denotes max pooling;
(2) pooling the intermediate feature map T3 along the channel dimension to obtain a planar attention module A'3, the specific operations being:
step S4.2.1, first, performing max pooling and average pooling on the intermediate feature map T3 in the channel dimension, generating two pooled vectors,
step S4.2.2, then, mapping the two pooled vectors to a single-channel model of the same size by a convolution operation,
step S4.2.3, finally, applying a Sigmoid activation function for nonlinear mapping, obtaining the planar attention module A'3(T3) with formulas (13) and (14),
[formulas (13) and (14) are given as images in the original]
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, the convolution operator in formula (14) denotes feature transformation with a 1×1 convolutional neural network, AvePool denotes average pooling, and MaxPool denotes max pooling;
(3) multiplying each channel vector of the intermediate feature map T3 with the channel attention module and each spatial feature of T3 with the planar attention module, element-wise, to obtain an intermediate feature map T4, the specific operations being: transforming T3 with formulas (6) and (7), so that T3 is multiplied point by point, channel by channel, with the channel attention module A3 in the channel direction, and point by point with the planar attention module A'3 in the spatial direction, to obtain the intermediate feature map T4,
[formula (6): Aten_spe, the channel-direction attention enhancement of T3]
[formula (7): Aten_spa, the spatial-direction attention enhancement of T3]
where Aten_spe denotes the attention enhancement of the intermediate feature map T3 in the channel direction, Aten_spa denotes the attention enhancement of T3 in the spatial direction, u is the u-th pixel point contained in a single channel of T3, r is the number of pixel points of a single channel of T3 in the spatial length and width directions, p is the number of channels of T3, V is the V-th channel of T3, and the multiplication symbol in formulas (6) and (7) denotes element-wise multiplication of matrices of the same type at corresponding positions;
step S5, (1) performing a 3D convolution operation and a pooling operation on the intermediate feature map T4 in sequence to obtain an intermediate feature map T6, the specific operations being:
step S5.1.1, performing a 3D convolution operation on T4 with formula (8) to obtain an intermediate feature map T5, i.e.
T5^(x×y×y) = Con3D(T4^(p×r×r))    formula (8),
where Con3D denotes the 3D convolution operation, x denotes the number of pixel points of T5 in the spatial height direction, y denotes the number of pixel points of T5 in the spatial length and width directions, r is the number of pixel points of a single channel of T4 in the spatial length and width directions, and p is the number of channels of T4,
step S5.1.2, downsampling the intermediate feature map T5 to obtain the intermediate feature map T6, at which point the number of channels of T6 equals the number of channels of T5 while the size of a single channel in the spatial dimension changes, the size of a single channel of T6 in the spatial length and width directions being
z × z = [(y ÷ 2) × (y ÷ 2)],
where z is the number of pixel points of T6 in the spatial length and width directions and y is the number of pixel points of T5 in the spatial length and width directions;
(2) pooling the intermediate feature map T6 along the spatial dimension to obtain a channel attention module A6, the specific operations being:
step S5.2.1, first, performing max pooling and average pooling on the intermediate feature map T6 in the spatial dimension, generating two pooled vectors,
step S5.2.2, then, inputting the two pooled vectors into a shared multi-layer mapping neural network for training, generating two new vectors,
step S5.2.3, finally, adding the two new vectors bit by bit and applying a Sigmoid activation function for nonlinear mapping, obtaining the channel attention module A6(T6) with formulas (11') and (12'),
[formula (11') is given as an image in the original]
A6(T6) = σ{MLP[AvePool(T6)] + MLP[MaxPool(T6)]}    formula (12'),
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, MLP denotes nonlinear mapping through a multi-layer neural network, AvePool denotes average pooling, and MaxPool denotes max pooling;
(3) pooling the intermediate feature map T6 along the channel dimension to obtain a planar attention module A'6, the specific operations being:
step S5.3.1, first, performing max pooling and average pooling on the intermediate feature map T6 in the channel dimension, generating two pooled vectors,
step S5.3.2, then, mapping the two pooled vectors to a single-channel model of the same size by a convolution operation,
step S5.3.3, finally, applying a Sigmoid activation function for nonlinear mapping, obtaining the planar attention module A'6(T6) with formulas (13') and (14'),
[formulas (13') and (14') are given as images in the original]
where σ denotes the Sigmoid activation function, e is the number of operation layers in the channel dimension, the convolution operator in formula (14') denotes feature transformation with a 1×1 convolutional neural network, AvePool denotes average pooling, and MaxPool denotes max pooling;
(4) multiplying each channel vector of the intermediate feature map T6 with the channel attention module and each spatial feature of T6 with the planar attention module, element-wise, to obtain an intermediate feature map T7, as shown in formulas (9) and (10),
[formula (9): Aten_spe, the channel-direction attention enhancement of T6]
[formula (10): Aten_spa, the spatial-direction attention enhancement of T6]
where Aten_spe denotes the attention enhancement of the intermediate feature map T6 in the channel direction, Aten_spa denotes the attention enhancement of T6 in the spatial direction, u is the u-th pixel point contained in a single channel of T6, z is the number of pixel points of a single channel of T6 in the spatial length and width directions, x is the number of channels of T6, V is the V-th channel of T6, and the multiplication symbol in formulas (9) and (10) denotes element-wise multiplication of matrices of the same type at corresponding positions;
step S6, performing a 3D convolution operation on the intermediate feature map T7 to obtain a one-dimensional intermediate feature map T8;
step S7, inputting the intermediate feature map T8 into a deep convolutional neural network to obtain the classification result of the dish image R1.
2. The restaurant dish identification method based on a deep convolutional neural network according to claim 1, characterized in that the specific operations of extracting sample blocks from the dish image R2 include:
step S1.1, in the plane dimension of the dish image R2, taking the surrounding a×a pixel points of each pixel as the neighborhood block of the sample, where a is the number of pixel points of the image block in the plane length and width directions;
step S1.2, retaining all channel information of the a×a pixel points, i.e. forming a three-dimensional sample block of size P×a×a that represents the sample features of the center pixel, the feature transformation of the block-extraction process being expressed by formula (1):
T1^(P×a×a) = D_samp(R2^(P×L×H)), Q = L×H    formula (1),
where Q is the number of pixel points in a single channel, i.e. the number of block samples, D_samp denotes the block-extraction process, and L and H denote the preset plane length and width of the cropping operation;
in the block-extraction operation, zero padding is applied wherever an edge pixel has no spatial neighborhood information.
3. The restaurant dish identification method based on a deep convolutional neural network according to claim 2, characterized in that, when step S2 is executed, the specific operations of performing the 3D convolution operation on the dish sample block T1 include:
step S2.1, based on the deep convolutional neural network, selecting h different convolution kernels in each convolution layer, and performing the convolution operation on the P channels of information contained in the dish sample block T1 with 3D convolution kernels of size e×f, where e is the number of operation layers in the channel dimension, i.e. e channels are selected for one group of convolution each time, and f denotes the number of pixel points of the image block in the length and width directions of the spatial dimension;
step S2.2, after h different convolution kernels are selected in each convolution layer, obtaining the intermediate feature map T2 of the dish sample block T1 with formulas (2), (3) and (4):
p = [(P - e) + 1] × h    formula (2),
m = [(a - e) + 1]    formula (3),
T2^(p×m×m) = Con3D(T1^(P×a×a))    formula (4),
where p denotes the number of channels contained in the intermediate feature map T2 of the dish sample block T1, e is the number of operation layers in the channel dimension, a is the number of pixel points of the image block in the plane length and width directions, m is the number of pixel points of T2 in the spatial length and width directions, and Con3D denotes the 3D convolution operation.
4. The restaurant dish identification method based on a deep convolutional neural network according to claim 3, characterized in that, in the 3D convolution operation on the dish sample block T1, the mapping of each feature in a convolution layer is connected to several adjacent continuous channels of the previous layer, and the value at a given position of one convolution map is obtained by convolving local receptive fields at the same position of three continuous channels of the previous layer; one convolution layer has several convolution kernels, a single convolution kernel can extract only one type of feature information from the three-dimensional data, and h convolution kernels can extract h types of feature information, where h is a positive integer and h > 1.
5. The restaurant dish identification method based on a deep convolutional neural network according to claim 3, characterized in that the specific operations of executing step S3 to obtain the intermediate feature map T3 include:
step S3.1, performing the pooling operation on the intermediate feature map T2 of the dish sample block T1, the pooling being a downsampling or feature-discarding process that yields the intermediate feature map T3, at which point the number of channels of T3 equals the number of channels of T2 while the size of a single channel in the spatial dimension changes;
step S3.2, after pooling, denoting the intermediate feature map T3 as T3^(p×r×r), i.e. each channel of T3 has r pixel points in the spatial length and width directions, the number r being calculated with formula (5):
r = m / 2    formula (5),
where m is the number of pixel points of the intermediate feature map T2 in the spatial length and width directions.
6. The restaurant dish identification method based on a deep convolutional neural network according to claim 1, characterized in that the specific operations of executing step S6 to obtain the intermediate feature map T8 are:
step S6.1, performing the 3D convolution operation on the intermediate feature map T7 with convolution kernels of size ρ×z×z to obtain the one-dimensional intermediate feature map T8, i.e. each channel of T8 contains only one pixel point, where ρ is the side length of the convolution kernel along the channel dimension and z×z is the size of the convolution window;
step S6.2, when the 3D convolution operation is performed on the intermediate feature map T7, the number of convolution kernels is η and the vector length of the input to the convolution is α; the vector length α after convolution is obtained with formula (15):
α = [(α - ρ) + 1] × η    formula (15).
7. The restaurant dish identification method based on a deep convolutional neural network according to claim 1, characterized in that the specific operations of executing step S7 to obtain the classification result of the dish image R1 are:
step S7.1, selecting a deep convolutional neural network whose activation function is the softmax function shown in formula (16), with one layer of neural network placed before the softmax function, the softmax function being
softmax(Y)_i = e^(Y_i) / Σ_j e^(Y_j)    formula (16),
where Y_i denotes the i-th element of the vector T;
step S7.2, after the intermediate feature map T8 is input into the deep convolutional neural network, obtaining a vector T through one layer of the network; the vector T then enters the softmax function, which maps the elements of T into the interval (0, 1) and gives the probability vector of T; the name of the dish image R1 is the name corresponding to the maximum probability value in the probability vector produced by the softmax mapping.
CN202110006146.7A 2021-01-05 2021-01-05 Restaurant dish identification method based on deep convolutional neural network Active CN112699822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110006146.7A CN112699822B (en) 2021-01-05 2021-01-05 Restaurant dish identification method based on deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110006146.7A CN112699822B (en) 2021-01-05 2021-01-05 Restaurant dish identification method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN112699822A CN112699822A (en) 2021-04-23
CN112699822B true CN112699822B (en) 2023-05-30

Family

ID=75514577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110006146.7A Active CN112699822B (en) 2021-01-05 2021-01-05 Restaurant dish identification method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN112699822B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845527A (en) * 2016-12-29 2017-06-13 南京江南博睿高新技术研究院有限公司 A kind of vegetable recognition methods
CN107578060A (en) * 2017-08-14 2018-01-12 电子科技大学 A kind of deep neural network based on discriminant region is used for the method for vegetable image classification
CN109377205A (en) * 2018-12-06 2019-02-22 深圳市淘米科技有限公司 A kind of cafeteria's intelligence settlement system based on depth convolutional network
CN110689056A (en) * 2019-09-10 2020-01-14 Oppo广东移动通信有限公司 Classification method and device, equipment and storage medium
CN111667489A (en) * 2020-04-30 2020-09-15 华东师范大学 Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning

Also Published As

Publication number Publication date
CN112699822A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN107563385B (en) License plate character recognition method based on depth convolution production confrontation network
CN104834922B (en) Gesture identification method based on hybrid neural networks
CN113239954B (en) Attention mechanism-based image semantic segmentation feature fusion method
CN111753828B (en) Natural scene horizontal character detection method based on deep convolutional neural network
CN110276402B (en) Salt body identification method based on deep learning semantic boundary enhancement
CN108304357B (en) Chinese character library automatic generation method based on font manifold
CN102096819B (en) Method for segmenting images by utilizing sparse representation and dictionary learning
CN105894045A (en) Vehicle type recognition method with deep network model based on spatial pyramid pooling
CN112241679B (en) Automatic garbage classification method
CN106599863A (en) Deep face identification method based on transfer learning technology
CN109117703B (en) Hybrid cell type identification method based on fine-grained identification
CN104182771B (en) Based on the graphical analysis method of the time series data with packet loss automatic coding
CN111553837A (en) Artistic text image generation method based on neural style migration
CN102122353A (en) Method for segmenting images by using increment dictionary learning and sparse representation
CN110751072B (en) Double-person interactive identification method based on knowledge embedded graph convolution network
CN114019467A (en) Radar signal identification and positioning method based on MobileNet model transfer learning
CN111401380A (en) RGB-D image semantic segmentation method based on depth feature enhancement and edge optimization
CN110992374A (en) Hair refined segmentation method and system based on deep learning
CN107679539A (en) A kind of single convolutional neural networks local message wild based on local sensing and global information integration method
CN110163855B (en) Color image quality evaluation method based on multi-path deep convolutional neural network
CN114998756A (en) Yolov 5-based remote sensing image detection method and device and storage medium
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN113505856A (en) Hyperspectral image unsupervised self-adaptive classification method
CN112699822B (en) Restaurant dish identification method based on deep convolutional neural network
CN111127407B (en) Fourier transform-based style migration forged image detection device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant