CN112070077A - Deep learning-based food identification method and device - Google Patents

Deep learning-based food identification method and device Download PDF

Info

Publication number
CN112070077A
CN112070077A
Authority
CN
China
Prior art keywords
food
image
feature map
camera
identification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011274995.2A
Other languages
Chinese (zh)
Other versions
CN112070077B (en)
Inventor
裘实
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Health Hope (beijing) Technology Co ltd
Original Assignee
Health Hope (beijing) Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Health Hope (beijing) Technology Co ltd filed Critical Health Hope (beijing) Technology Co ltd
Priority to CN202011274995.2A priority Critical patent/CN112070077B/en
Publication of CN112070077A publication Critical patent/CN112070077A/en
Application granted granted Critical
Publication of CN112070077B publication Critical patent/CN112070077B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/68 Food, e.g. fruit or vegetables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a deep-learning-based food identification method and device. An image to be identified is obtained, food identification is performed on the image to be identified, and an image to be identified that contains food is determined as a first identification image; food target detection is then performed on the first identification image to determine position parameters of the food in the first identification image, the position parameters including the coordinates of the food and the length and width of the area occupied by the food in the first identification image; a second identification image is determined according to the first identification image and the position parameters; and finally, food identification is performed on the second identification image to determine the name of the food. The scheme provided by the invention can solve the problem that existing food identification approaches are cumbersome.

Description

Deep learning-based food identification method and device
Technical Field
The invention relates to the technical field of computer vision, in particular to a food identification method and device based on deep learning.
Background
With the development of society, people's living standards keep improving. More and more people who pursue health, want to lose weight, or pay attention to dietary health wish to look up information about a food before eating it, so that they can gain a more complete understanding of what they eat.
However, to identify a food, a user currently has to query it manually, for example through a web search, which makes the operation cumbersome and degrades the user experience.
Disclosure of Invention
The invention aims to solve the technical problem that existing food identification approaches are cumbersome, and, in view of the defects in the prior art, provides a deep-learning-based food identification method and device.
In order to solve the above technical problem, the present invention provides a food identification method, including:
acquiring an image to be identified;
performing food identification on the image to be identified, and determining the image to be identified containing food as a first identification image;
performing food target detection on the first identification image, and determining position parameters of food in the first identification image, wherein the position parameters comprise coordinates of the food, and the length and the width of an area where the food is located in the first identification image;
determining a second identification image according to the first identification image and the position parameter;
and performing food identification on the second identification image to determine the name of the food.
In a possible implementation manner, the performing food identification on the image to be identified includes:
performing food identification on the image to be identified by utilizing a pre-constructed first neural network;
wherein the first neural network is constructed as follows:
3 x 3 convolution kernels are adopted in the first layer to the twelfth layer, wherein a characteristic pyramid structure is added between the fifth layer and the eighth layer and is used for fusing low-dimensional characteristics and high-dimensional characteristics;
applying 5 x 5 convolution kernels to the thirteenth to seventeenth layers;
adopting a flatten layer on the eighteenth layer;
a fully connected layer is employed at the nineteenth layer, and a softmax classifier is employed after the fully connected layer.
In a possible implementation manner, the performing food target detection on the first recognition image and determining the position parameter of the food in the first recognition image includes:
carrying out six times of feature extraction on the first identification image to obtain a first high-dimensional feature map;
performing primary feature extraction on the first identification image to obtain a first low-dimensional feature map;
carrying out feature extraction on the first identification image for three times to obtain a second low-dimensional feature map;
performing five times of feature extraction on the first identification image to obtain a third low-dimensional feature map;
performing feature fusion on the first high-dimensional feature map, the first low-dimensional feature map, the second low-dimensional feature map and the third low-dimensional feature map to obtain a first fused feature map;
and determining the position parameter of the food in the first identification image according to the first fusion feature map.
In one possible implementation manner, the determining the position parameter of the food in the first recognition image according to the first fusion feature map includes:
carrying out feature extraction on the first fusion feature map, and sending the features subjected to feature extraction to a classifier;
performing feature extraction on the first fusion feature map, performing three times of maximum pooling on the features subjected to feature extraction, and performing feature fusion on the features subjected to the three times of maximum pooling to obtain a second fusion feature map;
extracting the features of the second fusion feature map, and sending the features after feature extraction to a position regression device;
determining a location parameter of the food in the first identification image according to the classifier and the location regressor.
In a possible implementation manner, the performing food recognition on the second recognition image and determining the name of the food includes:
performing first downsampling processing on the second identification image to obtain a first downsampling feature map;
performing second downsampling processing on the first downsampling feature map to obtain a second downsampling feature map;
performing feature extraction on the second downsampling feature map to obtain a first extracted feature map;
performing second downsampling processing on the first extracted feature map to obtain a third downsampled feature map;
performing second downsampling processing on the third downsampled feature map to obtain a fourth downsampled feature map;
performing feature extraction on the fourth down-sampling feature map to obtain a second extracted feature map;
performing feature fusion on the first downsampling feature map, the first extracted feature map, the fourth downsampling feature map and the second extracted feature map to obtain a third fused feature map;
giving weight exceeding a preset threshold value to the detail features in the third fusion feature map to obtain a target fusion feature map;
and after the target fusion characteristic graph is subjected to pooling treatment, determining the name of food through full-connection layer classification.
In one possible implementation, the image to be recognized is captured by a rotatable camera from a plurality of positions above the food;
after the acquiring the image to be recognized, further comprising:
acquiring the mass of the food to be detected;
identifying the image to be identified by utilizing a pre-constructed food space parameter model to obtain the space parameters of the food to be detected, wherein the space parameters are used for representing the three-dimensional profile of the food to be detected;
identifying the image to be identified by using a pre-established food color identification model to obtain the color of the food to be detected;
and determining the quality of the food to be detected according to the mass, the space parameters and the color of the food to be detected.
In a possible implementation manner, the rotatable camera includes a first camera and a second camera, wherein the camera internal parameters of the first camera and the second camera are the same, the optical axes of the first camera and the second camera are parallel to each other, the X axis of the first camera coordinate system of the first camera and the X axis of the second camera coordinate system of the second camera coincide with each other, the Y axis of the first camera coordinate system and the Y axis of the second camera coordinate system are parallel to each other, and the distance between the origin of the first camera coordinate system and the origin of the second camera coordinate system on the X axis is b; the first camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the first camera as the origin and the optical axis of the first camera as the Z axis, and the second camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the second camera as the origin and the optical axis of the second camera as the Z axis; when the rotatable camera shoots the food to be detected, the included angle between the optical axes of the first camera and the second camera is set to θ degrees;
the food space parameter model is constructed in the following way:
acquiring preset standard images of a plurality of standard foods, wherein each standard food corresponds to a plurality of standard images;
for each standard food, the following steps are performed:
A1, determining, for any space point of the current standard food, its projection point coordinates (u1, v1) and (u2, v2) in the standard images shot by the first camera and the second camera, its first coordinates (x1, y1, z1) in the first camera coordinate system, and its second coordinates (x2, y2, z2) in the second camera coordinate system;
A2, obtaining a first formula for representing the first coordinates according to the corresponding relation between the first coordinates and the second coordinates, wherein the corresponding relation is as follows:
x2 = x1 - b, y2 = y1, z2 = z1
the first formula is:
x1 = x2 + b, y1 = y2, z1 = z2
A3, obtaining a second formula according to the central projection relation between the current space point and its projection point coordinates, wherein the second formula is as follows:
u1 = fx · x1 / z1 + u01, v1 = fy · y1 / z1 + v01
u2 = fx · x2 / z2 + u02, v2 = fy · y2 / z2 + v02
wherein fx is used for characterizing the pixel focal length of the first camera on the X axis, fy is used for characterizing the pixel focal length of the first camera on the Y axis, (u0, v0) is the center pixel coordinate of the standard image, (u01, v01) are the linear model parameters of the first camera, respectively representing the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image, and (u02, v02) are the linear model parameters of the second camera, respectively representing the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image;
A4, obtaining, according to the first formula and the second formula, a third formula for representing the space point in the first camera coordinate system, wherein the third formula is as follows:
z1 = b · fx / (u1 - u2), x1 = (u1 - u01) · z1 / fx, y1 = (v1 - v01) · z1 / fy
A5, determining the corresponding coordinates of each space point of the current standard food in the first camera coordinate system according to the third formula;
and obtaining a food space parameter model according to the corresponding coordinates of each space point in each standard food in the first camera coordinate system.
The present invention also provides a food recognition apparatus, comprising:
the acquisition module is used for acquiring an image to be identified;
the first identification module is used for identifying food in the image to be identified and determining the image to be identified containing food as a first identification image;
the second identification module is used for carrying out food target detection on the first identification image and determining position parameters of food in the first identification image, wherein the position parameters comprise coordinates of the food, and the length and the width of an area where the food is located in the first identification image; determining a second identification image according to the first identification image and the position parameter;
and the third identification module is used for identifying the food in the second identification image and determining the name of the food.
The present invention also provides a food recognition apparatus, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the method as described above.
The invention also provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method as described above.
The deep learning-based food identification method and device provided by the invention have the following beneficial effects:
Firstly, food identification is performed on the image to be identified, and the image to be identified containing food is determined as a first identification image; then, food target detection is performed on the first identification image, and the position parameters of the food in the first identification image are determined, the position parameters including the coordinates of the food and the length and width of the area where the food is located in the first identification image; a second identification image is determined according to the first identification image and the position parameters; and finally, food identification is performed on the second identification image to determine the name of the food. With this technical scheme, determining the name of the food is divided into three stages: determining whether the image to be identified contains food, determining the position of the food in the first identification image, and determining the name of the food. The name of the food can thus be quickly marked on the image to be identified, avoiding the problem that the user can only learn about the food by manually querying its name.
Drawings
FIG. 1 is a flow chart of a method of food identification provided by one embodiment of the present invention;
FIG. 2 is a schematic view of an apparatus in which a food identifying device is located according to an embodiment of the present invention;
FIG. 3 is a schematic view of a food recognition device provided in accordance with an embodiment of the present invention;
fig. 4 is a schematic diagram of a food recognition system provided by an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the food identification method provided by the embodiment of the present invention includes:
step 101, obtaining an image to be identified.
The image to be recognized may come from an image acquired by the recognition end in real time (for example, when the recognition end is a smartphone equipped with a camera), or from an image stored in advance by the recognition end (for example, when the recognition end is a server), obtained by local reading or network transmission.
In other words, the food recognition device disposed at the recognition end may obtain an image to be recognized collected in real time, so as to perform food recognition on it in real time, or it may obtain an image to be recognized collected during a historical time period, so as to perform food recognition when the processing load is light or upon an operator's instruction; this embodiment does not specifically limit this.
Further, if the camera assembly configured at the recognition end can be used as an independent device, such as a camera or a video recorder, it can be arranged around the environment where the food is located, so that the food can be shot from different angles and images to be recognized reflecting the food from different angles can be obtained, which helps guarantee the accuracy of subsequent recognition.
It should be noted that the shooting may be a single shooting or a continuous shooting, and accordingly, in the case of a single shooting, the obtained image to be recognized is a picture, and in the case of a continuous shooting, a video including a plurality of images to be recognized is obtained. Therefore, in each embodiment of the present invention, the image to be recognized for food recognition may be a single picture taken at a time, or may also be a certain image to be recognized in a section of video taken continuously, which is not specifically limited by the present invention.
It should be noted that, when a mobile terminal is used for food identification, a first neural network, a second neural network and a third neural network are pre-constructed in the mobile terminal, wherein the first neural network is used for performing food identification on the image to be identified and determining the image to be identified containing food as a first identification image; the second neural network is used for performing food target detection on the first identification image and determining the position parameters of the food in the first identification image; and the third neural network is used for performing food identification on the second identification image and determining the name of the food. Correspondingly, the neural networks are trained on pictures of the food types preset for the mobile terminal. Specifically, for example, by means of web crawling, shooting, purchasing and labeling, more than 2,000 common food categories and a total of 12 million pictures are collected, covering top-down, oblique and level shooting angles, and operations such as translation, flipping, grayscale conversion and sharpening are simulated with traditional image processing tools such as OpenCV, so as to increase the generalization capability of the first, second and third neural networks. For example, the first neural network uses an SGD optimizer with an initial learning rate of 0.04, 250,000 iterations and a batch size of 256, is trained for 1 week, and reaches an acc (accuracy) of 99.5%; the second neural network uses an Adam optimizer with an initial learning rate of 0.05, 300,000 iterations and a batch size of 128, is trained for 2 weeks, and reaches an mAP (mean Average Precision) of 0.654; the third neural network uses an RMSprop optimizer with an initial learning rate of 0.03, 600,000 iterations and a batch size of 256, is trained for 3 weeks, and reaches an acc of 98.3%.
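For illustration only, the following Python (PyTorch-style) sketch wires up the three optimizer configurations mentioned above (SGD, Adam and RMSprop with the stated learning rates, batch sizes and iteration counts). The tiny placeholder model and the random tensors stand in for the patent's actual networks and its 12-million-image dataset, which are not reproduced here.

```python
# Hypothetical training-configuration sketch based only on the hyper-parameters stated above.
# The tiny placeholder model and random tensors stand in for the patent's real networks and data.
import torch
from torch import nn, optim

def build_optimizer(name: str, params, lr: float):
    # The "iterators" named in the text are mapped to standard optimizers.
    if name == "sgd":
        return optim.SGD(params, lr=lr)
    if name == "adam":
        return optim.Adam(params, lr=lr)
    return optim.RMSprop(params, lr=lr)

# (optimizer, initial learning rate, batch size, iterations) as stated in the text
stage_configs = {
    "first_net":  ("sgd",     0.04, 256, 250_000),  # food / non-food identification
    "second_net": ("adam",    0.05, 128, 300_000),  # food target detection
    "third_net":  ("rmsprop", 0.03, 256, 600_000),  # food-name identification
}

for stage, (opt_name, lr, batch_size, iterations) in stage_configs.items():
    model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(3, 2000))  # placeholder model
    optimizer = build_optimizer(opt_name, model.parameters(), lr)
    criterion = nn.CrossEntropyLoss()
    for step in range(2):  # the real schedules run for `iterations` steps
        images = torch.randn(batch_size, 3, 224, 224)    # stand-in for real food pictures
        labels = torch.randint(0, 2000, (batch_size,))   # stand-in for real category labels
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```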
And 102, identifying food for the image to be identified, and determining the image to be identified containing the food as a first identification image.
In this step, food identification can be carried out on the image to be identified by utilizing the pre-constructed first neural network;
wherein the first neural network is constructed by:
3 x 3 convolution kernels are adopted in the first layer to the twelfth layer, wherein a characteristic pyramid structure is added between the fifth layer and the eighth layer and is used for fusing low-dimensional characteristics and high-dimensional characteristics;
applying 5 x 5 convolution kernels to the thirteenth to seventeenth layers;
adopting a flatten layer (which flattens the feature map into a one-dimensional vector) on the eighteenth layer;
a fully connected layer is used at the nineteenth layer and a softmax classifier is used after the fully connected layer.
In this embodiment, the convolution kernels may adopt DW convolution (depthwise convolution); experiments show that, compared with ordinary convolution kernels, using DW convolution kernels reduces the operation time of the first neural network by 45%. The first neural network adopts a nineteen-layer structure to provide a good receptive field (RF): 3 × 3 convolution kernels are adopted from the first layer to the twelfth layer to extract low-dimensional features and ensure that features are not lost; 5 × 5 convolution kernels are adopted from the thirteenth layer to the seventeenth layer because a single 5 × 5 convolution kernel computes faster than two stacked 3 × 3 convolution kernels; and a characteristic pyramid structure is added between the fifth layer and the eighth layer to fuse low-dimensional and high-dimensional features. Low-dimensional features mainly focus on local information of the image, while high-dimensional features mainly focus on overall information of the image, so fusing them can improve the accuracy of food identification. Experiments show that adding the characteristic pyramid structure improves the mAP by 3.5 compared with not adding it, and that using 5 × 5 convolution kernels instead of 3 × 3 convolution kernels reduces the operation time by 4.5%.
It should be noted that the feature pyramid structure is a basic component in a recognition system for detecting objects with different scales (i.e., low-dimensional features and high-dimensional features), which is well known to those skilled in the art and will not be described herein.
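To make the nineteen-layer structure described above more concrete, the following is a minimal PyTorch-style sketch that stacks depthwise (DW) 3 x 3 blocks for layers one to twelve, 5 x 5 blocks for layers thirteen to seventeen, a flatten layer, and a fully connected layer followed by softmax. The channel width, the stride placement and the simplified pyramid-style fusion between the fifth and eighth layers are assumptions for illustration, not figures taken from the patent.

```python
# Hypothetical sketch of the nineteen-layer first neural network described above.
# Channel widths, stride placement and the simplified pyramid-style fusion are assumptions.
import torch
from torch import nn

def dw_block(channels: int, kernel: int) -> nn.Sequential:
    # Depthwise (DW) convolution followed by a 1 x 1 pointwise convolution.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel, padding=kernel // 2, groups=channels),
        nn.Conv2d(channels, channels, 1),
        nn.ReLU(inplace=True),
    )

class FirstNet(nn.Module):
    def __init__(self, num_classes: int = 2, c: int = 32):
        super().__init__()
        self.layer1 = nn.Conv2d(3, c, 3, stride=2, padding=1)                 # layer 1, 3 x 3
        self.layers2_5 = nn.Sequential(*[dw_block(c, 3) for _ in range(4)])   # layers 2-5, 3 x 3
        self.layers6_8 = nn.Sequential(*[dw_block(c, 3) for _ in range(3)])   # layers 6-8, 3 x 3
        self.layers9_12 = nn.Sequential(*[dw_block(c, 3) for _ in range(4)])  # layers 9-12, 3 x 3
        self.layers13_17 = nn.Sequential(*[dw_block(c, 5) for _ in range(5)]) # layers 13-17, 5 x 5
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.flatten = nn.Flatten()                                           # layer 18, flatten
        self.fc = nn.Linear(c, num_classes)                                   # layer 19, fully connected

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x5 = self.layers2_5(self.layer1(x))
        x8 = self.layers6_8(x5) + x5          # simplified pyramid-style fusion of low/high-dimensional features
        x = self.layers13_17(self.layers9_12(x8))
        logits = self.fc(self.flatten(self.pool(x)))
        return torch.softmax(logits, dim=1)   # softmax classifier after the fully connected layer

scores = FirstNet()(torch.randn(1, 3, 224, 224))  # probability that the image contains food
```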
Step 103, performing food target detection on the first identification image, and determining the position parameter of the food in the first identification image.
In this step, the position parameters include the coordinates of the food, and the length and width of the region in the first recognition image where the food is located. That is, by performing feature extraction on the first recognition image using a neural network that includes a plurality of feature extraction modules (e.g., residual network modules), a bounding box of the food in the first recognition image can be obtained.
As previously mentioned, food target detection may be performed on the first recognition image using a second neural network, the specific operation of which is described below.
In an embodiment of the present invention, step 103 specifically includes the following steps:
carrying out six times of feature extraction on the first identification image to obtain a first high-dimensional feature map;
performing primary feature extraction on the first identification image to obtain a first low-dimensional feature map;
carrying out feature extraction on the first identification image for three times to obtain a second low-dimensional feature map;
performing five times of feature extraction on the first recognition image to obtain a third low-dimensional feature map;
performing feature fusion on the first high-dimensional feature map, the first low-dimensional feature map, the second low-dimensional feature map and the third low-dimensional feature map to obtain a first fusion feature map;
according to the first fused feature map, the position parameters of the food in the first recognition image are determined.
In the embodiment of the invention, by performing feature fusion (for example, feature fusion by means of Concat) on feature maps of different scales (i.e., a first high-dimensional feature map, a first low-dimensional feature map, a second low-dimensional feature map and a third low-dimensional feature map), the global features of food in an image to be identified and local features of important interest can be effectively fused together, so that the feature robustness of the fused feature map can be greatly enhanced, and the determined position parameters of the food in the first identification image are more accurate.
In addition, a residual network module may be used, for example, to perform feature extraction on the first recognition image. In the residual network module, two layers of DW convolution may be used to reduce the operation time: one layer may use a 1 × 1 convolution kernel and the other a 3 × 3 convolution kernel to extract fine-grained features. A ReLU activation function is then used to improve the fitting capability on nonlinear data and prevent overfitting, and finally the high-dimensional features and low-dimensional features in each residual network module are fused in an Add manner to prevent feature loss.
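A minimal sketch of the residual module just described is given below: a 1 x 1 DW convolution, a 3 x 3 DW convolution, a ReLU activation, and Add-style fusion of the module input with its output. The channel width is an assumed value.

```python
# Hypothetical sketch of the DW residual module described above; the channel width is an assumption.
import torch
from torch import nn

class DWResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw1x1 = nn.Conv2d(channels, channels, 1, groups=channels)             # one layer of 1 x 1 DW convolution
        self.dw3x3 = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)  # one layer of 3 x 3 DW convolution
        self.relu = nn.ReLU()                                                      # improves non-linear fitting

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.relu(self.dw3x3(self.dw1x1(x)))
        return x + y  # Add-style fusion of high- and low-dimensional features to prevent feature loss

out = DWResidualBlock(64)(torch.randn(1, 64, 56, 56))
```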
Further, in an embodiment of the present invention, determining the position parameter of the food in the first recognition image according to the first fused feature map includes:
extracting the features of the first fusion feature map, and sending the features after feature extraction to a classifier;
performing feature extraction on the first fusion feature map, performing three times of maximum pooling on the features subjected to feature extraction, and performing feature fusion on the features subjected to the three times of maximum pooling to obtain a second fusion feature map;
extracting the features of the second fusion feature map, and sending the features after feature extraction to a position regression device;
according to the classifier and the location regressor, a location parameter of the food in the first recognition image is determined.
In the embodiment of the present invention, the first fused feature map is subjected to feature extraction, and the extracted features are subjected to three max pooling operations, for example with max pooling (maxpooling) layers of 5 × 5, 9 × 9 and 13 × 13. Max pooling layers reduce the feature dimensionality and the parameter count of the neural network, extract the most representative features, and keep the features invariant to position and rotation, which is particularly useful for processing global features. The sizes 5 × 5, 9 × 9 and 13 × 13 are chosen because, under CPU conditions, achieving the same receptive field with 3 × 3 kernels requires multiple layers, and multiple layers of 3 × 3 computation are slower than a single 5 × 5, 9 × 9 or 13 × 13 layer; using max pooling layers of these sizes reduces the computation time by 23.6% compared with using 3 × 3 layers, without loss of accuracy.
In addition, the second fused feature map obtained by three times of pooling and feature fusion is finally sent to a location regressor (location head), and the features which are not pooled are sent to a classifier (classification head), so that the determined position parameters of the food in the first identification image can be more accurate. This is because the maximum pooling may lose part of the detail features, which have a decisive factor in the classification process for the classification accuracy of the algorithm, while in the position regression process, the overall appearance feature has a great influence on the accuracy and the detail feature has a small influence.
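The split between the classification branch (un-pooled features) and the position-regression branch (features fused after 5 x 5, 9 x 9 and 13 x 13 max pooling) can be sketched as follows. The channel count, the use of stride-1 padded pooling so that the three pooled maps can be concatenated, and the four-value box output are assumptions for illustration.

```python
# Hypothetical sketch of the detection head described above; channel count and box format are assumptions.
import torch
from torch import nn

class FoodDetectionHead(nn.Module):
    def __init__(self, channels: int = 256, num_classes: int = 2000):
        super().__init__()
        self.extract = nn.Conv2d(channels, channels, 3, padding=1)  # feature extraction on the fused map
        # 5 x 5, 9 x 9 and 13 x 13 max pooling; stride 1 with padding keeps the spatial size for fusion
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (5, 9, 13))
        self.classifier = nn.Conv2d(channels, num_classes, 1)       # classification head (un-pooled features)
        self.regressor = nn.Conv2d(channels * 3, 4, 1)              # location head: x, y, width, height

    def forward(self, first_fused_map: torch.Tensor):
        features = self.extract(first_fused_map)
        class_scores = self.classifier(features)                    # detail features kept for classification
        second_fused_map = torch.cat([pool(features) for pool in self.pools], dim=1)
        box = self.regressor(second_fused_map)                      # position regression on pooled features
        return class_scores, box

class_scores, box = FoodDetectionHead()(torch.randn(1, 256, 13, 13))
```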
And 104, determining a second identification image according to the first identification image and the position parameter.
It is understood that the second identification image is an image region containing food, i.e., can be understood as a region of interest (ROI) image.
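As a minimal illustration of step 104, the following sketch crops the region of interest out of the first identification image using the position parameters; the assumption that the coordinates denote the top-left corner of the food region is ours, since the text does not specify it.

```python
# Hypothetical sketch of obtaining the second identification image (the ROI) from the
# first identification image and the position parameters (coordinates, length, width).
import numpy as np

def crop_roi(first_image: np.ndarray, x: int, y: int, width: int, length: int) -> np.ndarray:
    """Return the food region; (x, y) is assumed to be the top-left corner in pixels."""
    h, w = first_image.shape[:2]
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(w, x + width), min(h, y + length)
    return first_image[y0:y1, x0:x1]

second_image = crop_roi(np.zeros((480, 640, 3), dtype=np.uint8), x=100, y=80, width=200, length=150)
```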
And 105, identifying the food in the second identification image, and determining the name of the food.
As described above, the second recognition image may be subjected to food recognition using a third neural network, and a specific operation of the third neural network will be described below.
In an embodiment of the present invention, step 105 specifically includes:
performing first downsampling processing on the second identification image to obtain a first downsampling feature map;
performing second downsampling processing on the first downsampling feature map to obtain a second downsampling feature map;
performing feature extraction on the second down-sampling feature map to obtain a first extracted feature map;
performing second downsampling processing on the first extracted feature map to obtain a third downsampled feature map;
performing second downsampling processing on the third downsampling feature map to obtain a fourth downsampling feature map;
performing feature extraction on the fourth down-sampling feature map to obtain a second extracted feature map;
performing feature fusion on the first downsampling feature map, the first extracted feature map, the fourth downsampling feature map and the second extracted feature map to obtain a third fused feature map;
giving weight exceeding a preset threshold value to the detail features in the third fusion feature map to obtain a target fusion feature map;
and after the target fusion characteristic graph is subjected to pooling treatment, determining the name of the food through full-connection layer classification.
In the embodiment of the present invention, first, a first downsampling process may be performed on the second identification image using a SepConv module. The SepConv module convolves each input channel separately to obtain as many feature maps as there are input channels, and then aggregates the values of these feature maps with several 1 × 1 convolutions to obtain the output feature map. Its main role is downsampling to remove redundant features, for example reducing an original 224 × 224 feature map to a 112 × 112 feature map so as to reduce the operation time.
Secondly, when the first downsampling feature map is subjected to the second downsampling process, 1 × 1, 3 × 3 and 1 × 1 convolutions can be applied in sequence, which play a transitional role in the algorithm. Performing the second downsampling process after the first downsampling process helps extract the key features of the first downsampling feature map, and the low-dimensional and high-dimensional features extracted by the 1 × 1, 3 × 3 and 1 × 1 convolutions can be fused in an Add manner, improving the fitting capability of the third neural network on complex data.
In addition, the MBConv module may be used to perform feature extraction on the second downsampled feature map and perform feature extraction on the fourth downsampled feature map, and the MBConv module is well known to those skilled in the art and will not be described herein again.
Performing feature fusion on the first downsampling feature map, the first extracted feature map, the fourth downsampling feature map and the second extracted feature map gives a third fused feature map that contains more detail features; giving the detail features in the third fused feature map a weight exceeding the preset threshold then makes the target fused feature map pay more attention to those detail features, which improves the identification accuracy for similar foods. For example, in experiments on two visually similar pineapple varieties whose overall appearance differs little but whose detail features differ (such as the flesh spines and color), weighting the detail features in the third fused feature map above the preset threshold improved the identification accuracy by 18.3%.
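A rough PyTorch-style sketch of the third network's data flow (SepConv downsampling, 1 x 1 / 3 x 3 / 1 x 1 transition blocks, MBConv-style feature extraction, multi-branch fusion, and a learned channel weighting that emphasizes detail features before pooling and classification) is given below. The channel widths, the plain convolutions used as stand-ins for real MBConv blocks, and the squeeze-and-excitation-style gate used as the weighting step are assumptions, not the patent's exact architecture.

```python
# Hypothetical sketch of the third (food-name) network's data flow; widths and stand-in blocks are assumptions.
import torch
import torch.nn.functional as F
from torch import nn

def sep_conv(cin: int, cout: int) -> nn.Sequential:
    # SepConv stand-in: per-channel convolution followed by a 1 x 1 convolution (downsampling).
    return nn.Sequential(
        nn.Conv2d(cin, cin, 3, stride=2, padding=1, groups=cin),
        nn.Conv2d(cin, cout, 1),
        nn.ReLU(),
    )

def transition(channels: int) -> nn.Sequential:
    # 1 x 1 -> 3 x 3 -> 1 x 1 convolutions used for the "second downsampling processing".
    return nn.Sequential(
        nn.Conv2d(channels, channels, 1),
        nn.Conv2d(channels, channels, 3, stride=2, padding=1),
        nn.Conv2d(channels, channels, 1),
        nn.ReLU(),
    )

class ThirdNet(nn.Module):
    def __init__(self, num_classes: int = 2000, c: int = 32):
        super().__init__()
        self.down1 = sep_conv(3, c)                    # first downsampling, e.g. 224 -> 112
        self.down2 = transition(c)                     # second downsampling, 112 -> 56
        self.extract1 = nn.Conv2d(c, c, 3, padding=1)  # stands in for an MBConv block
        self.down3 = transition(c)                     # 56 -> 28
        self.down4 = transition(c)                     # 28 -> 14
        self.extract2 = nn.Conv2d(c, c, 3, padding=1)  # stands in for an MBConv block
        self.gate = nn.Sequential(                     # channel weighting that emphasizes detail features
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(4 * c, 4 * c, 1), nn.Sigmoid())
        self.fc = nn.Linear(4 * c, num_classes)

    def forward(self, roi: torch.Tensor) -> torch.Tensor:
        d1 = self.down1(roi)
        e1 = self.extract1(self.down2(d1))
        d4 = self.down4(self.down3(e1))
        e2 = self.extract2(d4)
        size = e2.shape[-2:]
        fused = torch.cat([F.adaptive_avg_pool2d(d1, size),     # third fused feature map
                           F.adaptive_avg_pool2d(e1, size), d4, e2], dim=1)
        weighted = fused * self.gate(fused)                     # target fused feature map
        pooled = F.adaptive_avg_pool2d(weighted, 1).flatten(1)  # pooling before classification
        return self.fc(pooled)                                  # food name via the fully connected layer

logits = ThirdNet()(torch.randn(1, 3, 224, 224))
```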
In summary, the food identification method provided by the embodiment of the present invention first performs food identification on the image to be identified and determines the image to be identified containing food as a first identification image; then performs food target detection on the first identification image and determines the position parameters of the food in the first identification image, the position parameters including the coordinates of the food and the length and width of the area where the food is located in the first identification image; determines a second identification image according to the first identification image and the position parameters; and finally performs food identification on the second identification image to determine the name of the food. With this technical scheme, determining the name of the food is divided into three stages: determining whether the image to be identified contains food, determining the position of the food in the first identification image, and determining the name of the food. The name of the food can thus be quickly marked on the image to be identified, and the user no longer needs to query the name of the food manually in order to learn about it.
In addition to the above-described identification of the kind of food, the quality of the food can also be identified.
In one embodiment of the invention, the image to be recognized is captured by a rotatable camera from a plurality of positions above the food;
after acquiring the image to be identified, the method further comprises the following steps:
acquiring the mass of the food to be detected;
identifying an image to be identified by utilizing a pre-constructed food space parameter model to obtain a space parameter of the food to be detected, wherein the space parameter is used for representing a three-dimensional profile of the food to be detected;
identifying the image to be identified by utilizing a pre-constructed food color identification model to obtain the color of the food to be detected;
and determining the quality of the food to be detected according to the mass, the space parameters and the color of the food to be detected.
In the embodiment of the invention, the mass of the food to be detected and a plurality of images to be identified of the food to be detected are obtained, the images to be identified being shot by the rotatable camera from a plurality of positions above the food to be detected. The plurality of images to be identified are identified using the pre-constructed food space parameter model to obtain the space parameters of the food to be detected, the space parameters being used to represent the three-dimensional profile of the food to be detected; the plurality of images to be identified are also identified using the pre-established food color identification model to obtain the color of the food to be detected. The quality of the food to be detected is then determined according to its mass, space parameters and color: the density of the food to be detected is obtained from its mass and space parameters, and a pre-established density classifier identifies the quality of the food to be detected accurately and quickly according to the density. Identifying the quality of the food to be detected through the density classifier together with the food color identification model improves the identification accuracy. By integrating image recognition technology into food quality identification, the scheme also addresses the problem that ordinary people cannot accurately judge the quality of food.
It should be noted that the mass of the food to be detected and the images to be identified are obtained by the food identification system shown in fig. 4. The food identification system comprises a rotary weighing platform 1 and a rotatable camera 2; the food to be detected is placed on the rotary weighing platform 1 for weighing, the rotary weighing platform 1 is connected with the rotatable camera 2, and the rotatable camera 2 can rotate around the center of the rotary weighing platform 1.
With the food to be detected placed on the rotary weighing platform 1, its mass can be obtained, while the rotatable camera 2 shoots the food to be detected at several set angles to obtain a plurality of images to be identified. Since the food to be detected is an irregular three-dimensional entity, its three-dimensional outline can only be fully reflected by shooting it from several angles. Meanwhile, when the rotatable camera 2 shoots the food to be detected, the straight-line distance between the rotatable camera 2 and the food to be detected can be kept unchanged (that is, the distance from the rotatable camera 2 to the center of the rotary weighing platform 1 is regarded as constant), so that only the shooting angle varies, and the space parameters of the food to be detected can then be obtained from the plurality of images to be identified.
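A minimal sketch of the mass/volume step described above: the space parameters (three-dimensional contour) yield an estimated volume, the weighing platform yields the mass, and density = mass / volume is then combined with the recognized color to judge quality. The tolerance, the reference values and the three quality classes are illustrative assumptions, not values from the patent.

```python
# Hypothetical sketch of density-based food-quality estimation; thresholds and classes are assumptions.
from dataclasses import dataclass

@dataclass
class FoodMeasurement:
    mass_g: float       # mass from the rotary weighing platform
    volume_cm3: float   # volume estimated from the food space parameter model (3D contour)
    color: str          # color output of the food color identification model

def classify_quality(m: FoodMeasurement, expected_density: float, expected_color: str) -> str:
    density = m.mass_g / m.volume_cm3
    # A food whose density and color both match the reference values is considered good quality.
    density_ok = abs(density - expected_density) / expected_density < 0.15  # assumed tolerance
    color_ok = m.color == expected_color
    if density_ok and color_ok:
        return "good"
    return "suspect" if density_ok or color_ok else "poor"

print(classify_quality(FoodMeasurement(150.0, 160.0, "red"), expected_density=0.95, expected_color="red"))
```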
In one embodiment of the present invention, the rotatable camera comprises a first camera and a second camera, wherein the camera internal parameters of the first camera and the second camera are the same, their optical axes are parallel to each other, the X axis of the first camera coordinate system of the first camera and the X axis of the second camera coordinate system of the second camera coincide with each other, the Y axis of the first camera coordinate system and the Y axis of the second camera coordinate system are parallel to each other, and the distance between the origin of the first camera coordinate system and the origin of the second camera coordinate system on the X axis is b; the first camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the first camera as the origin and the optical axis of the first camera as the Z axis, and the second camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the second camera as the origin and the optical axis of the second camera as the Z axis; when the rotatable camera shoots the food to be detected, the included angle between the optical axes of the first camera and the second camera is set to θ degrees;
the food space parameter model is constructed by the following method:
acquiring preset standard images of a plurality of standard foods, wherein each standard food corresponds to a plurality of standard images;
for each standard food, the following steps are performed:
A1, determining, for any space point of the current standard food, its projection point coordinates (u1, v1) and (u2, v2) in the standard images shot by the first camera and the second camera, its first coordinates (x1, y1, z1) in the first camera coordinate system, and its second coordinates (x2, y2, z2) in the second camera coordinate system;
A2, obtaining a first formula for representing the first coordinates according to the corresponding relation between the first coordinates and the second coordinates, wherein the corresponding relation is as follows:
x2 = x1 - b, y2 = y1, z2 = z1
the first formula is:
x1 = x2 + b, y1 = y2, z1 = z2
A3, obtaining a second formula according to the central projection relation between the current space point and its projection point coordinates, wherein the second formula is as follows:
u1 = fx · x1 / z1 + u01, v1 = fy · y1 / z1 + v01
u2 = fx · x2 / z2 + u02, v2 = fy · y2 / z2 + v02
wherein fx is used for characterizing the pixel focal length of the first camera on the X axis, fy is used for characterizing the pixel focal length of the first camera on the Y axis, (u0, v0) is the center pixel coordinate of the standard image, (u01, v01) are the linear model parameters of the first camera, respectively representing the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image, and (u02, v02) are the linear model parameters of the second camera, respectively representing the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image;
A4, obtaining, according to the first formula and the second formula, a third formula for representing the space point in the first camera coordinate system, wherein the third formula is as follows:
z1 = b · fx / (u1 - u2), x1 = (u1 - u01) · z1 / fx, y1 = (v1 - v01) · z1 / fy
A5, determining the corresponding coordinates of each space point of the current standard food in the first camera coordinate system according to the third formula;
and obtaining a food space parameter model according to the corresponding coordinates of each space point in each standard food in the first camera coordinate system.
In the embodiment of the invention, the rotatable camera comprises two cameras. When each camera shoots the food to be detected, an image of the three-dimensional entity of the food on a two-dimensional plane is obtained, and the two cameras shoot two images of the same three-dimensional entity at the same time. Because the origins of the coordinate systems of the two cameras are separated by a certain distance in the X-axis direction, a parallax angle exists between the two cameras; the images of the three-dimensional entity formed by the two cameras are similar but not identical, so camera parallax is formed and depth information exists between the images shot by the two cameras. On this basis, the three-dimensional entity of the food to be detected is recovered by means of three-dimensional reconstruction, and the space parameters of the three-dimensional entity are then obtained. Specifically, the rotatable camera comprises a first camera and a second camera, wherein the camera internal parameters of the first camera and the second camera are the same, the optical axes of the two cameras are parallel to each other, the X axes of the first camera coordinate system and the second camera coordinate system coincide with each other, the Y axes of the first camera coordinate system and the second camera coordinate system are parallel to each other, and the distance between the origin of the first camera coordinate system and the origin of the second camera coordinate system on the X axis is b. When shooting with the rotatable camera, the first camera and the second camera are each rotated about the Y axis of its own camera coordinate system relative to the other, so that the included angle between the optical axes of the first camera and the second camera is θ; in the present embodiment, θ is set to 90°. Therefore, according to the positional relationship between the first camera and the second camera, the relationship between the first coordinates in the first camera coordinate system and the second coordinates in the second camera coordinate system can be determined, and the second coordinates can be expressed by the first coordinates. According to the central projection relation between the current space point of the standard food and its projection point coordinates, the projection point coordinates are expressed using the first coordinates, the second coordinates, the camera internal parameters, the linear model parameters, and the pixel focal lengths of the first camera on the X axis and the Y axis. Further, the coordinates of the space point corresponding to the projection point coordinates in the first camera coordinate system are represented by known parameters: the camera internal parameters, the linear model parameters, the distance b between the two cameras on the X axis, and the pixel focal lengths of the first camera on the X axis and the Y axis. Therefore, the corresponding coordinates of each space point of the current standard food in the first camera coordinate system can be obtained according to this method of obtaining three-dimensional coordinates through three-dimensional reconstruction, and the food space parameter model is obtained according to the corresponding coordinates of each space point of each standard food in the first camera coordinate system.
As shown in fig. 2 and 3, the embodiment of the invention provides a device where a food recognition device is located and the food recognition device. The device embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. From a hardware level, as shown in fig. 2, a hardware structure diagram of a device in which a food identification apparatus according to an embodiment of the present invention is located is provided, where the device in the embodiment may generally include other hardware, such as a forwarding chip responsible for processing a packet, in addition to the processor, the memory, the network interface, and the nonvolatile memory shown in fig. 2. Taking a software implementation as an example, as shown in fig. 3, as a logical apparatus, the apparatus is formed by reading, by a CPU of a device in which the apparatus is located, corresponding computer program instructions in a non-volatile memory into a memory for execution.
As shown in fig. 3, the food identification device provided in this embodiment includes:
an obtaining module 301, configured to obtain an image to be identified;
a first identification module 302, configured to perform food identification on the image to be identified, and determine the image to be identified containing food as a first identification image;
the second identification module 303 is configured to perform food target detection on the first identification image, and determine a position parameter of food in the first identification image, where the position parameter includes a coordinate of the food, and a length and a width of an area where the food is located in the first identification image; determining a second identification image according to the first identification image and the position parameter;
and the third identification module 304 is configured to perform food identification on the second identification image, and determine a name of the food.
In this embodiment of the present invention, the obtaining module 301 may be configured to perform step 101 in the foregoing method embodiment, and the first identifying module 302 may be configured to perform step 102 in the foregoing method embodiment; the second identification module 303 may be configured to perform steps 103 and 104 in the above-described method embodiment; the third identification module 304 may be configured to perform step 105 in the above-described method embodiment.
In an embodiment of the present invention, the first identifying module 302 is configured to perform the following operations:
carrying out food identification on the image to be identified by utilizing a pre-constructed neural network;
wherein the neural network is constructed by:
3 x 3 convolution kernels are adopted in the first layer to the twelfth layer, wherein a characteristic pyramid structure is added between the fifth layer and the eighth layer and is used for fusing low-dimensional characteristics and high-dimensional characteristics;
applying 5 x 5 convolution kernels to the thirteenth to seventeenth layers;
adopting a flatten layer on the eighteenth layer;
a fully connected layer is employed at the nineteenth layer, and a softmax classifier is employed after the fully connected layer.
In an embodiment of the present invention, the second identifying module 303 is configured to perform the following operations:
carrying out six times of feature extraction on the first identification image to obtain a first high-dimensional feature map;
performing primary feature extraction on the first identification image to obtain a first low-dimensional feature map;
carrying out feature extraction on the first identification image for three times to obtain a second low-dimensional feature map;
performing five times of feature extraction on the first identification image to obtain a third low-dimensional feature map;
performing feature fusion on the first high-dimensional feature map, the first low-dimensional feature map, the second low-dimensional feature map and the third low-dimensional feature map to obtain a first fused feature map;
and determining the position parameter of the food in the first identification image according to the first fusion feature map.
In an embodiment of the present invention, the second identification module 303, when performing the determining of the position parameter of the food in the first identification image according to the first fused feature map, is configured to perform the following operations:
carrying out feature extraction on the first fusion feature map, and sending the features subjected to feature extraction to a classifier;
performing feature extraction on the first fusion feature map, performing three times of maximum pooling on the features subjected to feature extraction, and performing feature fusion on the features subjected to the three times of maximum pooling to obtain a second fusion feature map;
extracting the features of the second fusion feature map, and sending the features after feature extraction to a position regression device;
determining a location parameter of the food in the first identification image according to the classifier and the location regressor.
In an embodiment of the present invention, the third identifying module 304 is configured to perform the following operations:
performing first downsampling processing on the second identification image to obtain a first downsampling feature map;
performing second downsampling processing on the first downsampling feature map to obtain a second downsampling feature map;
performing feature extraction on the second downsampling feature map to obtain a first extracted feature map;
performing second downsampling processing on the first extracted feature map to obtain a third downsampled feature map;
performing second downsampling processing on the third downsampled feature map to obtain a fourth downsampled feature map;
performing feature extraction on the fourth down-sampling feature map to obtain a second extracted feature map;
performing feature fusion on the first downsampling feature map, the first extracted feature map, the fourth downsampling feature map and the second extracted feature map to obtain a third fused feature map;
assigning weights exceeding a preset threshold to the detail features in the third fused feature map to obtain a target fused feature map;
and after pooling the target fused feature map, determining the name of the food through fully connected layer classification.
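The step that assigns weights above a preset threshold to the detail features can be read as a channel re-weighting whose weights are floored at that threshold. A hedged PyTorch sketch of that single step under this reading follows; the squeeze-and-excitation style gate and the 0.5 floor are assumptions.

    import torch
    import torch.nn as nn

    class DetailWeighting(nn.Module):
        """Re-weights channels of the third fused feature map with weights floored at a threshold."""

        def __init__(self, channels, threshold=0.5):
            super().__init__()
            self.threshold = threshold
            self.gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(channels, channels), nn.Sigmoid())

        def forward(self, third_fused):
            weights = self.gate(third_fused)                              # per-channel weights in (0, 1)
            weights = self.threshold + (1.0 - self.threshold) * weights   # every weight exceeds the threshold
            return third_fused * weights.unsqueeze(-1).unsqueeze(-1)      # target fused feature map

The target fused feature map keeps the shape of its input, so it can be pooled and classified by a fully connected layer exactly as described above.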
In one embodiment of the invention, the image to be identified is captured by a rotatable camera from at least a plurality of positions above the food;
the food identification device further comprises:
the mass acquisition module is used for acquiring the mass of the food to be detected;
the fourth identification module is used for identifying the image to be identified by utilizing a pre-constructed food space parameter model to obtain a space parameter of the food to be identified, wherein the space parameter is used for representing a three-dimensional profile of the food to be identified;
the fifth identification module is used for identifying the image to be identified by utilizing a pre-established food color identification model to obtain the color of the food to be identified;
and the quality determining module is used for determining the quality of the food to be detected according to the mass, the space parameter and the color of the food to be detected.
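Purely as an illustration of how the measured mass, the space parameter and the color might be combined, the sketch below (Python/NumPy) estimates a volume from the reconstructed space points, derives a density, and compares density and mean color with reference values for the recognized food. The reference table, the bounding-box volume estimate with a 0.6 fill factor, and the tolerance thresholds are all hypothetical and not taken from the source.

    import numpy as np

    # Hypothetical reference values: density in g/cm^3 and mean RGB color per food name.
    REFERENCE = {"apple": {"density": 0.80, "color": (160, 30, 40)}}

    def assess_quality(name, mass_g, space_points_cm, mean_rgb,
                       density_tol=0.15, color_tol=60.0):
        """Coarse quality label from mass, reconstructed 3-D points and mean color."""
        ref = REFERENCE[name]
        # Crude volume estimate: axis-aligned bounding box of the space points times a fill factor.
        extents = space_points_cm.max(axis=0) - space_points_cm.min(axis=0)
        volume_cm3 = float(np.prod(extents)) * 0.6
        density = mass_g / max(volume_cm3, 1e-6)
        density_ok = abs(density - ref["density"]) <= density_tol
        color_diff = np.linalg.norm(np.asarray(mean_rgb, float) - np.asarray(ref["color"], float))
        return "good" if (density_ok and color_diff <= color_tol) else "suspect"

    # Example: a 180 g apple described by 500 surface points and its mean color.
    points = np.random.rand(500, 3) * np.array([7.0, 7.0, 6.5])
    print(assess_quality("apple", 180.0, points, (150, 35, 45)))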
In an embodiment of the present invention, the rotatable camera comprises a first camera and a second camera. The first camera and the second camera have the same camera internal parameters and parallel optical axes; the X axis of the first camera coordinate system of the first camera and the X axis of the second camera coordinate system of the second camera coincide with each other; the Y axis of the first camera coordinate system and the Y axis of the second camera coordinate system are parallel to each other; and the distance between the origin of the first camera coordinate system and the origin of the second camera coordinate system along the X axis is b. The first camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the first camera as the origin and the optical axis of the first camera as the Z axis, and the second camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the second camera as the origin and the optical axis of the second camera as the Z axis. When the rotatable camera photographs the food to be measured, the included angle between the optical axes of the first camera and the second camera is set to a preset angle.
the food space parameter model is constructed in the following way:
acquiring a plurality of preset standard images of standard foods, wherein there are a plurality of standard images for each standard food;
for each standard food, the following steps are performed:
A1, for any space point of the current standard food, determining the projection point coordinates of the space point in each standard image, the first coordinates of the space point in the first camera coordinate system, and the second coordinates of the space point in the second camera coordinate system;
A2, obtaining a first formula for representing the first coordinates according to the corresponding relation between the first coordinates and the second coordinates; the corresponding relation and the first formula are given as formula images;
A3, obtaining a second formula according to the central projection relation between the current space point and the projection point coordinates; the second formula is given as a formula image, in which one parameter characterizes the focal length of a pixel of the first camera on the X axis, one parameter characterizes the focal length of a pixel of the first camera on the Y axis, one parameter is the center pixel coordinate of the standard image, the linear model parameters of the first camera respectively characterize the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image, and the linear model parameters of the second camera respectively characterize the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image;
A4, obtaining, according to the first formula and the second formula, a third formula for representing the space point in the first camera coordinate system; the third formula is given as a formula image;
A5, determining the corresponding coordinates of each space point of the current standard food in the first camera coordinate system according to the third formula;
and obtaining the food space parameter model according to the corresponding coordinates of each space point of each standard food in the first camera coordinate system.
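The formulas referred to in steps A1 to A4 appear only as images in the published text. For a rectified two-camera arrangement of the kind described (identical intrinsics, parallel optical axes, baseline b along the X axis), the standard relations take the form below; the symbol names and the exact form are assumptions made for readability, and the published formula images may differ in detail.

    \[
    (X_2, Y_2, Z_2) = (X_1 - b,\ Y_1,\ Z_1), \qquad
    u = f_x \frac{X_1}{Z_1} + u_0, \qquad v = f_y \frac{Y_1}{Z_1} + v_0,
    \]
    \[
    Z_1 = \frac{f_x\, b}{u_1 - u_2}, \qquad
    X_1 = \frac{(u_1 - u_0)\, Z_1}{f_x}, \qquad
    Y_1 = \frac{(v_1 - v_0)\, Z_1}{f_y},
    \]

Here (u_1, v_1) and (u_2, v_2) are the projection-point coordinates of the same space point in the first-camera and second-camera images, f_x and f_y are the pixel focal lengths, (u_0, v_0) is the center pixel coordinate, and u_1 - u_2 is the disparity.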
It is to be understood that the illustrated structure of the embodiments of the present invention does not constitute a specific limitation to the food identifying device. In other embodiments of the invention, the food identification device may include more or fewer components than illustrated, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Because the information interaction and execution processes among the modules in the device are based on the same concept as the method embodiments of the present invention, reference may be made to the description of the method embodiments for specific details, which are not repeated here.
An embodiment of the present invention further provides a food identification device, including: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the food identification method of any embodiment of the present invention.
Embodiments of the present invention also provide a computer-readable medium storing instructions for causing a computer to perform a food identification method as described herein. Specifically, a system or an apparatus equipped with a storage medium on which software program code realizing the functions of any of the above-described embodiments is stored may be provided, and a computer (or a CPU or MPU) of the system or apparatus reads out and executes the program code stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD + RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments can be implemented not only by the computer executing the program code that is read out, but also by an operating system or the like running on the computer performing part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium may be written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, after which a CPU or the like mounted on the expansion board or expansion unit performs part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the above-described embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A food identification method based on deep learning is characterized by comprising the following steps:
acquiring an image to be identified;
performing food identification on the image to be identified, and determining the image to be identified containing food as a first identification image;
performing food target detection on the first identification image, and determining position parameters of food in the first identification image, wherein the position parameters comprise coordinates of the food, and the length and the width of an area where the food is located in the first identification image;
determining a second identification image according to the first identification image and the position parameter;
and performing food identification on the second identification image to determine the name of the food.
2. The method of claim 1, wherein the performing food identification on the image to be identified comprises:
carrying out food identification on the image to be identified by utilizing a pre-constructed neural network;
wherein the neural network is constructed by:
3 x 3 convolution kernels are used in the first to twelfth layers, and a feature pyramid structure is added between the fifth layer and the eighth layer to fuse low-dimensional features and high-dimensional features;
5 x 5 convolution kernels are used in the thirteenth to seventeenth layers;
a flatten layer is used as the eighteenth layer;
a fully connected layer is used as the nineteenth layer, and a softmax classifier is applied after the fully connected layer.
3. The method of claim 1, wherein the performing food target detection on the first identification image and determining the position parameter of the food in the first identification image comprises:
performing feature extraction six times on the first identification image to obtain a first high-dimensional feature map;
performing feature extraction once on the first identification image to obtain a first low-dimensional feature map;
performing feature extraction three times on the first identification image to obtain a second low-dimensional feature map;
performing feature extraction five times on the first identification image to obtain a third low-dimensional feature map;
performing feature fusion on the first high-dimensional feature map, the first low-dimensional feature map, the second low-dimensional feature map and the third low-dimensional feature map to obtain a first fused feature map;
and determining the position parameter of the food in the first identification image according to the first fused feature map.
4. The method of claim 3, wherein the determining the position parameter of the food in the first identification image according to the first fused feature map comprises:
performing feature extraction on the first fused feature map, and sending the extracted features to a classifier;
performing feature extraction on the first fused feature map, performing max pooling three times on the extracted features, and performing feature fusion on the max-pooled features to obtain a second fused feature map;
performing feature extraction on the second fused feature map, and sending the extracted features to a position regressor;
and determining the position parameter of the food in the first identification image according to the classifier and the position regressor.
5. The method of claim 1, wherein the performing food identification on the second identification image and determining the name of the food comprises:
performing first downsampling processing on the second identification image to obtain a first downsampling feature map;
performing second downsampling processing on the first downsampling feature map to obtain a second downsampling feature map;
performing feature extraction on the second downsampling feature map to obtain a first extracted feature map;
performing second downsampling processing on the first extracted feature map to obtain a third downsampled feature map;
performing second downsampling processing on the third downsampled feature map to obtain a fourth downsampled feature map;
performing feature extraction on the fourth down-sampling feature map to obtain a second extracted feature map;
performing feature fusion on the first downsampling feature map, the first extracted feature map, the fourth downsampling feature map and the second extracted feature map to obtain a third fused feature map;
assigning weights exceeding a preset threshold to the detail features in the third fused feature map to obtain a target fused feature map;
and after pooling the target fused feature map, determining the name of the food through fully connected layer classification.
6. The method according to any one of claims 1-5, wherein the image to be identified is captured by a rotatable camera from at least a plurality of positions above the food;
after the acquiring of the image to be identified, the method further comprises:
acquiring the mass of the food to be detected;
identifying the image to be identified by utilizing a pre-constructed food space parameter model to obtain a space parameter of the food to be identified, wherein the space parameter is used for representing a three-dimensional profile of the food to be identified;
identifying the image to be identified by using a pre-established food color identification model to obtain the color of the food to be identified;
and determining the quality of the food to be detected according to the mass, the space parameter and the color of the food to be detected.
7. The method according to claim 6, wherein the rotatable camera comprises a first camera and a second camera. The first camera and the second camera have the same camera internal parameters and parallel optical axes; the X axis of the first camera coordinate system of the first camera and the X axis of the second camera coordinate system of the second camera coincide with each other; the Y axis of the first camera coordinate system and the Y axis of the second camera coordinate system are parallel to each other; and the distance between the origin of the first camera coordinate system and the origin of the second camera coordinate system along the X axis is b. The first camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the first camera as the origin and the optical axis of the first camera as the Z axis, and the second camera coordinate system is a three-dimensional rectangular coordinate system established with the optical center of the second camera as the origin and the optical axis of the second camera as the Z axis. When the rotatable camera photographs the food to be measured, the included angle between the optical axes of the first camera and the second camera is set to a preset angle;
the food space parameter model is constructed in the following way:
acquiring a plurality of preset standard images of standard foods, wherein there are a plurality of standard images for each standard food;
for each standard food, the following steps are performed:
A1, for any space point of the current standard food, determining the projection point coordinates of the space point in each standard image, the first coordinates of the space point in the first camera coordinate system, and the second coordinates of the space point in the second camera coordinate system;
A2, obtaining a first formula for representing the first coordinates according to the corresponding relation between the first coordinates and the second coordinates; the corresponding relation and the first formula are given as formula images;
A3, obtaining a second formula according to the central projection relation between the current space point and the projection point coordinates; the second formula is given as a formula image, in which one parameter characterizes the focal length of a pixel of the first camera on the X axis, one parameter characterizes the focal length of a pixel of the first camera on the Y axis, one parameter is the center pixel coordinate of the standard image, the linear model parameters of the first camera respectively characterize the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image, and the linear model parameters of the second camera respectively characterize the horizontal pixel value and the vertical pixel value between the center pixel coordinate of the standard image and the origin pixel coordinate of the standard image;
A4, obtaining, according to the first formula and the second formula, a third formula for representing the space point in the first camera coordinate system; the third formula is given as a formula image;
A5, determining the corresponding coordinates of each space point of the current standard food in the first camera coordinate system according to the third formula;
and obtaining a food space parameter model according to the corresponding coordinates of each space point in each standard food in the first camera coordinate system.
8. A deep learning based food recognition device, comprising:
the acquisition module is used for acquiring an image to be identified;
the first identification module is used for identifying food in the image to be identified and determining the image to be identified containing food as a first identification image;
the second identification module is used for carrying out food target detection on the first identification image and determining position parameters of food in the first identification image, wherein the position parameters comprise coordinates of the food, and the length and the width of an area where the food is located in the first identification image; determining a second identification image according to the first identification image and the position parameter;
and the third identification module is used for identifying the food in the second identification image and determining the name of the food.
9. A food identification device, comprising: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor, configured to invoke the machine readable program to perform the method of any of claims 1-7.
10. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-7.
CN202011274995.2A 2020-11-16 2020-11-16 Deep learning-based food identification method and device Active CN112070077B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011274995.2A CN112070077B (en) 2020-11-16 2020-11-16 Deep learning-based food identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011274995.2A CN112070077B (en) 2020-11-16 2020-11-16 Deep learning-based food identification method and device

Publications (2)

Publication Number Publication Date
CN112070077A true CN112070077A (en) 2020-12-11
CN112070077B CN112070077B (en) 2021-02-26

Family

ID=73655413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011274995.2A Active CN112070077B (en) 2020-11-16 2020-11-16 Deep learning-based food identification method and device

Country Status (1)

Country Link
CN (1) CN112070077B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170323174A1 (en) * 2014-02-12 2017-11-09 Microsoft Technology Licensing, Llc Food logging from images
CN104636757A (en) * 2015-02-06 2015-05-20 中国石油大学(华东) Deep learning-based food image identifying method
WO2018034905A1 (en) * 2016-08-15 2018-02-22 Canon U.S.A. Inc. Spectrally encoded endoscopic image process
CN109214250A (en) * 2017-07-05 2019-01-15 中南大学 A kind of static gesture identification method based on multiple dimensioned convolutional neural networks
CN108280474A (en) * 2018-01-19 2018-07-13 广州市派客朴食信息科技有限责任公司 A kind of food recognition methods based on neural network
US20200074247A1 (en) * 2018-08-29 2020-03-05 International Business Machines Corporation System and method for a visual recognition and/or detection of a potentially unbounded set of categories with limited examples per category and restricted query scope
CN109711705A (en) * 2018-12-21 2019-05-03 上海应用技术大学 The method for establishing model of color aesthetic quality control in a kind of fermentation milk production
CN110705621A (en) * 2019-09-25 2020-01-17 北京影谱科技股份有限公司 Food image identification method and system based on DCNN and food calorie calculation method
CN111743618A (en) * 2020-08-05 2020-10-09 哈尔滨梓滨科技有限公司 Binocular optics-based bipolar electric coagulation forceps positioning device and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ren Yongxin et al.: "Research Progress of Computer Vision Technology in Fruit Quality Inspection", Journal of Agricultural Science and Technology of China *
Wang Cong: "Research on Intelligent Dish Recognition Technology Based on Machine Vision", China Master's Theses Full-text Database, Engineering Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677443A (en) * 2022-05-27 2022-06-28 深圳智华科技发展有限公司 Optical positioning method, device, equipment and storage medium
CN115861855A (en) * 2022-12-15 2023-03-28 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station
CN115861855B (en) * 2022-12-15 2023-10-24 福建亿山能源管理有限公司 Operation and maintenance monitoring method and system for photovoltaic power station

Also Published As

Publication number Publication date
CN112070077B (en) 2021-02-26

Similar Documents

Publication Publication Date Title
US11436437B2 (en) Three-dimension (3D) assisted personalized home object detection
Gardner et al. Learning to predict indoor illumination from a single image
US9934591B2 (en) Remote determination of quantity stored in containers in geographical region
US10002286B1 (en) System and method for face recognition robust to multiple degradations
US20190180464A1 (en) Remote determination of containers in geographical region
US20180012411A1 (en) Augmented Reality Methods and Devices
CN108717531B (en) Human body posture estimation method based on Faster R-CNN
US8175412B2 (en) Method and apparatus for matching portions of input images
CN111328396A (en) Pose estimation and model retrieval for objects in images
US20140043329A1 (en) Method of augmented makeover with 3d face modeling and landmark alignment
CN108875542B (en) Face recognition method, device and system and computer storage medium
US8755607B2 (en) Method of normalizing a digital image of an iris of an eye
Krig et al. Ground truth data, content, metrics, and analysis
JP5833507B2 (en) Image processing device
CN112070077B (en) Deep learning-based food identification method and device
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN112784712B (en) Missing child early warning implementation method and device based on real-time monitoring
JP2019185787A (en) Remote determination of containers in geographical region
Wu et al. Privacy leakage of sift features via deep generative model based image reconstruction
Tiwari et al. Occlusion resistant network for 3d face reconstruction
Ji et al. An evaluation of conventional and deep learning‐based image‐matching methods on diverse datasets
Wietrzykowski et al. Stereo plane R-CNN: Accurate scene geometry reconstruction using planar segments and camera-agnostic representation
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN114972492A (en) Position and pose determination method and device based on aerial view and computer storage medium
CN113095347A (en) Deep learning-based mark recognition method and training method, system and electronic equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A food recognition method and device based on deep learning

Effective date of registration: 20220120

Granted publication date: 20210226

Pledgee: Haidian Beijing science and technology enterprise financing Company limited by guarantee

Pledgor: HEALTH HOPE (BEIJING) TECHNOLOGY CO.,LTD.

Registration number: Y2022110000012

PC01 Cancellation of the registration of the contract for pledge of patent right
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20230302

Granted publication date: 20210226

Pledgee: Haidian Beijing science and technology enterprise financing Company limited by guarantee

Pledgor: HEALTH HOPE (BEIJING) TECHNOLOGY CO.,LTD.

Registration number: Y2022110000012

PE01 Entry into force of the registration of the contract for pledge of patent right
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A food recognition method and device based on deep learning

Effective date of registration: 20230306

Granted publication date: 20210226

Pledgee: Haidian Beijing science and technology enterprise financing Company limited by guarantee

Pledgor: HEALTH HOPE (BEIJING) TECHNOLOGY CO.,LTD.

Registration number: Y2023110000086