CN108038879A - Food volume estimation method and device - Google Patents
Food volume estimation method and device
- Publication number
- CN108038879A CN108038879A CN201711320238.2A CN201711320238A CN108038879A CN 108038879 A CN108038879 A CN 108038879A CN 201711320238 A CN201711320238 A CN 201711320238A CN 108038879 A CN108038879 A CN 108038879A
- Authority
- CN
- China
- Prior art keywords
- food
- image
- video data
- volume
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a food volume estimation method and device, belonging to the field of deep learning. The method includes: collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
Description
Technical field
The present invention relates to the field of deep learning, and in particular to a food volume estimation method and device.
Background
People today pay increasing attention to healthy eating, and in particular to the calories they consume. The calorie content of food is closely related to its volume, so automatically and quickly estimating food volume from a photograph is key to intelligent diet management applications.
At present, image-based food volume estimation methods are scarce. Most existing methods require the user to supply multi-view images, reconstruct a three-dimensional model of the food from those images, and then compute the volume of the model. This places a heavy burden on the user and is inconvenient, and the three-dimensional reconstruction demands substantial computing resources, making it particularly unsuitable for mobile applications on phones.
The content of the invention
In order to solve problem of the prior art, an embodiment of the present invention provides a kind of volume of food method of estimation and its dress
Put.The technical solution is as follows:
In a first aspect, a food volume estimation method is provided, the method including:
collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food.
With reference to the first aspect, in a first possible implementation, collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data, includes:
collecting images or video data containing multiple classes of food under a variety of backgrounds, scenes, and shooting angles, where the backgrounds include simple and complex backgrounds, the scenes include common indoor and outdoor scenes, and the shooting angles include at least a frontal view and an oblique view; and measuring the true volume of the food in the collected images or video data to obtain the true volume data.
With reference to the first aspect, in a second possible implementation, the preset deep learning neural network model includes a preset ResNet, VGG, or DenseNet deep learning neural network model.
With reference to the first aspect, in a third possible implementation, training the preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model includes:
inputting the images or video data and the true volume data into the preset deep learning neural network model for training, where the volume is computed by the preset deep learning neural network model and the loss is computed with a mean squared error function, to obtain the volume estimation model.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the preset deep learning neural network model is a preset ResNet deep learning neural network model: the fully connected layer at the end of a ResNet-10 network is changed to a one-dimensional output, the loss layer is modified, and the objective function is changed to a Euclidean loss, yielding a ResNet-10 deep learning neural network model in which the target value corresponds to the true volume of the food.
With reference to the first aspect, in a fifth possible implementation, applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
inputting the image or video data of the food to be measured into the volume estimation model; when the image or video data contains multiple frames of the food, computing the mean of the per-frame volume values to obtain the volume estimate for the food.
With reference to the first aspect, in a sixth possible implementation, before training the preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain the volume estimation model, the method further includes:
preprocessing the collected images or video data containing multiple classes of food, including:
annotating the collected images or video data to mark the positions and bounding boxes of the food and its reference object.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, after annotating the collected images or video data containing multiple classes of food, the method further includes:
inputting the annotation result data into a preset SSD deep learning neural network model for training to obtain a food region detection model;
where applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
annotating the image or video data of the food to be measured to mark the positions and bounding boxes of the food and its reference object, obtaining annotation result data; applying the food region detection model to the annotation result data to obtain region detection result data for the food; and applying the volume estimation model to the region detection result data to obtain the volume estimate for the food.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation, preprocessing the collected images or video data containing multiple classes of food further includes:
extracting frames from, or splitting into shots, the videos in the collected video data to obtain single or multiple frames of each video.
With reference to the first through eighth possible implementations of the first aspect, in ninth through sixteenth possible implementations, applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
extracting frames from, or splitting into shots, the videos in the image or video data of the food to be measured to obtain single or multiple frames of each video.
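The patent does not fix how frames are chosen from a video; one minimal sketch, under the assumption of thinning the video to roughly one frame per second and then randomly sampling a fixed number of the surviving frames (random selection is what Embodiment 1 describes), might look like:

```python
import random

def sample_frame_indices(total_frames, fps, num_samples, seed=0):
    """Pick frame indices from a food video for volume prediction.

    Thin the video to about one frame per second (a stand-in for the
    frame-extraction / shot-splitting step), then randomly sample
    `num_samples` of the surviving frames.
    """
    step = max(int(fps), 1)
    candidates = list(range(0, total_frames, step))
    rng = random.Random(seed)
    if len(candidates) <= num_samples:
        return candidates
    return sorted(rng.sample(candidates, num_samples))

# A 10-second clip at 30 fps: keep one frame per second, sample 5.
indices = sample_frame_indices(total_frames=300, fps=30, num_samples=5)
```

The selected indices would then be decoded (e.g. with a video reader) and each frame passed to the volume estimation model individually.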
In a second aspect, a food volume estimation device is provided, the device including:
a collection module, for collecting images or video data containing multiple classes of food;
an acquisition module, for obtaining the true volume data of the food in the collected images or video data;
a model training module, for training a preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model;
a model computation module, for applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food.
With reference to the second aspect, in a first possible implementation, the model training module is configured to: train a preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain the volume estimation model.
With reference to the second aspect, in a second possible implementation, the model training module is configured to: input the images or video data and the true volume data into the preset ResNet, VGG, or DenseNet deep learning neural network model for training, where the volume is computed by the ResNet, VGG, or DenseNet model and the loss is computed with a mean squared error function, to obtain the volume estimation model.
With reference to the second aspect, in a third possible implementation, the model training module is configured to: input the images or video data and the true volume data into a preset ResNet-10 deep learning neural network model for training, the model being obtained by changing the fully connected layer at the end of a ResNet-10 network to a one-dimensional output and changing the objective function to a Euclidean loss; the volume is computed by the ResNet-10 model and the loss is computed with a mean squared error function, yielding the volume estimation model.
With reference to the second aspect, in a fourth possible implementation, the device further includes an image preprocessing module, for preprocessing the collected images or video data containing multiple classes of food, including:
annotating the collected images or video data to mark the positions and bounding boxes of the food and its reference object; extracting frames from, or splitting into shots, the videos in the collected video data to obtain single or multiple frames of each video; and extracting frames from, or splitting into shots, the videos in the image or video data of the food to be measured to obtain single or multiple frames of each video.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the model training module is further configured to: input the annotation result data into a preset SSD deep learning neural network model for training to obtain a food region detection model.
In the food volume estimation method and device provided by the embodiments of the present invention, a preset deep learning neural network model is trained on collected food images or video data together with the true volume data of the food to obtain a volume estimation model; an image or video of food to be measured is then fed into the learned volume estimation model. Because the volume estimation model is a neural network obtained from extensive prior training, it can automatically compute a predicted volume for the food to be measured, effectively realizing food volume estimation. Compared with the prior art, the food volume estimation scheme of the embodiments of the present invention has at least the following advantages:
(1) The model provided by the embodiments of the present invention is learned entirely by a deep neural network, with no explicit feature extraction (such as contour extraction) performed on the food, and can accurately reconstruct volume from a single image against a complex background;
(2) For food with new, complex shapes, the model only needs to be retrained; there is no need to manually analyze the characteristics of the new food or update a feature database;
(3) Multiple food regions in an input image can be detected simultaneously and their volumes predicted individually, suiting both single-food and multi-food scenes; moreover, the method is not limited to any food type and is applicable to fruit, cake, plated meals, and various other scenarios;
(4) The method proposed by the embodiments of the present invention avoids the complex process of three-dimensional reconstruction, does not require substantial computing resources, and is well suited to practical applications, in particular intelligent diet management on mobile devices;
(5) The method proposed by the present invention has simple input requirements, accepting a single food image or a short related video directly, and thus offers a better user experience.
In general, because the volume estimation method provided by the embodiments of the present invention uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometry, or background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, feed the sub-images of the extracted food regions into the trained volume estimation model, and obtain the predicted food volume. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of the first stage of the food volume estimation method provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of the second stage of the food volume estimation method provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the food volume estimation device provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of single-target food region detection results in an application example;
Fig. 7 is a schematic diagram of single-target food region detection results in an application example;
Fig. 8 is a schematic diagram of multi-target food region detection results in an application example;
Fig. 9 is a flow diagram of single-target food volume estimation in an application example;
Fig. 10 is a flow diagram of multi-target food volume estimation in an application example.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" means two or more, unless specifically defined otherwise.
The embodiments of the present invention provide a food volume estimation method and device: a preset deep learning neural network model is trained on collected food images or video data together with the true volume data of the food to obtain a volume estimation model; an image or video of food to be measured is then fed into the learned volume estimation model. Because the volume estimation model is a neural network obtained from extensive prior training, it can automatically compute a predicted volume for the food to be measured, effectively realizing food volume estimation. Because the method uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometry, or background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, feed the sub-images of the extracted food regions into the trained volume estimation model, and obtain the predicted food volume. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
The food volume estimation method and device provided by the embodiments of the present invention are further described below with reference to specific embodiments, application examples, and the accompanying drawings.
Embodiment 1
Fig. 1 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention. As shown in Fig. 1, the food volume estimation method provided by the embodiment of the present invention comprises the following steps:
101. Collect images or video data containing multiple classes of food, and obtain the true volume data of the food in the collected images or video data.
Specifically, images or video data containing multiple classes of food are collected under a variety of backgrounds, scenes, and shooting angles. The backgrounds include, but are not limited to, simple backgrounds (such as a desktop or a plain white background) and complex backgrounds; the scenes include common indoor and outdoor scenes; and the shooting angles include at least a frontal view and an oblique view with some deviation. The food image preferably contains a relatively stable reference object, such as a coin or a finger; for ease of use, the embodiment of the present invention preferably uses a finger as the reference object.
In addition, the true volume of the food in the collected images or video data is measured to obtain the true volume data. Preferably, the volume of each food is measured with a graduated cylinder, and the resulting true volume data serves as part of the training data used later to train the model.
102. Train a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model.
Specifically, the images or video data and the true volume data are input into the preset deep learning neural network model for training; the volume is preferably computed with a ResNet, VGG, or DenseNet deep learning neural network model, and the loss is computed with a mean squared error function, yielding the volume estimation model. Besides the preferred ResNet, VGG, or DenseNet models configured as needed, any feasible deep learning neural network model in the prior art may be used as the preset model; the embodiments of the present invention do not restrict this.
Preferably, an existing ResNet deep learning neural network model is fine-tuned: the fully connected layer at the end of a ResNet-10 classification network is changed to a one-dimensional output, and the objective function is changed to a Euclidean loss, yielding the ResNet deep learning neural network model applicable to this scheme, i.e., the preset ResNet deep learning neural network model, in which the target value corresponds to the true volume of the food.
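The fine-tuning step above can be sketched as an ordinary regression training loop. This is only an illustration under stated assumptions: a tiny stand-in CNN replaces the fine-tuned ResNet-10 (any scalar-output backbone trains the same way), the data is synthetic, and volumes are normalized to litres so the numbers stay well-behaved.

```python
import torch
import torch.nn as nn

# Tiny stand-in regressor in place of the modified ResNet-10:
# any backbone ending in a one-dimensional output trains identically.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1),
)
criterion = nn.MSELoss()  # the "Euclidean loss" of the scheme
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Synthetic training pairs: food crops and measured volumes (litres).
torch.manual_seed(0)
images = torch.randn(16, 3, 32, 32)
true_volumes = torch.rand(16, 1)

first_loss = last_loss = None
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(images), true_volumes)
    loss.backward()
    optimizer.step()
    if first_loss is None:
        first_loss = loss.item()
    last_loss = loss.item()
```

In the patent's setting, the synthetic pairs would be replaced by the collected food crops and the graduated-cylinder volume measurements from step 101.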
103. Apply the volume estimation model to the image or video data of the food to be measured to obtain a volume estimate for that food.
Specifically, the image or video data of the food to be measured is input into the volume estimation model; when the image or video data contains multiple frames of the food, the mean of the per-frame volume values is computed to obtain the volume estimate for the food.
The input to the volume estimation model can be an image or a video. If the input is video data, the video is preferably first split into shots to obtain an image sequence; a number of images are then randomly selected from the frames containing food, a volume prediction is made for each image, and the final output is the mean of all predicted values.
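The averaging step is straightforward; a minimal sketch (the helper name and the example values are illustrative, not from the patent) could be:

```python
def estimate_volume_from_frames(frame_volumes):
    """Combine per-frame volume predictions into one estimate.

    Each element of `frame_volumes` is the volume (e.g. in ml) that
    the volume estimation model predicted for one sampled frame; the
    final estimate is simply their mean, as step 103 describes.
    """
    if not frame_volumes:
        raise ValueError("need at least one per-frame prediction")
    return sum(frame_volumes) / len(frame_volumes)

# Five sampled frames of the same food item, slightly different predictions:
estimate = estimate_volume_from_frames([182.0, 190.0, 175.0, 188.0, 185.0])
```

Averaging over several frames damps per-frame prediction noise at negligible extra cost.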
Embodiment 2
Fig. 2 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention. Fig. 3 is a flow diagram of the first stage of that method, and Fig. 4 a flow diagram of its second stage. As shown in Figs. 2-4, the food volume estimation method provided by the embodiment of the present invention can be divided into two stages: in the first stage, training yields the food region detection model M1 and the volume estimation model M2; the second stage estimates the volume of the food to be measured.
Specifically, the first stage comprises the following steps:
201. Collect images or video data containing multiple classes of food under a variety of backgrounds, scenes, and shooting angles, and measure the true volume of the food in the collected images or video data to obtain the true volume data, i.e., food data collection. The backgrounds include, but are not limited to, simple backgrounds (such as a desktop or a plain white background) and complex backgrounds; the scenes include common indoor and outdoor scenes; and the shooting angles include at least a frontal view and an oblique view with some deviation. The food image preferably contains a relatively stable reference object, such as a coin or a finger; for ease of use, the embodiment of the present invention preferably uses a finger as the reference object.
In addition, the true volume of the food in the collected images or video data is measured to obtain the true volume data. Preferably, the volume of each food is measured with a graduated cylinder, and the resulting true volume data serves as part of the training data used later to train the model.
202. Annotate the collected images or video data containing multiple classes of food, marking the positions and bounding boxes of the food and its reference object; specifically, the food subject region and its reference object are annotated manually, in preparation for the next step of training the food region detection model.
In addition, when video data containing food has been collected, step 202 further includes the following sub-step:
2021. Extract frames from, or split into shots, the videos in the collected video data containing multiple classes of food, to obtain single or multiple frames of each video.
203. Input the annotation result data into a preset SSD deep learning neural network model for training to obtain the food region detection model M1, i.e., output the food region detection model M1 from training on a large number of food region detections. Specifically, on the basis of the existing SSD (Single Shot MultiBox Detector) algorithm (see the prior art reference: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg, "SSD: Single shot multibox detector," in ECCV 2016: 14th European Conference on Computer Vision, Part I, 2016, pp. 21-37), the modules corresponding to the detection model and detection algorithm are fine-tuned to obtain the preset SSD deep learning neural network model applicable to this scheme; the large amount of collected food data is then input into the preset SSD model to obtain the food region detection model M1. In addition, the food region detection model M1 is applied to all collected food images to extract the bounding boxes (for example, rectangular regions) containing the food and its reference object, which serve as input data for training the volume estimation model. The preset SSD deep learning neural network model and the procedure used here to obtain the food region detection model M1 are not limiting; any other feasible detection model or method in the prior art may be used, for example food region detection based on hand-engineered features.
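Once M1 has produced bounding boxes, the corresponding sub-images must be cropped out as training input for the volume estimation model. A minimal sketch follows; the `(x1, y1, x2, y2)` pixel-rectangle layout is an assumption, since the patent does not specify the detector's output format.

```python
import numpy as np

def crop_food_regions(image, boxes):
    """Crop detected food regions out of an image.

    `image` is an H x W x 3 array; `boxes` are assumed (x1, y1, x2, y2)
    pixel rectangles as a detector like M1 might return. The crops are
    the sub-images fed to the volume estimation model.
    """
    crops = []
    h, w = image.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp coordinates to the image bounds before slicing.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 > x1 and y2 > y1:
            crops.append(image[y1:y2, x1:x2])
    return crops

image = np.zeros((480, 640, 3), dtype=np.uint8)
crops = crop_food_regions(image, [(100, 50, 300, 250), (-10, 400, 200, 600)])
```

Clamping keeps partially out-of-frame detections usable instead of raising indexing surprises.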
204. Input the images or video data containing multiple classes of food and the true volume data into the preset ResNet deep learning neural network model for training; the volume is computed with the ResNet deep learning neural network model, and the loss is computed with a mean squared error function, yielding the volume estimation model M2. The preset ResNet deep learning neural network model is obtained by changing the fully connected layer at the end of a ResNet-10 network to a one-dimensional output, modifying the loss layer, and changing the objective function to a Euclidean loss; the target value corresponds to the true volume of the food.
The second stage comprises the following steps:
205. Input the image or video data of the food to be measured into the volume estimation model; when the image or video data contains multiple frames of the food, compute the mean of the per-frame volume values to obtain the volume estimate for the food.
The image or video data collection process for the food to be measured can be carried out together with step 201, or an independent food data collection process can be performed before step 205; the specific process and the technical details of that data collection are similar to step 201 and are not repeated here.
In addition, when video data containing food has been collected, step 205 further includes the following sub-step:
2051. Extract frames from, or split into shots, the videos in the collected video data containing multiple classes of food to be measured, to obtain single or multiple frames of each video.
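The second stage as a whole chains M1 and M2: detect food regions, then estimate each region's volume. A minimal sketch, with stub callables standing in for both models (their calling conventions are assumed, since the patent fixes neither interface):

```python
def estimate_food_volumes(image, detect_regions, predict_volume):
    """Second-stage inference: detect food regions, then estimate each.

    `detect_regions` stands in for the food region detection model M1
    (image -> list of detected regions) and `predict_volume` for the
    volume estimation model M2 (region -> volume); both interfaces are
    assumptions for illustration.
    """
    return [predict_volume(region) for region in detect_regions(image)]

# Stub models: two fixed "detections" as ((x1, y1), (x2, y2)) corner
# pairs, with "volume" proportional to the box area.
fake_detector = lambda img: [((0, 0), (10, 10)), ((0, 0), (20, 8))]
fake_volume = lambda r: (r[1][0] - r[0][0]) * (r[1][1] - r[0][1])

volumes = estimate_food_volumes(None, fake_detector, fake_volume)
```

Because the per-region predictions are returned individually, the same loop covers both the single-food and multi-food scenes described among the advantages above.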
Embodiment 3
Fig. 5 is a structural diagram of the food volume estimation device provided by an embodiment of the present invention. As shown in Fig. 5, the food volume estimation device provided by the embodiment of the present invention includes:
Acquisition module 31, for gathering the image or video data that include plurality of classes food;
Acquisition module 32, for obtaining the true volume data of food in the image or video data that gather;
Model training module 33, it is pre- for being utilized according to described multiple images or video data, the true volume data
If deep learning neural network model be trained, obtain volumetric estimate model.Preferably, model training module 33 is used for root
Default ResNet, VGG or DenseNet deep learning nerve is utilized according to multiple images or video data, true volume data
Network model is trained, and obtains volumetric estimate model.Preferably, model training module is used for:By multiple images or video counts
Default ResNet, VGG or DenseNet deep learning neural network model is inputted according to, true volume data to be trained, body
Product calculates and uses ResNet, VGG or DenseNet deep learning neural network model, and loss function uses mean square error function meter
Calculate, obtain volumetric estimate model.It is further preferred that model training module is used for:By multiple images or video data, true body
Volume data inputs default ResNet10 deep learnings neural network model and is trained, by ResNet10 networks last layer
Full articulamentum is changed to one-dimensional output, and object function is changed to the deep learning neutral net mould of euclidean loss function acquisition
Type, volume, which calculates, uses ResNet10 deep learning neural network models, and loss function is calculated using mean square error function, is obtained
Volumetric estimate model.
A model computation module 34 for applying the volume estimation model to the image or video data of the food to be measured, obtaining the volume estimation result of the food to be measured.
Preferably, the food volume estimation device provided by the embodiment of the present invention further includes an image preprocessing module 35 for preprocessing the collected images or video data containing the plurality of food categories, including:
Annotating the collected images or video data containing the plurality of food categories, marking the positions and bounding boxes of the food and its reference object; performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, obtaining one or more frames of the video; and performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, obtaining one or more frames of that video. Accordingly, the model training module 33 is further configured to input the annotation result data into a preset SSD deep learning neural network model for training, obtaining a food region detection model.
It should be noted that when the food volume estimation device provided by the above embodiment performs volume estimation, the division into the above functional modules is only an example; in practical applications, the above functions can be assigned to different functional modules as needed, i.e. the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the food volume estimation device and the food volume estimation method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Application example 1
Fig. 6 is a schematic diagram of a single-target food region detection result in the application example. Fig. 7 is a schematic diagram of a single-target food region detection result in the application example. Fig. 8 is a schematic diagram of a multi-target food region detection result in the application example. Fig. 9 is a schematic flow diagram of single-target food volume estimation in the application example. Fig. 10 is a schematic flow diagram of multi-target food volume estimation in the application example.
In this application example, the implementation of the food volume estimation method provided by the embodiment of the present invention is divided into the following two stages: a first stage, in which the food region detection model M1 and the volume estimation model M2 are trained; and a second stage, in which the volume of the food to be measured is estimated.
(1) The first stage comprises the following steps:
1) Food data collection. Collect a sufficient number of food varieties and categories, including but not limited to fruit. For each kind of food and for different individuals of each kind, collect images against different backgrounds and from different angles. The backgrounds include office scenes, general indoor scenes, road and street scenes, etc.; the angles are chosen randomly from various sides, subject to the principle that the whole food item and the reference object are included in the frame. At the same time, measure the volume of each food individual to obtain the true volume data (unit: cubic centimetres). Here, 28 different foods were used and 8000 images were collected in total.
2) Food image annotation. Select N of the collected images and annotate them manually, marking the reference object (a finger) and the food subject region. Preferably, N = 100.
3) Food region detection, obtaining the food region detection model M1. This embodiment uses an object detection algorithm to detect food; preferably, the manually annotated data is used for training on the basis of the existing SSD algorithm for 3000 iterations, and the optimized parameter model is saved as the food region detection model M1. As shown in Figs. 6-8, food images are detected with model M1 and the rectangular regions containing the reference object and the food are extracted as input data for the later training of the volume estimation model. In total, 112 images of 28 different foods were extracted.
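The region-extraction step — cropping the rectangle found by M1 out of the image before passing it to the volume model — can be sketched as follows. The patent does not specify the crop routine, so the representation (row-major image, `(x, y, w, h)` box) is an assumption of this sketch:

```python
def crop_region(image, box):
    """Extract the rectangular food/reference region detected by M1.

    `image` is a row-major 2-D list (or array) of pixels and `box` is an
    assumed (x, y, width, height) rectangle; both conventions are
    illustrative, not taken from the patent.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Each detected food region is cropped this way and the resulting sub-image becomes one training (or prediction) input for the volume estimation model M2.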
4) Training the volume estimation model M2. This embodiment uses a deep neural network for model training; preferably ResNet10 is used, with the fully connected layer of the last layer modified to a 1-dimensional output and the Loss layer changed so that the objective function is a Euclidean loss function, where the target value corresponds to the true volume of the food (unit: cubic centimetres). After 20000 training steps, the volume estimation model M2 is obtained.
(2) The second stage, as shown in Fig. 9, the estimation stage for the food to be measured, comprises the following steps:
1) Input data preprocessing. The food data input here can be obtained in a single collection step, yielding a single image or a video segment of the food to be measured, or the food data collection can be completed together with the first stage. If the input is a video segment, the video is first segmented into shots with an automatic shot boundary detection algorithm, and then 3 images are chosen from the resulting image sequence as input.
2) Food region detection. For each input image, the food regions and the reference object (finger) region are detected with the food region detection model M1, yielding P food objects and 1 finger. The P food regions are extracted separately as input data for prediction. By way of example, as shown in Figs. 6-8, Figs. 6 and 7 show food region detection results for images containing a single target, while Fig. 8 shows the food region detection result for an image containing multiple targets.
3) Volume prediction. The food region detection results (both the single-target and the multi-target region detection results) are input into the volume estimation model M2 to obtain the corresponding predicted volume values. If the same image contains P food objects, the sum of the P food volumes is the total volume of the food in that image. If the initial input was a video segment, the average of the volumes predicted for the 3 images is taken as the predicted volume of the input food object.
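The aggregation rules just described — summing the P per-object volumes within one image, then averaging the per-frame totals for a video clip — can be sketched as one small function (an illustration of the rule, not code from the patent):

```python
def aggregate_volumes(per_frame_predictions):
    """Combine per-region volume predictions as described in step 3).

    `per_frame_predictions` is a list of frames, each frame being the
    list of volumes predicted for its P food objects. Volumes within a
    frame are summed; frame totals are averaged across the clip.
    """
    totals = [sum(frame) for frame in per_frame_predictions]
    return sum(totals) / len(totals)
```

For a single image with two food objects (as in Fig. 8) this reduces to the plain sum; for a 3-frame clip it is the mean of the three per-frame totals.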
Table 1 below gives the predicted values (third row) and true volume values (second row) of the food object volumes corresponding to Figs. 6-8: the predicted values for the images shown in Figs. 6 and 7 are 408 and 186 cubic centimetres respectively; in Fig. 8 there are two food objects, so the total volume is the sum of the two predicted values, i.e. 289 + 252 = 541 cubic centimetres. It can be seen that the prediction error for an image with a single food object is very low, around 5%, while the error when multiple foods appear is higher, around 10%, but still within an acceptable range.
(Unit: cubic centimetres) | (a) | (b) | (c)
---|---|---|---
Actual value | 390 | 180 | 240 (left) + 250 (right) = 490
Predicted value | 408 | 186 | 289 (left) + 252 (right) = 541
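The roughly 5% and 10% error figures quoted above follow directly from the values in Table 1; a quick check of the relative errors (all volumes in cubic centimetres):

```python
def relative_error(predicted, actual):
    """Relative prediction error |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

# Values taken from Table 1.
err_fig6 = relative_error(408, 390)   # single target, approx. 4.6%
err_fig7 = relative_error(186, 180)   # single target, approx. 3.3%
err_fig8 = relative_error(541, 490)   # two targets summed, approx. 10.4%
```

The single-target errors come out just under 5%, and the multi-target case just over 10%, matching the error levels stated in the text.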
Furthermore, preferably, as shown in Fig. 10, when the image data collected in the second stage is a single-target image, the collected image data of the food to be measured is input directly into the volume estimation model M2 to obtain the volume estimate of the food to be measured. This process skips the food region detection step using the food region detection model M1, which is mainly intended for target region detection in multi-target images; this further simplifies the volume estimation flow and improves volume estimation efficiency. To explain or define the terms single-target image and multi-target image used here: a single-target image is an image containing only a single food item, and a multi-target image is an image containing multiple food targets. This explanation or definition is only for convenience of expression and is merely exemplary; without departing from the concept and spirit of the embodiments of the present invention, other forms of expression may also be used, and no form of expression should be construed as limiting the present invention.
All of the above optional technical solutions can be combined in any way to form alternative embodiments of the present invention, and are not described one by one here.
From the above description and practice, the food volume estimation method and device provided by the embodiments of the present invention train a preset deep learning neural network model with the collected food images or video data and the true food volume data to obtain a volume estimation model, and then input the image or video of the food to be measured into the learned volume estimation model. Since the volume estimation model is a neural network model obtained through extensive prior training, it can automatically learn to compute the predicted volume of the food to be measured, effectively realizing food volume estimation. Compared with the prior art, the food volume estimation embodiments provided by the embodiments of the present invention have at least the following beneficial effects:
(1) The model provided by the embodiments of the present invention is learned entirely by a deep neural network, without explicit feature extraction from the food (such as contour extraction), and can accurately realize volume reconstruction from a single image against a complex background;
(2) For new foods with complicated shapes, only the model needs to be retrained; there is no need to manually analyse the characteristics of the new food or update a feature database;
(3) Multiple food regions in the input image can be detected at the same time and their corresponding volumes predicted separately, suiting scenes with single or multiple foods; moreover, the method is not limited by food type and is applicable to various scenarios such as fruit, cake and plated meals;
(4) The method proposed by the embodiments of the present invention avoids the complex process of three-dimensional reconstruction and does not require substantial computing resources, making it more suitable for practical applications, particularly intelligent diet management on mobile terminals;
(5) The method proposed by the present invention has simple input requirements: a single food image or a short related video can be input directly, giving a better user experience.
Generally speaking, because the volume estimation method provided by the embodiments of the present invention uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometric characteristics and background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, input the sub-images of the extracted food image regions into the trained volume estimation model, and obtain the predicted food volume. The volume estimation method is simple and efficient: the user only needs to input a single food image or a short food video to quickly obtain the predicted volume of the food. The flow is simple and efficient and can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid estimation of food volume.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, etc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (16)
- 1. A food volume estimation method, characterized in that the method comprises: collecting images or video data containing a plurality of food categories, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured.
- 2. The method according to claim 1, characterized in that collecting images or video data containing a plurality of food categories and obtaining the true volume data of the food in the collected images or video data comprises: collecting the images or video data containing a plurality of food categories under a variety of backgrounds, several scenes and a variety of shooting angles, the backgrounds including simple backgrounds and complex backgrounds, the scenes including general indoor scenes and common outdoor scenes, and the shooting angles including at least a frontal angle and an oblique angle; and measuring the true volume of the food in the collected images or food data to obtain the true volume data of the food in the collected images or video data.
- 3. The method according to claim 1, characterized in that the preset deep learning neural network model comprises a preset ResNet, VGG or DenseNet deep learning neural network model.
- 4. The method according to claim 1, characterized in that training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model comprises: inputting the images or video data and the true volume data into the preset deep learning neural network model for training, computing the volume with the preset deep learning neural network model, and computing the loss with a mean squared error function, to obtain the volume estimation model.
- 5. The method according to claim 4, characterized in that the preset deep learning neural network model is a preset ResNet deep learning neural network model: the fully connected layer of the last ResNet10 network layer is changed to a one-dimensional output, and the Loss layer is modified so that the objective function is a Euclidean loss function, obtaining the ResNet10 deep learning neural network model, where the target value corresponds to the true volume of the food.
- 6. The method according to claim 1, characterized in that applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured comprises: inputting the image or video data of the food to be measured into the volume estimation model, and, when the image or video data contains multiple frames of the food to be measured, computing the average of the volume values obtained for each frame, thereby obtaining the volume estimation result of the food to be measured.
- 7. The method according to claim 1, characterized in that, before training a preset ResNet, VGG or DenseNet deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model, the method further comprises preprocessing the collected images or video data containing the plurality of food categories, including: annotating the collected images or video data and marking the positions and bounding boxes of the food and its reference object.
- 8. The method according to claim 7, characterized in that, after annotating the collected images or video data containing the plurality of food categories, the method further comprises: inputting the annotation result data into a preset SSD deep learning neural network model for training, to obtain a food region detection model.
- 9. The method according to claim 8, characterized in that preprocessing the collected images or video data containing the plurality of food categories further comprises: performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, to obtain one or more frames of the video.
- 10. The method according to any one of claims 1 to 9, characterized in that applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured comprises: performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, to obtain one or more frames of the video.
- 11. A food volume estimation device, characterized in that the device comprises: a collection module for collecting images or video data containing a plurality of food categories; an acquisition module for obtaining the true volume data of the food in the collected images or video data; a model training module for training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model; and a model computation module for applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured.
- 12. The food volume estimation device according to claim 11, characterized in that the model training module trains a preset ResNet, VGG or DenseNet deep learning neural network model with the images or video data and the true volume data, to obtain the volume estimation model.
- 13. The food volume estimation device according to claim 11, characterized in that the model training module inputs the images or video data and the true volume data into a preset ResNet, VGG or DenseNet deep learning neural network model for training, computes the volume with the ResNet, VGG or DenseNet deep learning neural network model, and computes the loss with a mean squared error function, to obtain the volume estimation model.
- 14. The food volume estimation device according to claim 11, characterized in that the model training module inputs the images or video data and the true volume data into a preset ResNet10 deep learning neural network model for training, changes the fully connected layer of the last ResNet10 network layer to a one-dimensional output, modifies the Loss layer so that the objective function is a Euclidean loss function, computes the volume with the resulting ResNet10 deep learning neural network model, and computes the loss with a mean squared error function, to obtain the volume estimation model.
- 15. The food volume estimation device according to claim 11, characterized in that the device further comprises an image preprocessing module for preprocessing the collected images or video data containing the plurality of food categories, including: annotating the collected images or video data and marking the positions and bounding boxes of the food and its reference object; performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, to obtain one or more frames of the video; and performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, to obtain one or more frames of the video.
- 16. The food volume estimation device according to claim 15, characterized in that the model training module is further configured to input the annotation result data into a preset SSD deep learning neural network model for training, to obtain a food region detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320238.2A CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320238.2A CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108038879A true CN108038879A (en) | 2018-05-15 |
Family
ID=62102099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711320238.2A Pending CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038879A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109003299A (en) * | 2018-07-05 | 2018-12-14 | 北京推想科技有限公司 | A method of the calculating cerebral hemorrhage amount based on deep learning |
CN109064509A (en) * | 2018-06-29 | 2018-12-21 | 广州雅特智能科技有限公司 | The recognition methods of food volume and fuel value of food, device and system |
CN110174399A (en) * | 2019-04-10 | 2019-08-27 | 晋江双龙制罐有限公司 | Solid content qualification detection method and its detection system in a kind of transparent can |
WO2021082285A1 (en) * | 2019-10-30 | 2021-05-06 | 青岛海尔智能技术研发有限公司 | Method and device for measuring volume of ingredient, and kitchen appliance apparatus |
CN113128300A (en) * | 2019-12-30 | 2021-07-16 | 上海际链网络科技有限公司 | Cargo volume measuring method and artificial intelligence system in logistics park |
CN113201905A (en) * | 2020-01-15 | 2021-08-03 | 青岛海尔洗衣机有限公司 | Clothes volume estimation method and control method of clothes treatment equipment and clothes treatment system |
CN113486689A (en) * | 2020-05-27 | 2021-10-08 | 海信集团有限公司 | Refrigerator and food material volume estimation method |
CN114565659A (en) * | 2022-01-19 | 2022-05-31 | 北京精培医学研究院 | Food volume estimation method based on single depth map deep learning view synthesis |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103148781A (en) * | 2013-01-26 | 2013-06-12 | 广西工学院鹿山学院 | Grapefruit size estimating method based on binocular vision |
CN103162627A (en) * | 2013-03-28 | 2013-06-19 | 广西工学院鹿山学院 | Method for estimating fruit size by citrus fruit peel mirror reflection |
CN103307979A (en) * | 2013-05-27 | 2013-09-18 | 四川农业大学 | Fruit volume measuring method based on computer vision |
WO2015000890A1 (en) * | 2013-07-02 | 2015-01-08 | Roche Diagnostics Gmbh | Estimation of food volume and carbs |
CN104764402A (en) * | 2015-03-11 | 2015-07-08 | 广西科技大学 | Visual inspection method for citrus size |
CN106757976A (en) * | 2017-01-22 | 2017-05-31 | 无锡小天鹅股份有限公司 | Washing machine and its control method of washing and device based on image recognition clothing volume |
CN107180438A (en) * | 2017-04-26 | 2017-09-19 | 清华大学 | Estimate yak body chi, the method for body weight and corresponding portable computer device |
-
2017
- 2017-12-12 CN CN201711320238.2A patent/CN108038879A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103148781A (en) * | 2013-01-26 | 2013-06-12 | 广西工学院鹿山学院 | Grapefruit size estimating method based on binocular vision |
CN103162627A (en) * | 2013-03-28 | 2013-06-19 | 广西工学院鹿山学院 | Method for estimating fruit size by citrus fruit peel mirror reflection |
CN103307979A (en) * | 2013-05-27 | 2013-09-18 | 四川农业大学 | Fruit volume measuring method based on computer vision |
WO2015000890A1 (en) * | 2013-07-02 | 2015-01-08 | Roche Diagnostics Gmbh | Estimation of food volume and carbs |
CN104764402A (en) * | 2015-03-11 | 2015-07-08 | 广西科技大学 | Visual inspection method for citrus size |
CN106757976A (en) * | 2017-01-22 | 2017-05-31 | 无锡小天鹅股份有限公司 | Washing machine and its control method of washing and device based on image recognition clothing volume |
CN107180438A (en) * | 2017-04-26 | 2017-09-19 | 清华大学 | Estimate yak body chi, the method for body weight and corresponding portable computer device |
Non-Patent Citations (7)
Title |
---|
K.A. FORBES et al.: "Estimating fruit volume from digital images", 1999 IEEE AFRICON. 5th AFRICON Conference in Africa (Cat. No. 99CH36342) *
PARISA POULADZADEH et al.: "Food calorie measurement using deep learning neural network", 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings *
王文平: "Economic Management Data, Models and Computation: Methods, Implementation and Cases", 30 November 2010, Southeast University Press *
翟俊海 et al.: "Convolutional neural networks and their research progress", Journal of Hebei University *
董守斌 et al.: "Network Information Retrieval", 30 April 2010, Xidian University Press *
邹权 et al.: "Network Analysis Methods in Systems Biology", 30 April 2015, Xidian University Press *
颜志国 et al.: "Deep Learning Principles and TensorFlow Practice", 30 June 2017, Publishing House of Electronics Industry *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064509A (en) * | 2018-06-29 | 2018-12-21 | 广州雅特智能科技有限公司 | The recognition methods of food volume and fuel value of food, device and system |
CN109064509B (en) * | 2018-06-29 | 2021-04-06 | 广州雅特智能科技有限公司 | Method, device and system for recognizing food volume and food heat |
CN109003299A (en) * | 2018-07-05 | 2018-12-14 | 北京推想科技有限公司 | A method of the calculating cerebral hemorrhage amount based on deep learning |
CN110174399A (en) * | 2019-04-10 | 2019-08-27 | 晋江双龙制罐有限公司 | Solid content qualification detection method and its detection system in a kind of transparent can |
WO2021082285A1 (en) * | 2019-10-30 | 2021-05-06 | 青岛海尔智能技术研发有限公司 | Method and device for measuring volume of ingredient, and kitchen appliance apparatus |
CN113128300A (en) * | 2019-12-30 | 2021-07-16 | 上海际链网络科技有限公司 | Cargo volume measuring method and artificial intelligence system in logistics park |
CN113201905A (en) * | 2020-01-15 | 2021-08-03 | 青岛海尔洗衣机有限公司 | Clothes volume estimation method and control method of clothes treatment equipment and clothes treatment system |
CN113486689A (en) * | 2020-05-27 | 2021-10-08 | 海信集团有限公司 | Refrigerator and food material volume estimation method |
CN113486689B (en) * | 2020-05-27 | 2024-11-01 | 海信集团有限公司 | Refrigerator and food volume estimation method |
CN114565659A (en) * | 2022-01-19 | 2022-05-31 | 北京精培医学研究院 | Food volume estimation method based on single depth map deep learning view synthesis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108038879A (en) | A kind of volume of food method of estimation and its device | |
Wang et al. | An edge-weighted centroidal Voronoi tessellation model for image segmentation | |
US8139850B2 (en) | Constraint generation for use in image segregation | |
CN108229379A (en) | Image-recognizing method, device, computer equipment and storage medium | |
US20100142825A1 (en) | Image segregation system architecture | |
CN109272016A (en) | Target detection method, device, terminal equipment and computer readable storage medium | |
US8260050B2 (en) | Test bed for optimizing an image segregation | |
US20100142846A1 (en) | Solver for image segregation | |
CN109658412A (en) | It is a kind of towards de-stacking sorting packing case quickly identify dividing method | |
CN107730515A (en) | Panoramic picture conspicuousness detection method with eye movement model is increased based on region | |
CN109102506A (en) | A kind of automatic division method carrying out abdominal CT hepatic disease image based on three-stage cascade network | |
US8983183B2 (en) | Spatially varying log-chromaticity normals for use in an image process | |
Pound et al. | A patch-based approach to 3D plant shoot phenotyping | |
WO2011075164A1 (en) | Method and system for factoring an illumination image | |
CN109671055B (en) | Pulmonary nodule detection method and device | |
CN109146934A (en) | A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo | |
Sasmal et al. | A survey on the utilization of Superpixel image for clustering based image segmentation | |
AU2015218184A1 (en) | Processing hyperspectral or multispectral image data | |
CN110189318A (en) | Pulmonary nodule detection method and system with semantic feature score | |
US8798392B2 (en) | Method and system for generating intrinsic images using a smooth illumination constraint | |
CN109766919A (en) | Cascade the gradual change type Classification Loss calculation method and system in object detection system | |
US8934735B2 (en) | Oriented, spatio-spectral illumination constraints for use in an image progress | |
WO2015171355A1 (en) | A method for identifying color-based vectors for analysis of frames of a video | |
Liu et al. | Automated binocular vision measurement of food dimensions and volume for dietary evaluation | |
Beaini et al. | Deep green function convolution for improving saliency in convolutional neural networks |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180515 |