CN108038879A - Food volume estimation method and device - Google Patents
Food volume estimation method and device
- Publication number
- CN108038879A CN108038879A CN201711320238.2A CN201711320238A CN108038879A CN 108038879 A CN108038879 A CN 108038879A CN 201711320238 A CN201711320238 A CN 201711320238A CN 108038879 A CN108038879 A CN 108038879A
- Authority
- CN
- China
- Prior art keywords
- food
- image
- video data
- volume
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/62—Analysis of geometric attributes of area, perimeter, diameter or volume
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Geometry (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a food volume estimation method and device, belonging to the field of deep learning. The method includes: collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
Description
Technical field
The present invention relates to the field of deep learning, and in particular to a food volume estimation method and device.
Background
People today pay increasing attention to healthy eating, and in particular to the calories they consume. The calorie content of food is closely related to its volume, so automatically and quickly estimating food volume from a photograph is key to intelligent diet management applications.
At present, image-based food volume estimation methods are scarce. Most existing methods require the user to supply multi-view images, reconstruct a three-dimensional model of the food from those images, and then compute the volume of the model. This places a heavy burden on the user and is inconvenient, and the three-dimensional reconstruction demands substantial computing resources, making it particularly unsuitable for mobile applications on phones.
The content of the invention
In order to solve problem of the prior art, an embodiment of the present invention provides a kind of volume of food method of estimation and its dress
Put.The technical solution is as follows:
In a first aspect, a food volume estimation method is provided, the method including:
collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food.
With reference to the first aspect, in a first possible implementation, collecting images or video data containing multiple classes of food, and obtaining the true volume data of the food in the collected images or video data, includes:
collecting images or video data containing multiple classes of food under a variety of backgrounds, scenes, and shooting angles, where the backgrounds include simple and complex backgrounds, the scenes include common indoor and outdoor scenes, and the shooting angles include at least a frontal view and an oblique view; and measuring the true volume of the food in the collected images or video data to obtain the true volume data.
With reference to the first aspect, in a second possible implementation, the preset deep learning neural network model includes a preset ResNet, VGG, or DenseNet deep learning neural network model.
With reference to the first aspect, in a third possible implementation, training the preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model includes:
inputting the images or video data and the true volume data into the preset deep learning neural network model for training, where the volume is computed by the preset deep learning neural network model and the loss is computed with a mean squared error function, to obtain the volume estimation model.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the preset deep learning neural network model is a preset ResNet deep learning neural network model: the fully connected layer at the end of a ResNet-10 network is changed to a one-dimensional output, the loss layer is modified, and the objective function is changed to a Euclidean loss, yielding a ResNet-10 deep learning neural network model in which the target value corresponds to the true volume of the food.
With reference to the first aspect, in a fifth possible implementation, applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
inputting the image or video data of the food to be measured into the volume estimation model; when the image or video data contains multiple frames of the food, computing the mean of the per-frame volume values to obtain the volume estimate for the food.
With reference to the first aspect, in a sixth possible implementation, before training the preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain the volume estimation model, the method further includes:
preprocessing the collected images or video data containing multiple classes of food, including:
annotating the collected images or video data to mark the positions and bounding boxes of the food and its reference object.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, after annotating the collected images or video data containing multiple classes of food, the method further includes:
inputting the annotation result data into a preset SSD deep learning neural network model for training to obtain a food region detection model;
where applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
annotating the image or video data of the food to be measured to mark the positions and bounding boxes of the food and its reference object, obtaining annotation result data; applying the food region detection model to the annotation result data to obtain region detection result data for the food; and applying the volume estimation model to the region detection result data to obtain the volume estimate for the food.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation, preprocessing the collected images or video data containing multiple classes of food further includes:
extracting frames from, or splitting into shots, the videos in the collected video data to obtain single or multiple frames of each video.
With reference to the first through eighth possible implementations of the first aspect, in ninth through sixteenth possible implementations, applying the volume estimation model to an image or video of the food to be measured to obtain its volume estimate includes:
extracting frames from, or splitting into shots, the videos in the image or video data of the food to be measured to obtain single or multiple frames of each video.
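The patent does not fix how frames are chosen from a video; one minimal sketch, under the assumption of thinning the video to roughly one frame per second and then randomly sampling a fixed number of the surviving frames (random selection is what Embodiment 1 describes), might look like:

```python
import random

def sample_frame_indices(total_frames, fps, num_samples, seed=0):
    """Pick frame indices from a food video for volume prediction.

    Thin the video to about one frame per second (a stand-in for the
    frame-extraction / shot-splitting step), then randomly sample
    `num_samples` of the surviving frames.
    """
    step = max(int(fps), 1)
    candidates = list(range(0, total_frames, step))
    rng = random.Random(seed)
    if len(candidates) <= num_samples:
        return candidates
    return sorted(rng.sample(candidates, num_samples))

# A 10-second clip at 30 fps: keep one frame per second, sample 5.
indices = sample_frame_indices(total_frames=300, fps=30, num_samples=5)
```

The selected indices would then be decoded (e.g. with a video reader) and each frame passed to the volume estimation model individually.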
In a second aspect, a food volume estimation device is provided, the device including:
a collection module, for collecting images or video data containing multiple classes of food;
an acquisition module, for obtaining the true volume data of the food in the collected images or video data;
a model training module, for training a preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model;
a model computation module, for applying the volume estimation model to an image or video of food to be measured to obtain a volume estimate for that food.
With reference to the second aspect, in a first possible implementation, the model training module is configured to: train a preset ResNet, VGG, or DenseNet deep learning neural network model on the images or video data and the true volume data to obtain the volume estimation model.
With reference to the second aspect, in a second possible implementation, the model training module is configured to: input the images or video data and the true volume data into the preset ResNet, VGG, or DenseNet deep learning neural network model for training, where the volume is computed by the ResNet, VGG, or DenseNet model and the loss is computed with a mean squared error function, to obtain the volume estimation model.
With reference to the second aspect, in a third possible implementation, the model training module is configured to: input the images or video data and the true volume data into a preset ResNet-10 deep learning neural network model for training, the model being obtained by changing the fully connected layer at the end of a ResNet-10 network to a one-dimensional output and changing the objective function to a Euclidean loss; the volume is computed by the ResNet-10 model and the loss is computed with a mean squared error function, yielding the volume estimation model.
With reference to the second aspect, in a fourth possible implementation, the device further includes an image preprocessing module, for preprocessing the collected images or video data containing multiple classes of food, including:
annotating the collected images or video data to mark the positions and bounding boxes of the food and its reference object; extracting frames from, or splitting into shots, the videos in the collected video data to obtain single or multiple frames of each video; and extracting frames from, or splitting into shots, the videos in the image or video data of the food to be measured to obtain single or multiple frames of each video.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the model training module is further configured to: input the annotation result data into a preset SSD deep learning neural network model for training to obtain a food region detection model.
In the food volume estimation method and device provided by the embodiments of the present invention, a preset deep learning neural network model is trained on collected food images or video data together with the true volume data of the food to obtain a volume estimation model; an image or video of food to be measured is then fed into the learned volume estimation model. Because the volume estimation model is a neural network obtained from extensive prior training, it can automatically compute a predicted volume for the food to be measured, effectively realizing food volume estimation. Compared with the prior art, the food volume estimation scheme of the embodiments of the present invention has at least the following advantages:
(1) The model provided by the embodiments of the present invention is learned entirely by a deep neural network, with no explicit feature extraction (such as contour extraction) performed on the food, and can accurately reconstruct volume from a single image against a complex background;
(2) For food with new, complex shapes, the model only needs to be retrained; there is no need to manually analyze the characteristics of the new food or update a feature database;
(3) Multiple food regions in an input image can be detected simultaneously and their volumes predicted individually, suiting both single-food and multi-food scenes; moreover, the method is not limited to any food type and is applicable to fruit, cake, plated meals, and various other scenarios;
(4) The method proposed by the embodiments of the present invention avoids the complex process of three-dimensional reconstruction, does not require substantial computing resources, and is well suited to practical applications, in particular intelligent diet management on mobile devices;
(5) The method proposed by the present invention has simple input requirements, accepting a single food image or a short related video directly, and thus offers a better user experience.
In general, because the volume estimation method provided by the embodiments of the present invention uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometry, or background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, feed the sub-images of the extracted food regions into the trained volume estimation model, and obtain the predicted food volume. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of the first stage of the food volume estimation method provided by an embodiment of the present invention;
Fig. 4 is a flow diagram of the second stage of the food volume estimation method provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the food volume estimation device provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of single-target food region detection results in an application example;
Fig. 7 is a schematic diagram of single-target food region detection results in an application example;
Fig. 8 is a schematic diagram of multi-target food region detection results in an application example;
Fig. 9 is a flow diagram of single-target food volume estimation in an application example;
Fig. 10 is a flow diagram of multi-target food volume estimation in an application example.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or the number of the technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" means two or more, unless specifically defined otherwise.
The embodiments of the present invention provide a food volume estimation method and device: a preset deep learning neural network model is trained on collected food images or video data together with the true volume data of the food to obtain a volume estimation model; an image or video of food to be measured is then fed into the learned volume estimation model. Because the volume estimation model is a neural network obtained from extensive prior training, it can automatically compute a predicted volume for the food to be measured, effectively realizing food volume estimation. Because the method uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometry, or background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, feed the sub-images of the extracted food regions into the trained volume estimation model, and obtain the predicted food volume. The method is simple and efficient: a user only needs to supply a single food image or a short food video to quickly obtain the predicted volume, so it can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid food volume estimation.
The food volume estimation method and device provided by the embodiments of the present invention are further described below with reference to specific embodiments, application examples, and the accompanying drawings.
Embodiment 1
Fig. 1 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention. As shown in Fig. 1, the food volume estimation method provided by the embodiment of the present invention comprises the following steps:
101. Collect images or video data containing multiple classes of food, and obtain the true volume data of the food in the collected images or video data.
Specifically, images or video data containing multiple classes of food are collected under a variety of backgrounds, scenes, and shooting angles. The backgrounds include, but are not limited to, simple backgrounds (such as a desktop or a plain white background) and complex backgrounds; the scenes include common indoor and outdoor scenes; and the shooting angles include at least a frontal view and an oblique view with some deviation. The food image preferably contains a relatively stable reference object, such as a coin or a finger; for ease of use, the embodiment of the present invention preferably uses a finger as the reference object.
In addition, the true volume of the food in the collected images or video data is measured to obtain the true volume data. Preferably, the volume of each food is measured with a graduated cylinder, and the resulting true volume data serves as part of the training data used later to train the model.
102. Train a preset deep learning neural network model on the images or video data and the true volume data to obtain a volume estimation model.
Specifically, the images or video data and the true volume data are input into the preset deep learning neural network model for training; the volume is preferably computed with a ResNet, VGG, or DenseNet deep learning neural network model, and the loss is computed with a mean squared error function, yielding the volume estimation model. Besides the preferred ResNet, VGG, or DenseNet models configured as needed, any feasible deep learning neural network model in the prior art may be used as the preset model; the embodiments of the present invention do not restrict this.
Preferably, an existing ResNet deep learning neural network model is fine-tuned: the fully connected layer at the end of a ResNet-10 classification network is changed to a one-dimensional output, and the objective function is changed to a Euclidean loss, yielding the ResNet deep learning neural network model applicable to this scheme, i.e., the preset ResNet deep learning neural network model, in which the target value corresponds to the true volume of the food.
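The fine-tuning step above can be sketched as an ordinary regression training loop. This is only an illustration under stated assumptions: a tiny stand-in CNN replaces the fine-tuned ResNet-10 (any scalar-output backbone trains the same way), the data is synthetic, and volumes are normalized to litres so the numbers stay well-behaved.

```python
import torch
import torch.nn as nn

# Tiny stand-in regressor in place of the modified ResNet-10:
# any backbone ending in a one-dimensional output trains identically.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 1),
)
criterion = nn.MSELoss()  # the "Euclidean loss" of the scheme
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Synthetic training pairs: food crops and measured volumes (litres).
torch.manual_seed(0)
images = torch.randn(16, 3, 32, 32)
true_volumes = torch.rand(16, 1)

first_loss = last_loss = None
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(model(images), true_volumes)
    loss.backward()
    optimizer.step()
    if first_loss is None:
        first_loss = loss.item()
    last_loss = loss.item()
```

In the patent's setting, the synthetic pairs would be replaced by the collected food crops and the graduated-cylinder volume measurements from step 101.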
103. Apply the volume estimation model to the image or video data of the food to be measured to obtain a volume estimate for that food.
Specifically, the image or video data of the food to be measured is input into the volume estimation model; when the image or video data contains multiple frames of the food, the mean of the per-frame volume values is computed to obtain the volume estimate for the food.
The input to the volume estimation model can be an image or a video. If the input is video data, the video is preferably first split into shots to obtain an image sequence; a number of images are then randomly selected from the frames containing food, a volume prediction is made for each image, and the final output is the mean of all predicted values.
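The averaging step is straightforward; a minimal sketch (the helper name and the example values are illustrative, not from the patent) could be:

```python
def estimate_volume_from_frames(frame_volumes):
    """Combine per-frame volume predictions into one estimate.

    Each element of `frame_volumes` is the volume (e.g. in ml) that
    the volume estimation model predicted for one sampled frame; the
    final estimate is simply their mean, as step 103 describes.
    """
    if not frame_volumes:
        raise ValueError("need at least one per-frame prediction")
    return sum(frame_volumes) / len(frame_volumes)

# Five sampled frames of the same food item, slightly different predictions:
estimate = estimate_volume_from_frames([182.0, 190.0, 175.0, 188.0, 185.0])
```

Averaging over several frames damps per-frame prediction noise at negligible extra cost.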
Embodiment 2
Fig. 2 is a flow diagram of the food volume estimation method provided by an embodiment of the present invention. Fig. 3 is a flow diagram of the first stage of that method, and Fig. 4 a flow diagram of its second stage. As shown in Figs. 2-4, the food volume estimation method provided by the embodiment of the present invention can be divided into two stages: in the first stage, training yields the food region detection model M1 and the volume estimation model M2; the second stage estimates the volume of the food to be measured.
Specifically, the first stage comprises the following steps:
201. Collect images or video data containing multiple classes of food under a variety of backgrounds, scenes, and shooting angles, and measure the true volume of the food in the collected images or video data to obtain the true volume data, i.e., food data collection. The backgrounds include, but are not limited to, simple backgrounds (such as a desktop or a plain white background) and complex backgrounds; the scenes include common indoor and outdoor scenes; and the shooting angles include at least a frontal view and an oblique view with some deviation. The food image preferably contains a relatively stable reference object, such as a coin or a finger; for ease of use, the embodiment of the present invention preferably uses a finger as the reference object.
In addition, the true volume of the food in the collected images or video data is measured to obtain the true volume data. Preferably, the volume of each food is measured with a graduated cylinder, and the resulting true volume data serves as part of the training data used later to train the model.
202. Annotate the collected images or video data containing multiple classes of food, marking the positions and bounding boxes of the food and its reference object; specifically, the food subject region and its reference object are annotated manually, in preparation for the next step of training the food region detection model.
In addition, when video data containing food has been collected, step 202 further includes the following sub-step:
2021. Extract frames from, or split into shots, the videos in the collected video data containing multiple classes of food, to obtain single or multiple frames of each video.
203. Input the annotation result data into a preset SSD deep learning neural network model for training to obtain the food region detection model M1, i.e., output the food region detection model M1 from training on a large number of food region detections. Specifically, on the basis of the existing SSD (Single Shot MultiBox Detector) algorithm (see the prior art reference: Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg, "SSD: Single shot multibox detector," in ECCV 2016: 14th European Conference on Computer Vision, Part I, 2016, pp. 21-37), the modules corresponding to the detection model and detection algorithm are fine-tuned to obtain the preset SSD deep learning neural network model applicable to this scheme; the large amount of collected food data is then input into the preset SSD model to obtain the food region detection model M1. In addition, the food region detection model M1 is applied to all collected food images to extract the bounding boxes (for example, rectangular regions) containing the food and its reference object, which serve as input data for training the volume estimation model. The preset SSD deep learning neural network model and the procedure used here to obtain the food region detection model M1 are not limiting; any other feasible detection model or method in the prior art may be used, for example food region detection based on hand-engineered features.
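Once M1 has produced bounding boxes, the corresponding sub-images must be cropped out as training input for the volume estimation model. A minimal sketch follows; the `(x1, y1, x2, y2)` pixel-rectangle layout is an assumption, since the patent does not specify the detector's output format.

```python
import numpy as np

def crop_food_regions(image, boxes):
    """Crop detected food regions out of an image.

    `image` is an H x W x 3 array; `boxes` are assumed (x1, y1, x2, y2)
    pixel rectangles as a detector like M1 might return. The crops are
    the sub-images fed to the volume estimation model.
    """
    crops = []
    h, w = image.shape[:2]
    for x1, y1, x2, y2 in boxes:
        # Clamp coordinates to the image bounds before slicing.
        x1, y1 = max(0, int(x1)), max(0, int(y1))
        x2, y2 = min(w, int(x2)), min(h, int(y2))
        if x2 > x1 and y2 > y1:
            crops.append(image[y1:y2, x1:x2])
    return crops

image = np.zeros((480, 640, 3), dtype=np.uint8)
crops = crop_food_regions(image, [(100, 50, 300, 250), (-10, 400, 200, 600)])
```

Clamping keeps partially out-of-frame detections usable instead of raising indexing surprises.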
204. Input the images or video data containing multiple classes of food and the true volume data into the preset ResNet deep learning neural network model for training; the volume is computed with the ResNet deep learning neural network model, and the loss is computed with a mean squared error function, yielding the volume estimation model M2. The preset ResNet deep learning neural network model is obtained by changing the fully connected layer at the end of a ResNet-10 network to a one-dimensional output, modifying the loss layer, and changing the objective function to a Euclidean loss; the target value corresponds to the true volume of the food.
The second stage comprises the following steps:
205. Input the image or video data of the food to be measured into the volume estimation model; when the image or video data contains multiple frames of the food, compute the mean of the per-frame volume values to obtain the volume estimate for the food.
The image or video data collection process for the food to be measured can be carried out together with step 201, or an independent food data collection process can be performed before step 205; the specific process and the technical details of that data collection are similar to step 201 and are not repeated here.
In addition, when video data containing food has been collected, step 205 further includes the following sub-step:
2051. Extract frames from, or split into shots, the videos in the collected video data containing multiple classes of food to be measured, to obtain single or multiple frames of each video.
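The second stage as a whole chains M1 and M2: detect food regions, then estimate each region's volume. A minimal sketch, with stub callables standing in for both models (their calling conventions are assumed, since the patent fixes neither interface):

```python
def estimate_food_volumes(image, detect_regions, predict_volume):
    """Second-stage inference: detect food regions, then estimate each.

    `detect_regions` stands in for the food region detection model M1
    (image -> list of detected regions) and `predict_volume` for the
    volume estimation model M2 (region -> volume); both interfaces are
    assumptions for illustration.
    """
    return [predict_volume(region) for region in detect_regions(image)]

# Stub models: two fixed "detections" as ((x1, y1), (x2, y2)) corner
# pairs, with "volume" proportional to the box area.
fake_detector = lambda img: [((0, 0), (10, 10)), ((0, 0), (20, 8))]
fake_volume = lambda r: (r[1][0] - r[0][0]) * (r[1][1] - r[0][1])

volumes = estimate_food_volumes(None, fake_detector, fake_volume)
```

Because the per-region predictions are returned individually, the same loop covers both the single-food and multi-food scenes described among the advantages above.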
Embodiment 3
Fig. 5 is a structural diagram of the food volume estimation device provided by an embodiment of the present invention. As shown in Fig. 5, the food volume estimation device provided by the embodiment of the present invention includes:
Acquisition module 31, for gathering the image or video data that include plurality of classes food;
Acquisition module 32, for obtaining the true volume data of food in the image or video data that gather;
Model training module 33, it is pre- for being utilized according to described multiple images or video data, the true volume data
If deep learning neural network model be trained, obtain volumetric estimate model.Preferably, model training module 33 is used for root
Default ResNet, VGG or DenseNet deep learning nerve is utilized according to multiple images or video data, true volume data
Network model is trained, and obtains volumetric estimate model.Preferably, model training module is used for:By multiple images or video counts
Default ResNet, VGG or DenseNet deep learning neural network model is inputted according to, true volume data to be trained, body
Product calculates and uses ResNet, VGG or DenseNet deep learning neural network model, and loss function uses mean square error function meter
Calculate, obtain volumetric estimate model.It is further preferred that model training module is used for:By multiple images or video data, true body
Volume data inputs default ResNet10 deep learnings neural network model and is trained, by ResNet10 networks last layer
Full articulamentum is changed to one-dimensional output, and object function is changed to the deep learning neutral net mould of euclidean loss function acquisition
Type, volume, which calculates, uses ResNet10 deep learning neural network models, and loss function is calculated using mean square error function, is obtained
Volumetric estimate model.
A model computation module 34 for applying the volume estimation model to the image or video data of the food to be measured, obtaining the volume estimation result of the food to be measured.
Preferably, the food volume estimation device provided by the embodiment of the present invention further includes an image preprocessing module 35 for preprocessing the collected images or video data containing the plurality of food categories, including:
Annotating the collected images or video data containing the plurality of food categories, marking the positions and bounding boxes of the food and its reference object; performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, obtaining one or more frames of the video; and performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, obtaining one or more frames of that video. Accordingly, the model training module 33 is further configured to input the annotation result data into a preset SSD deep learning neural network model for training, obtaining a food region detection model.
It should be noted that when the food volume estimation device provided by the above embodiment performs volume estimation, the division into the above functional modules is only an example; in practical applications, the above functions can be assigned to different functional modules as needed, i.e. the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the food volume estimation device and the food volume estimation method embodiments provided above belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Application example 1
Fig. 6 is a schematic diagram of a single-target food region detection result in the application example. Fig. 7 is a schematic diagram of a single-target food region detection result in the application example. Fig. 8 is a schematic diagram of a multi-target food region detection result in the application example. Fig. 9 is a schematic flow diagram of single-target food volume estimation in the application example. Fig. 10 is a schematic flow diagram of multi-target food volume estimation in the application example.
In this application example, the implementation of the food volume estimation method provided by the embodiment of the present invention is divided into the following two stages: a first stage, in which the food region detection model M1 and the volume estimation model M2 are trained; and a second stage, in which the volume of the food to be measured is estimated.
(1) The first stage comprises the following steps:
1) Food data collection. Collect a sufficient number of food varieties and categories, including but not limited to fruit. For each kind of food and for different individuals of each kind, collect images against different backgrounds and from different angles. The backgrounds include office scenes, general indoor scenes, road and street scenes, etc.; the angles are chosen randomly from various sides, subject to the principle that the whole food item and the reference object are included in the frame. At the same time, measure the volume of each food individual to obtain the true volume data (unit: cubic centimetres). Here, 28 different foods were used and 8000 images were collected in total.
2) Food image annotation. Select N of the collected images and annotate them manually, marking the reference object (a finger) and the food subject region. Preferably, N = 100.
3) Food region detection, obtaining the food region detection model M1. This embodiment uses an object detection algorithm to detect food; preferably, the manually annotated data is used for training on the basis of the existing SSD algorithm for 3000 iterations, and the optimized parameter model is saved as the food region detection model M1. As shown in Figs. 6-8, food images are detected with model M1 and the rectangular regions containing the reference object and the food are extracted as input data for the later training of the volume estimation model. In total, 112 images of 28 different foods were extracted.
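The region-extraction step — cropping the rectangle found by M1 out of the image before passing it to the volume model — can be sketched as follows. The patent does not specify the crop routine, so the representation (row-major image, `(x, y, w, h)` box) is an assumption of this sketch:

```python
def crop_region(image, box):
    """Extract the rectangular food/reference region detected by M1.

    `image` is a row-major 2-D list (or array) of pixels and `box` is an
    assumed (x, y, width, height) rectangle; both conventions are
    illustrative, not taken from the patent.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Each detected food region is cropped this way and the resulting sub-image becomes one training (or prediction) input for the volume estimation model M2.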
4) Training the volume estimation model M2. This embodiment uses a deep neural network for model training; preferably ResNet10 is used, with the fully connected layer of the last layer modified to a 1-dimensional output and the Loss layer changed so that the objective function is a Euclidean loss function, where the target value corresponds to the true volume of the food (unit: cubic centimetres). After 20000 training steps, the volume estimation model M2 is obtained.
(2) The second stage, as shown in Fig. 9, the estimation stage for the food to be measured, comprises the following steps:
1) Input data preprocessing. The food data input here can be obtained in a single collection step, yielding a single image or a video segment of the food to be measured, or the food data collection can be completed together with the first stage. If the input is a video segment, the video is first segmented into shots with an automatic shot boundary detection algorithm, and then 3 images are chosen from the resulting image sequence as input.
2) Food region detection. For each input image, the food regions and the reference object (finger) region are detected with the food region detection model M1, yielding P food objects and 1 finger. The P food regions are extracted separately as input data for prediction. By way of example, as shown in Figs. 6-8, Figs. 6 and 7 show food region detection results for images containing a single target, while Fig. 8 shows the food region detection result for an image containing multiple targets.
3) Volume prediction. The food region detection results (both the single-target and the multi-target region detection results) are input into the volume estimation model M2 to obtain the corresponding predicted volume values. If the same image contains P food objects, the sum of the P food volumes is the total volume of the food in that image. If the initial input was a video segment, the average of the volumes predicted for the 3 images is taken as the predicted volume of the input food object.
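The aggregation rules just described — summing the P per-object volumes within one image, then averaging the per-frame totals for a video clip — can be sketched as one small function (an illustration of the rule, not code from the patent):

```python
def aggregate_volumes(per_frame_predictions):
    """Combine per-region volume predictions as described in step 3).

    `per_frame_predictions` is a list of frames, each frame being the
    list of volumes predicted for its P food objects. Volumes within a
    frame are summed; frame totals are averaged across the clip.
    """
    totals = [sum(frame) for frame in per_frame_predictions]
    return sum(totals) / len(totals)
```

For a single image with two food objects (as in Fig. 8) this reduces to the plain sum; for a 3-frame clip it is the mean of the three per-frame totals.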
Table 1 below gives the predicted values (third row) and true volume values (second row) of the food object volumes corresponding to Figs. 6-8: the predicted values for the images shown in Figs. 6 and 7 are 408 and 186 cubic centimetres respectively; in Fig. 8 there are two food objects, so the total volume is the sum of the two predicted values, i.e. 289 + 252 = 541 cubic centimetres. It can be seen that the prediction error for an image with a single food object is very low, around 5%, while the error when multiple foods appear is higher, around 10%, but still within an acceptable range.
(Unit: cubic centimetres) | (a) | (b) | (c)
---|---|---|---
Actual value | 390 | 180 | 240 (left) + 250 (right) = 490
Predicted value | 408 | 186 | 289 (left) + 252 (right) = 541
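The roughly 5% and 10% error figures quoted above follow directly from the values in Table 1; a quick check of the relative errors (all volumes in cubic centimetres):

```python
def relative_error(predicted, actual):
    """Relative prediction error |predicted - actual| / actual."""
    return abs(predicted - actual) / actual

# Values taken from Table 1.
err_fig6 = relative_error(408, 390)   # single target, approx. 4.6%
err_fig7 = relative_error(186, 180)   # single target, approx. 3.3%
err_fig8 = relative_error(541, 490)   # two targets summed, approx. 10.4%
```

The single-target errors come out just under 5%, and the multi-target case just over 10%, matching the error levels stated in the text.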
Furthermore, preferably, as shown in Fig. 10, when the image data collected in the second stage is a single-target image, the collected image data of the food to be measured is input directly into the volume estimation model M2 to obtain the volume estimate of the food to be measured. This process skips the food region detection step using the food region detection model M1, which is mainly intended for target region detection in multi-target images; this further simplifies the volume estimation flow and improves volume estimation efficiency. To explain or define the terms single-target image and multi-target image used here: a single-target image is an image containing only a single food item, and a multi-target image is an image containing multiple food targets. This explanation or definition is only for convenience of expression and is merely exemplary; without departing from the concept and spirit of the embodiments of the present invention, other forms of expression may also be used, and no form of expression should be construed as limiting the present invention.
All of the above optional technical solutions can be combined in any way to form alternative embodiments of the present invention, and are not described one by one here.
From the above description and practice, the food volume estimation method and device provided by the embodiments of the present invention train a preset deep learning neural network model with the collected food images or video data and the true food volume data to obtain a volume estimation model, and then input the image or video of the food to be measured into the learned volume estimation model. Since the volume estimation model is a neural network model obtained through extensive prior training, it can automatically learn to compute the predicted volume of the food to be measured, effectively realizing food volume estimation. Compared with the prior art, the food volume estimation embodiments provided by the embodiments of the present invention have at least the following beneficial effects:
(1) The model provided by the embodiments of the present invention is learned entirely by a deep neural network, without explicit feature extraction from the food (such as contour extraction), and can accurately realize volume reconstruction from a single image against a complex background;
(2) For new foods with complicated shapes, only the model needs to be retrained; there is no need to manually analyse the characteristics of the new food or update a feature database;
(3) Multiple food regions in the input image can be detected at the same time and their corresponding volumes predicted separately, suiting scenes with single or multiple foods; moreover, the method is not limited by food type and is applicable to various scenarios such as fruit, cake and plated meals;
(4) The method proposed by the embodiments of the present invention avoids the complex process of three-dimensional reconstruction and does not require substantial computing resources, making it more suitable for practical applications, particularly intelligent diet management on mobile terminals;
(5) The method proposed by the present invention has simple input requirements: a single food image or a short related video can be input directly, giving a better user experience.
Generally speaking, because the volume estimation method provided by the embodiments of the present invention uses a deep learning neural network, it neither needs to explicitly extract image features such as food contours, food geometric characteristics and background characteristics, nor needs to make a priori assumptions about the image background; it can robustly extract satisfactory food image region data in complex scenes, input the sub-images of the extracted food image regions into the trained volume estimation model, and obtain the predicted food volume. The volume estimation method is simple and efficient: the user only needs to input a single food image or a short food video to quickly obtain the predicted volume of the food. The flow is simple and efficient and can be widely applied in network information services, such as intelligent diet management, that require frequent and rapid estimation of food volume.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, etc.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.
Claims (16)
- 1. A food volume estimation method, characterized in that the method comprises: collecting images or video data containing a plurality of food categories, and obtaining the true volume data of the food in the collected images or video data; training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model; and applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured.
- 2. The method according to claim 1, characterized in that collecting images or video data containing a plurality of food categories and obtaining the true volume data of the food in the collected images or video data comprises: collecting the images or video data containing a plurality of food categories under a variety of backgrounds, several scenes and a variety of shooting angles, the backgrounds including simple backgrounds and complex backgrounds, the scenes including general indoor scenes and common outdoor scenes, and the shooting angles including at least a frontal angle and an oblique angle; and measuring the true volume of the food in the collected images or food data to obtain the true volume data of the food in the collected images or video data.
- 3. The method according to claim 1, characterized in that the preset deep learning neural network model comprises a preset ResNet, VGG or DenseNet deep learning neural network model.
- 4. The method according to claim 1, characterized in that training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model comprises: inputting the images or video data and the true volume data into the preset deep learning neural network model for training, computing the volume with the preset deep learning neural network model, and computing the loss with a mean squared error function, to obtain the volume estimation model.
- 5. The method according to claim 4, characterized in that the preset deep learning neural network model is a preset ResNet deep learning neural network model: the fully connected layer of the last ResNet10 network layer is changed to a one-dimensional output, and the Loss layer is modified so that the objective function is a Euclidean loss function, obtaining the ResNet10 deep learning neural network model, where the target value corresponds to the true volume of the food.
- 6. The method according to claim 1, characterized in that applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured comprises: inputting the image or video data of the food to be measured into the volume estimation model, and, when the image or video data contains multiple frames of the food to be measured, computing the average of the volume values obtained for each frame, thereby obtaining the volume estimation result of the food to be measured.
- 7. The method according to claim 1, characterized in that, before training a preset ResNet, VGG or DenseNet deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model, the method further comprises preprocessing the collected images or video data containing the plurality of food categories, including: annotating the collected images or video data and marking the positions and bounding boxes of the food and its reference object.
- 8. The method according to claim 7, characterized in that, after annotating the collected images or video data containing the plurality of food categories, the method further comprises: inputting the annotation result data into a preset SSD deep learning neural network model for training, to obtain a food region detection model.
- 9. The method according to claim 8, characterized in that preprocessing the collected images or video data containing the plurality of food categories further comprises: performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, to obtain one or more frames of the video.
- 10. The method according to any one of claims 1 to 9, characterized in that applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured comprises: performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, to obtain one or more frames of the video.
- 11. A food volume estimation device, characterized in that the device comprises: a collection module for collecting images or video data containing a plurality of food categories; an acquisition module for obtaining the true volume data of the food in the collected images or video data; a model training module for training a preset deep learning neural network model with the images or video data and the true volume data to obtain a volume estimation model; and a model computation module for applying the volume estimation model to the image or video data of the food to be measured to obtain the volume estimation result of the food to be measured.
- 12. The food volume estimation device according to claim 11, characterized in that the model training module trains a preset ResNet, VGG or DenseNet deep learning neural network model with the images or video data and the true volume data, to obtain the volume estimation model.
- 13. The food volume estimation device according to claim 11, characterized in that the model training module inputs the images or video data and the true volume data into a preset ResNet, VGG or DenseNet deep learning neural network model for training, computes the volume with the ResNet, VGG or DenseNet deep learning neural network model, and computes the loss with a mean squared error function, to obtain the volume estimation model.
- 14. The food volume estimation device according to claim 11, characterized in that the model training module inputs the images or video data and the true volume data into a preset ResNet10 deep learning neural network model for training, changes the fully connected layer of the last ResNet10 network layer to a one-dimensional output, modifies the Loss layer so that the objective function is a Euclidean loss function, computes the volume with the resulting ResNet10 deep learning neural network model, and computes the loss with a mean squared error function, to obtain the volume estimation model.
- 15. The food volume estimation device according to claim 11, characterized in that the device further comprises an image preprocessing module for preprocessing the collected images or video data containing the plurality of food categories, including: annotating the collected images or video data and marking the positions and bounding boxes of the food and its reference object; performing frame extraction or shot segmentation on the video in the collected video data containing the plurality of food categories, to obtain one or more frames of the video; and performing frame extraction or shot segmentation on the video in the image or video data of the food to be measured, to obtain one or more frames of the video.
- 16. The food volume estimation device according to claim 15, characterized in that the model training module is further configured to input the annotation result data into a preset SSD deep learning neural network model for training, to obtain a food region detection model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320238.2A CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711320238.2A CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108038879A true CN108038879A (en) | 2018-05-15 |
Family
ID=62102099
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711320238.2A Pending CN108038879A (en) | 2017-12-12 | 2017-12-12 | A kind of volume of food method of estimation and its device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108038879A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109003299A (en) * | 2018-07-05 | 2018-12-14 | 北京推想科技有限公司 | A method of the calculating cerebral hemorrhage amount based on deep learning |
CN109064509A (en) * | 2018-06-29 | 2018-12-21 | 广州雅特智能科技有限公司 | The recognition methods of food volume and fuel value of food, device and system |
CN110174399A (en) * | 2019-04-10 | 2019-08-27 | 晋江双龙制罐有限公司 | Solid content qualification detection method and its detection system in a kind of transparent can |
WO2021082285A1 (en) * | 2019-10-30 | 2021-05-06 | 青岛海尔智能技术研发有限公司 | Method and device for measuring volume of ingredient, and kitchen appliance apparatus |
CN113128300A (en) * | 2019-12-30 | 2021-07-16 | 上海际链网络科技有限公司 | Cargo volume measuring method and artificial intelligence system in logistics park |
CN113201905A (en) * | 2020-01-15 | 2021-08-03 | 青岛海尔洗衣机有限公司 | Clothes volume estimation method and control method of clothes treatment equipment and clothes treatment system |
CN113486689A (en) * | 2020-05-27 | 2021-10-08 | 海信集团有限公司 | Refrigerator and food material volume estimation method |
CN114565659A (en) * | 2022-01-19 | 2022-05-31 | 北京精培医学研究院 | Food volume estimation method based on single depth map deep learning view synthesis |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103148781A (en) * | 2013-01-26 | 2013-06-12 | 广西工学院鹿山学院 | Grapefruit size estimating method based on binocular vision |
CN103162627A (en) * | 2013-03-28 | 2013-06-19 | 广西工学院鹿山学院 | Method for estimating fruit size by citrus fruit peel mirror reflection |
CN103307979A (en) * | 2013-05-27 | 2013-09-18 | 四川农业大学 | Fruit volume measuring method based on computer vision |
WO2015000890A1 (en) * | 2013-07-02 | 2015-01-08 | Roche Diagnostics Gmbh | Estimation of food volume and carbs |
CN104764402A (en) * | 2015-03-11 | 2015-07-08 | 广西科技大学 | Visual inspection method for citrus size |
CN106757976A (en) * | 2017-01-22 | 2017-05-31 | 无锡小天鹅股份有限公司 | Washing machine and its control method of washing and device based on image recognition clothing volume |
CN107180438A (en) * | 2017-04-26 | 2017-09-19 | 清华大学 | Estimate yak body chi, the method for body weight and corresponding portable computer device |
-
2017
- 2017-12-12 CN CN201711320238.2A patent/CN108038879A/en active Pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103148781A (en) * | 2013-01-26 | 2013-06-12 | 广西工学院鹿山学院 | Grapefruit size estimating method based on binocular vision |
CN103162627A (en) * | 2013-03-28 | 2013-06-19 | 广西工学院鹿山学院 | Method for estimating fruit size by citrus fruit peel mirror reflection |
CN103307979A (en) * | 2013-05-27 | 2013-09-18 | 四川农业大学 | Fruit volume measuring method based on computer vision |
WO2015000890A1 (en) * | 2013-07-02 | 2015-01-08 | Roche Diagnostics Gmbh | Estimation of food volume and carbs |
CN104764402A (en) * | 2015-03-11 | 2015-07-08 | 广西科技大学 | Visual inspection method for citrus size |
CN106757976A (en) * | 2017-01-22 | 2017-05-31 | 无锡小天鹅股份有限公司 | Washing machine and its control method of washing and device based on image recognition clothing volume |
CN107180438A (en) * | 2017-04-26 | 2017-09-19 | 清华大学 | Estimate yak body chi, the method for body weight and corresponding portable computer device |
Non-Patent Citations (7)
Title |
---|
K.A. FORBES et al.: "Estimating fruit volume from digital images", 1999 IEEE AFRICON. 5th AFRICON Conference in Africa (Cat. No. 99CH36342) *
PARISA POULADZADEH et al.: "Food calorie measurement using deep learning neural network", 2016 IEEE International Instrumentation and Measurement Technology Conference Proceedings *
王文平: "Economic Management Data, Models and Computation: Methods, Implementation and Cases", 30 November 2010, Southeast University Press *
翟俊海 et al.: "Convolutional neural networks and their research progress", Journal of Hebei University *
董守斌 et al.: "Network Information Retrieval", 30 April 2010, Xidian University Press *
邹权 et al.: "Network Analysis Methods in Systems Biology", 30 April 2015, Xidian University Press *
颜志国 et al.: "Deep Learning Principles and TensorFlow Practice", 30 June 2017, Publishing House of Electronics Industry *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064509A (en) * | 2018-06-29 | 2018-12-21 | 广州雅特智能科技有限公司 | The recognition methods of food volume and fuel value of food, device and system |
CN109064509B (en) * | 2018-06-29 | 2021-04-06 | 广州雅特智能科技有限公司 | Method, device and system for recognizing food volume and food heat |
CN109003299A (en) * | 2018-07-05 | 2018-12-14 | 北京推想科技有限公司 | A method of the calculating cerebral hemorrhage amount based on deep learning |
CN110174399A (en) * | 2019-04-10 | 2019-08-27 | 晋江双龙制罐有限公司 | Solid content qualification detection method and its detection system in a kind of transparent can |
WO2021082285A1 (en) * | 2019-10-30 | 2021-05-06 | 青岛海尔智能技术研发有限公司 | Method and device for measuring volume of ingredient, and kitchen appliance apparatus |
CN113128300A (en) * | 2019-12-30 | 2021-07-16 | 上海际链网络科技有限公司 | Cargo volume measuring method and artificial intelligence system in logistics park |
CN113201905A (en) * | 2020-01-15 | 2021-08-03 | 青岛海尔洗衣机有限公司 | Clothes volume estimation method and control method of clothes treatment equipment and clothes treatment system |
CN113486689A (en) * | 2020-05-27 | 2021-10-08 | 海信集团有限公司 | Refrigerator and food material volume estimation method |
CN113486689B (en) * | 2020-05-27 | 2024-11-01 | 海信集团有限公司 | Refrigerator and food volume estimation method |
CN114565659A (en) * | 2022-01-19 | 2022-05-31 | 北京精培医学研究院 | Food volume estimation method based on single depth map deep learning view synthesis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108038879A (en) | A kind of volume of food method of estimation and its device | |
Wang et al. | An edge-weighted centroidal Voronoi tessellation model for image segmentation | |
US8139850B2 (en) | Constraint generation for use in image segregation | |
CN108229379A (en) | Image-recognizing method, device, computer equipment and storage medium | |
US20100142825A1 (en) | Image segregation system architecture | |
CN109272016A (en) | Target detection method, device, terminal equipment and computer readable storage medium | |
US8260050B2 (en) | Test bed for optimizing an image segregation | |
US20100142846A1 (en) | Solver for image segregation | |
CN109658412A (en) | It is a kind of towards de-stacking sorting packing case quickly identify dividing method | |
CN107730515A (en) | Panoramic picture conspicuousness detection method with eye movement model is increased based on region | |
CN109102506A (en) | A kind of automatic division method carrying out abdominal CT hepatic disease image based on three-stage cascade network | |
US8983183B2 (en) | Spatially varying log-chromaticity normals for use in an image process | |
Pound et al. | A patch-based approach to 3D plant shoot phenotyping | |
WO2011075164A1 (en) | Method and system for factoring an illumination image | |
CN109671055B (en) | Pulmonary nodule detection method and device | |
CN109146934A (en) | A kind of face three-dimensional rebuilding method and system based on binocular solid and photometric stereo | |
Sasmal et al. | A survey on the utilization of Superpixel image for clustering based image segmentation | |
AU2015218184A1 (en) | Processing hyperspectral or multispectral image data | |
CN110189318A (en) | Pulmonary nodule detection method and system with semantic feature score | |
US8798392B2 (en) | Method and system for generating intrinsic images using a smooth illumination constraint | |
CN109766919A (en) | Cascade the gradual change type Classification Loss calculation method and system in object detection system | |
US8934735B2 (en) | Oriented, spatio-spectral illumination constraints for use in an image progress | |
WO2015171355A1 (en) | A method for identifying color-based vectors for analysis of frames of a video | |
Liu et al. | Automated binocular vision measurement of food dimensions and volume for dietary evaluation | |
Beaini et al. | Deep green function convolution for improving saliency in convolutional neural networks |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180515 |