CN108597582B - Method and device for executing fast R-CNN neural network operation - Google Patents


Info

Publication number
CN108597582B
CN108597582B · CN201810352111.7A
Authority
CN
China
Prior art keywords
food
volume
cnn
network
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810352111.7A
Other languages
Chinese (zh)
Other versions
CN108597582A (en)
Inventor
张团 (Zhang Tuan)
陈云霁 (Chen Yunji)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201810352111.7A priority Critical patent/CN108597582B/en
Publication of CN108597582A publication Critical patent/CN108597582A/en
Application granted granted Critical
Publication of CN108597582B publication Critical patent/CN108597582B/en
Status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/60 ICT specially adapted for therapies or health-improving plans relating to nutrition control, e.g. diets
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

A method and apparatus for performing fast R-CNN neural network operations. The method comprises: acquiring a plurality of images of the same portion of food at different angles; determining recommended regions for sample detection using the RPN; predicting the category and bounding box of each food object in the recommended regions using Fast R-CNN; predicting the volume proportion of each food object using Volume R-CNN according to the predicted food bounding boxes; calculating the volume proportions of the different types of food from the food object categories and the food object volume proportions; multiplying the calculated volume proportion of each kind of food by the density of that kind of food to obtain the mass proportion of each kind of food; multiplying the mass proportion of each kind of food by the total mass of the food to obtain the mass of each kind of food; and multiplying the mass of each food by its corresponding nutrient content to obtain the nutrient element content of the food. The invention can measure complex and varied foods, and by adopting artificial neural network technology and a dedicated chip the food can be identified more accurately and rapidly.

Description

Method and device for executing fast R-CNN neural network operation
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for executing fast R-CNN neural network operation.
Background
With the quickening pace of life and rising living standards in modern society, people's demands on their diet are ever higher. People no longer care merely about eating their fill, but about whether their diet is healthy. However, many people lack sufficient knowledge of dietary health, so there is a need for an apparatus that can intelligently measure the energy and nutritional ingredients of food to help people eat more properly.
One prior-art approach calculates food energy by weight. The metering device mainly comprises a tray, a weight measuring device, and a display screen. The weight measuring device measures the weight of the food and transmits the weight information to a microcomputer processor; the processor calculates the energy of the food and displays the result on the liquid crystal display screen.
The problem with this technique is that its processing power is weak: it is suitable only for measuring the energy of a single kind of food. For food containing multiple categories the measurement is inaccurate, and the corresponding nutrient content cannot be calculated. Its food information is not expandable, so foods that have not been recorded cannot be processed.
Another prior art captures a top view and a side view of the measured food with a mobile phone, identifies the food type with an artificial neural network, calculates the volume of each food according to a formula, and then calculates the nutrient content from the food volume.
This technology suffers from complicated operation and strict requirements on the input photos, and side views are prone to occlusion between foods. Parameters such as the length and width of the food are predicted from the focal length, which may deviate between different mobile phones; and calculating food volume by formula is unsuitable for irregularly shaped food, so the calculation error is large.
Disclosure of Invention
To solve the problems in the prior art, in one aspect, the present invention provides a method for performing fast R-CNN neural network operations, comprising:
acquiring a plurality of images of the same portion of food at different angles;
determining a recommended region for sample detection by using the RPN;
predicting the category and the frame of the food object in the recommended area by using Fast R-CNN;
predicting the Volume proportion of each food object by using Volume R-CNN according to the predicted food frame;
calculating the Volume proportion of different types of food according to the food object type predicted by Fast R-CNN and the Volume proportion of the food object predicted by Volume R-CNN;
respectively multiplying the calculated volume proportion of each kind of food by the density of that kind of food to obtain the mass proportion of each kind of food;
multiplying the mass proportion of each kind of food by the total mass of the food to obtain the mass of each kind of food;
multiplying the mass of each food by its corresponding nutrient content to obtain the content of the nutrient elements of the food;
wherein the RPN, Fast R-CNN and Volume R-CNN share a convolutional layer.
Preferably, when the recommended region is determined, the RPN performs multilayer convolution operation on the input picture to extract feature mapping of the picture, performs convolution operation on the feature mapping by using a sliding window, and calculates region classification and region regression by using two branches of a classification loss function and a frame regression loss function to obtain the recommended region.
Preferably, the Fast R-CNN maps the recommended regions to the feature maps to obtain RoIs, performs pooling operation on each RoI to convert into feature maps of the same size, and then performs two full-connection network operations on the pooled RoIs respectively to calculate the food object category in each recommended region and accurately predict the frame.
Preferably, the Volume R-CNN maps the predicted bounding-box parameters onto the feature map, performs a pooling operation on the corresponding mapped regions to obtain sample regions of equal size, performs a multilayer fully connected network operation on each sample region, and computes a volume intermediate variable $v_i$ for each food object in the image, where $v_i$ is a positive number; the volume intermediate variable is then converted into the corresponding volume proportion $f_i$ by:

$$f_i = \frac{v_i}{\sum_{j=1}^{n} v_j}$$

where $i = 1, 2, \dots, n$ and $n$ is the number of food objects in the image.
Preferably, the method for mapping the predicted bounding-box parameters onto the feature map is: each coordinate is multiplied by the ratio of the feature-map size to the original image size.
Preferably, the loss function Volume loss in the Volume R-CNN takes the form

$$L_{volume} = \sum_{i=1}^{n} \left(f_i - f_i^*\right)^2$$

where $f_i$ is the predicted volume proportion of each food object and $f_i^*$ is the ground-truth value, i.e. the label data input during training.
Preferably, the output of the neural network in the prediction process comprises: an n-dimensional vector, computed by Volume R-CNN, representing the volume proportion of each food object in the image, where each element lies in the interval (0, 1) and the elements sum to 1; an n × m matrix, computed by Fast R-CNN, representing the category of each food object in the image, where m is the number of identifiable food object categories, each row of the matrix has exactly one element equal to 1 and the remaining m − 1 elements equal to 0, and the column of the element 1 indicates the category of the food object; and an n × 4 two-dimensional array representing the bounding box of each food object.
Preferably, the method further includes multiplying the n-dimensional vector representing the volume proportion of each food object by the n × m two-dimensional array representing the category to which each food object belongs, obtaining the volume proportion vector of each category of food: an m-dimensional vector in which each dimension corresponds to one category of food and whose value in each dimension represents the volume proportion occupied by that category.
Preferably, the method further comprises calculating an m-dimensional vector representing the volume proportion of each category of food for each image, then adding all the m-dimensional vectors and dividing by the number of the images to obtain an average vector as the final volume proportion vector of each category of food.
Preferably, the method further comprises an adaptive training step comprising:
step one, the RPN network initializes its parameters and computes a class label and region adjustment parameters for each detection region by forward propagation of the input image information; relevant parameters of the RPN, comprising the RPN-specific parameters and the parameters of the shared convolution part, are updated by back propagation using a stochastic gradient descent algorithm or the Adam algorithm, training until convergence;
step two, Fast R-CNN initializes its convolutional layer parameters with the shared convolutional layer parameters trained in step one, trains using the recommended regions obtained in step one as the recommended regions in the neural network computation, and updates the network parameters, including the shared convolutional network, until the network converges;
step three, using the shared convolutional network obtained in step two, the RPN continues training and updates only its own parameters, excluding the shared convolutional layer parameters;
step four, the Fast R-CNN network trains with the recommended regions obtained in step three, updating only its own part while the shared convolutional layer parameters remain unchanged;
step five, the Volume R-CNN network maps the food object bounding boxes obtained in step four onto the last feature map of the shared convolutional network, and trains and updates its own parameters until the network converges;
the training operation of each step forward-computes the input data through the network to obtain the loss function of each part, then back-propagates and updates the network parameters using stochastic gradient descent or the Adam algorithm;
wherein the above five-step training process can be executed in a loop.
In another aspect, the invention provides an apparatus for performing fast R-CNN neural network operations, comprising
The information input part is used for acquiring a plurality of images of the same food in different angles, the total mass of the food, the density of different types of food in the food and the content of nutrient elements;
an information processing section for processing and calculating the image;
wherein the information processing section includes:
the storage unit is used for storing the image, the total mass, the density and the content of the nutrient elements;
a recommended region generation unit which determines a recommended region for sample detection using the RPN;
a category and frame prediction unit that predicts categories and frames of food objects in the recommended area using Fast R-CNN;
a food object Volume ratio prediction unit which predicts the Volume ratio of each food object in the image by using Volume R-CNN according to the predicted frame of the food object;
the food category Volume ratio prediction unit is used for calculating the Volume ratio of each category of food according to the food object category predicted by Fast R-CNN and the food object Volume ratio predicted by Volume R-CNN, and averaging the calculation results of different images;
the mass ratio prediction unit is used for multiplying the calculated volume ratio of different types of food and the density of different types of food respectively to obtain the mass ratio of different types of food;
the food quality prediction unit multiplies the mass proportion of different types of food by the total mass of the food to obtain the mass of the different types of food; and
the nutrition content prediction unit multiplies the quality of each food by the corresponding nutrition content to obtain the content of the food nutrient elements;
wherein the RPN, Fast R-CNN and Volume R-CNN share a convolutional layer.
Preferably, the information input section includes an image input device and a mass input device.
Preferably, the information processing part further comprises a data conversion unit for converting the q-dimensional nutrient content vector output by the processing unit into a corresponding output.
Preferably, the device further comprises an information output part for receiving the output information from the information processing part and displaying the information.
Preferably, the device further comprises a networking component for uploading the measurement data to the database in real time, and meanwhile, the latest parameter model can be updated from the cloud.
Preferably, the information processing unit is a neural network chip.
Compared with the prior art, the invention has the following beneficial effects:
1) compared with the prior invention, more complex and various foods can be measured.
2) The food identification is more accurate and rapid by adopting the artificial neural network technology and the chip.
3) Top-view pictures are taken obliquely from above, which effectively avoids occlusion between different foods and gives a comprehensive view of the objects.
4) The food volume is calculated by adopting an artificial neural network technology and a chip, the calculation result is more accurate, and the prediction precision is improved along with the continuous increase of training data.
5) The artificial neural network chip has strong computing power and supports offline operation of the neural network, so the user terminal/front end can detect food nutrient components and perform corresponding control offline, without a cloud server assisting the computation. When the chip is networked and a cloud server assists the computation, its computing capability is even stronger.
6) The device is simple to operate, is more intelligent, and meets the daily life requirements of people.
7) It can provide more reasonable suggestions for people's daily diet and improve their quality of life.
Drawings
FIG. 1 is a block diagram of a neural network in accordance with an embodiment of the present invention;
FIG. 2 is a diagram illustrating the prediction of food object categories and borders in an embodiment of the present invention;
FIG. 3 is a network structure diagram of Volume R-CNN in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
The invention discloses a method for executing fast R-CNN neural network operations, which mainly comprises: extracting and processing the key features of the image, and identifying the types of food and the volume proportion of each food in the image; calculating the mass proportion of each kind of food according to its density; and finally, the processing unit calculating the actual mass of each kind of food under test from the mass proportions and the total mass, from which the energy and nutrient content of the food are obtained by combining the element content of each kind of food.
The input image includes a plurality of top-view photographs from different angles for the same serving of food.
In the processing stage of a single image, the processor runs the input image through a modified Faster Region-based Convolutional Neural Network (Faster R-CNN) and marks, for each food in the image, its class, its bounding box, and its predicted volume proportion (volume), where volume is a decimal between 0 and 1, given to two decimal places.
The neural network structure in the invention is improved on the basis of the Faster R-CNN network, and a part for predicting the volume ratio of food is added. The network structure is shown in fig. 1:
the neural network can be divided into three parts: a Region pro-social Networks (RPN) network for predicting recommended regions; a Fast R-CNN network for predicting the class of objects in the image and fine-tuning the bounding box; volume RCNN network for predicting the Volume fraction occupied by individual food subjects in an image. The three networks share the convolutional layer to form a unified whole network.
The bounding box of a food is the smallest rectangular box that can enclose the image of that food. As shown in fig. 2, the oval and irregular figures represent foods of different shapes, and the dashed boxes are the food bounding boxes.
FIG. 3 shows the Volume R-CNN network structure, in which the convolutional layers (CNN) are the shared part. The bounding boxes required for this part of the operation are obtained from the second part (Fast R-CNN). The loss function Volume loss takes the form

$$L_{volume} = \sum_{i=1}^{n} \left(f_i - f_i^*\right)^2$$

where $f_i$ is the predicted volume proportion of each object and $f_i^*$ is the ground-truth value, i.e. the label data input during training.
The neural network uses a Region Proposal Network (RPN) to determine target-detection recommended regions. The RPN first performs multilayer convolution operations on the input image to extract its feature maps, then performs a convolution operation on the feature maps with a 3×3 sliding window, and then computes region classification and region regression with two branches to obtain the recommended regions. The region classification judges the probability that a predicted region belongs to foreground or background; the parameters of the recommended regions here are expressed with respect to the original input image.
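As a concrete illustration of the two-branch head just described, the following is a minimal sketch assuming PyTorch; the channel count (512) and anchor count (9) are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Minimal RPN head: a 3x3 convolution slides over the shared feature
    maps, followed by two 1x1 branches computing region classification
    (foreground/background scores) and region regression (box adjustments)."""
    def __init__(self, in_channels=512, num_anchors=9):  # sizes are assumptions
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.cls = nn.Conv2d(in_channels, num_anchors * 2, kernel_size=1)
        self.reg = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feats):
        h = torch.relu(self.conv(feats))
        return self.cls(h), self.reg(h)

# feature maps as produced by the shared convolutional layers (sizes made up)
scores, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
```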
In order to predict the food category of each recommended region and fine-tune the food bounding box, the recommended regions are mapped onto the feature maps to obtain RoIs (Regions of Interest), and a pooling operation is then performed on each RoI to convert it into a feature map of the same size. Two fully connected network operations can then be performed on the pooled RoIs, computing the food category of each region and accurately predicting the bounding box.
Finally, the bounding-box parameters predicted by the box branch are mapped onto the feature maps, and a pooling operation is performed on the corresponding mapped regions to obtain regions of equal size. A multilayer fully connected operation is performed on each target region to compute a volume intermediate variable for each food; this intermediate variable is a positive number and does not itself represent the food volume. Each target region contains one food corresponding to a volume intermediate variable $v_i$, which is then converted into the corresponding proportion $f_i$ by the formula:

$$f_i = \frac{v_i}{\sum_{j=1}^{n} v_j}$$

where $i = 1, 2, \dots, n$ and $n$ is the number of food objects in the image. There are as many values $f_i$ as there are foods in the image (n), so the food volume proportions can be output as a vector of n elements. The predicted food-category output is an n × m two-dimensional matrix, where m is the number of recognizable food categories; each row vector of the matrix has exactly one element equal to 1 and the rest equal to 0, and the column in which the element 1 lies indicates the food's category. The output of the food bounding-box prediction branch is an n × 4 two-dimensional matrix whose rows hold the box center coordinates (x, y) and the height and width (h, w). Mapping a bounding box from the original image onto the feature map consists of multiplying each coordinate by the ratio of the feature-map size to the original image size.
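The volume branch just described can be sketched as follows, assuming PyTorch and torchvision; the 7×7 pooling size, the single fully connected layer standing in for the multilayer network, the exp used to keep each $v_i$ positive, and the corner box format are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

def volume_branch(feature_map, boxes, img_size, feat_size, fc):
    """Map predicted boxes onto the feature map (multiply coordinates by
    feat_size / img_size), pool each mapped region to a fixed size, compute
    a positive intermediate variable v_i per food, and normalize to f_i."""
    scale = feat_size / img_size                  # the coordinate-mapping ratio
    pooled = roi_pool(feature_map, [boxes], output_size=(7, 7), spatial_scale=scale)
    v = torch.exp(fc(pooled.flatten(1))).squeeze(1)   # v_i > 0 by construction
    return v / v.sum()                            # f_i = v_i / sum_j v_j

fc = nn.Linear(512 * 7 * 7, 1)                    # stand-in for the multilayer FC net
boxes = torch.tensor([[48., 80., 240., 260.],     # (x1, y1, x2, y2) corner format
                      [300., 60., 560., 300.]])
f = volume_branch(torch.randn(1, 512, 38, 38), boxes,
                  img_size=608.0, feat_size=38.0, fc=fc)
print(f, f.sum())                                 # proportions summing to 1
```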
Therefore, the outputs of the neural network in the prediction process are: an n-dimensional vector representing the volume proportion of each food, with each element in the interval [0, 1] and the elements summing to 1; an n × m two-dimensional array representing the category to which each food belongs; and an n × 4 two-dimensional array representing each food's bounding box. The n-dimensional volume-proportion vector is then multiplied by the two-dimensional category array to obtain the volume proportion vector of each category of food, an m-dimensional vector. Each dimension of this m-dimensional vector corresponds to one category of food, and its value represents the volume proportion of the corresponding category.
The method of the invention further comprises computing, for each image in a group of images (photographs of the same dish of food from different angles), an m-dimensional vector representing the volume proportion of each category of food, then adding all the m-dimensional vectors and dividing by the number of images in the group to obtain an average vector as the final per-category volume proportion vector, as sketched below.
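A minimal numeric sketch of this combination-and-averaging step, with made-up proportions and categories (three objects, m = 3 categories in image 1; two objects in image 2):

```python
import numpy as np

def category_volume_vector(f, onehot):
    # (n,) per-object proportions @ (n, m) one-hot categories -> (m,) per category
    return f @ onehot

f1 = np.array([0.5, 0.3, 0.2])                    # image 1: three food objects
c1 = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1]])  # their one-hot categories
f2 = np.array([0.55, 0.45])                       # image 2 of the same dish
c2 = np.array([[1, 0, 0], [0, 0, 1]])
per_image = [category_volume_vector(f1, c1), category_volume_vector(f2, c2)]
final = np.mean(per_image, axis=0)                # average over the image group
print(final)                                      # [0.525 0.    0.475]
```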
The method of the invention further comprises multiplying the computed food-category volume proportion vector element-wise by the food-category density vector to obtain the food-category mass proportion vector, and then multiplying the food-category mass proportion vector by the total mass of the input food to obtain the food-category mass vector, in which each element represents the mass of the corresponding category of food.
The method further comprises multiplying the m-dimensional food-category mass vector by the corresponding food-category nutrient content matrix to obtain the food nutrient content vector, in which each element represents the content of one nutrient element in the food. The food-category nutrient content matrix is an m × q two-dimensional matrix, where q is the number of nutrient element types measurable by the system. Each row of the matrix corresponds to one category of food and each column to one nutrient element, giving the content of that nutrient element per unit mass of that category of food. The resulting food nutrient element content vector is a q-dimensional vector.
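The volume-to-mass-to-nutrient chain above amounts to a few vector operations. The sketch below uses made-up densities, masses, and nutrient contents, and makes explicit a renormalization after the density multiplication so that the mass proportions sum to 1 (an assumption the text leaves implicit).

```python
import numpy as np

volume_ratio = np.array([0.525, 0.0, 0.475])   # m-dim per-category volume proportions
density = np.array([1.05, 0.60, 0.90])         # g/cm^3 per category (made up)

mass_ratio = volume_ratio * density            # element-wise multiply by density
mass_ratio /= mass_ratio.sum()                 # renormalize so proportions sum to 1
mass = mass_ratio * 450.0                      # total measured food mass, in grams

# m x q nutrient content matrix: nutrient per unit mass of each food category
nutrient_per_gram = np.array([[0.20, 0.05, 0.002],
                              [0.02, 0.01, 0.000],
                              [0.03, 0.10, 0.001]])
nutrient_vector = mass @ nutrient_per_gram     # q-dim nutrient element content
print(mass, nutrient_vector)
```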
The method of the present invention also includes a method of adaptively training an information processing apparatus.
The input data are images with label data; the label data corresponding to each image are the category of each food in the image (an n-dimensional vector), the bounding-box information of each food (an n × 4 two-dimensional matrix), and the volume proportion occupied by each food (an n-dimensional vector), where n is the total number of food objects in the image. The processing unit preprocesses the input data information; for example, if the food category information is given as text, it is converted into the number corresponding to that category.
The training process is divided into five steps, in which the RPN, the Fast R-CNN network for food category detection and bounding-box regression, and the network predicting the food volume proportions are cross-trained.
Step one, the RPN network initializes its parameters and computes a class label and region adjustment parameters for each detection region by forward propagation of the input image information; relevant parameters of the RPN, comprising the RPN-specific parameters and the parameters of the shared convolution part, are updated by back propagation using a stochastic gradient descent algorithm or the Adam algorithm. Training proceeds until convergence.
Step two, Fast R-CNN initializes its convolutional layer parameters with the shared convolutional layer parameters trained in step one, trains using the recommended regions obtained in step one as the recommended regions in the network computation, and updates the network parameters, including the shared convolutional network, until the network converges.
Step three, using the shared convolutional network obtained in step two, the RPN continues training and updates only its own parameters, excluding the shared convolutional layer parameters.
Step four, the Fast R-CNN network trains with the recommended regions obtained in step three, updating only its own part while the shared convolutional layer parameters remain unchanged.
Step five, the Volume R-CNN network maps the food bounding boxes obtained in step four onto the last feature map of the shared convolutional network, and trains and updates its own parameters until the network converges.
The training operation of each step forward-computes the input data through the network to obtain the loss function of each part, then back-propagates and updates the network parameters using stochastic gradient descent or the Adam algorithm.
The five-step training schedule described above may be executed in a loop, as sketched below.
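The schedule reduces to bookkeeping over which parameter groups each step updates. The following runnable sketch encodes only that bookkeeping; the train() stub stands in for "forward pass, loss, back propagation, and SGD/Adam updates until convergence", and the function and group names are placeholders, not an API from the patent.

```python
def train(step, updated_params):
    # placeholder for: forward pass, loss, backprop, SGD/Adam until convergence
    print(f"step {step}: updating {updated_params}")

def alternating_training(rounds=1):
    for _ in range(rounds):                     # the five steps may be looped
        train(1, ["RPN head", "shared conv layers"])
        train(2, ["Fast R-CNN head", "shared conv layers"])  # uses step-1 proposals
        train(3, ["RPN head"])                  # shared conv layers now frozen
        train(4, ["Fast R-CNN head"])           # uses step-3 proposals, conv frozen
        train(5, ["Volume R-CNN head"])         # uses step-4 boxes, conv frozen

alternating_training()
```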
The invention also provides a device for executing the Faster R-CNN neural network operation, which comprises an information input component, an information processing component and an information output component, as shown in FIG. 4.
The information input part comprises one or more cameras for inputting a group of top-view food images from different angles, and a mass measuring device for measuring the mass of the food and transmitting it to the processing unit.
The information processing part comprises a storage unit and a data processing unit. The storage unit receives and stores the input data, instructions and output data, where the input data comprise a group of images and a positive number (the food mass). The data processing unit first uses the neural network to extract and process the key features contained in the input data, generating for each image a vector representing the content of nutrient elements in the food; for the same group of images, the average of the vectors of all images is computed as the final nutrient content vector of the tested food.
The information processing component also comprises a data conversion module for converting the q-dimensional nutrient content vector output by the processing unit into corresponding output, wherein the output can be in the form of a table or a pie chart.
The information output section includes a liquid crystal display which receives output information from the information processing section and displays the information.
The information processing part controls the output shown on the screen according to the predicted food nutrient content vector (a q-dimensional vector). The data conversion processor converts the q-dimensional vector into corresponding stored information in the format: nutrient element name and content. The nutrient element names are obtained from the index subscripts of the q-dimensional vector, and zero elements in the vector are ignored. In addition, the device can store, or obtain over the network, the recommended daily nutrient element intake for people of all ages and evaluate the tested food, i.e. indicate which nutrient elements in the food are too high or too low compared with the amounts the human body needs per meal, and give reasonable dietary suggestions. Finally, the nutrient element content of the food and the dietary suggestions are output on the display screen; the nutrient content can be output as a table or a pie chart.
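A small sketch of this data-conversion step, assuming hypothetical nutrient names and values; it renders the q-dimensional vector as "name: content" lines and skips zero elements as described.

```python
NUTRIENT_NAMES = ["protein", "fat", "carbohydrate", "sodium"]  # index -> name (made up)

def format_nutrients(vector, unit="g"):
    lines = []
    for name, value in zip(NUTRIENT_NAMES, vector):
        if value == 0:                     # zero elements in the vector are ignored
            continue
        lines.append(f"{name}: {value:.1f} {unit}")
    return "\n".join(lines)

print(format_nutrients([31.5, 12.0, 0.0, 1.2]))
```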
The device may also include a networking component that connects to the Internet, uploads measured data to the database in real time to enlarge the data volume, and can likewise update the latest parameter model from the cloud to improve computational efficiency and accuracy.
The data processing unit adopts a neural network chip, is suitable for neural network calculation and has strong calculation capability.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. A method for performing fast R-CNN neural network operations, comprising:
acquiring a plurality of images of the same portion of food at different angles;
determining a recommended region for sample detection by using the RPN;
predicting the category and the frame of the food object in the recommended area by using Fast R-CNN;
predicting the Volume proportion of each food object by using Volume R-CNN according to the predicted food frame;
calculating the Volume proportion of different types of food according to the food object type predicted by Fast R-CNN and the Volume proportion of the food object predicted by Volume R-CNN;
respectively multiplying the calculated volume proportion of each kind of food by the density of that kind of food to obtain the mass proportion of each kind of food;
multiplying the mass proportion of each kind of food by the total mass of the food to obtain the mass of each kind of food;
multiplying the mass of each food by its corresponding nutrient content to obtain the content of the nutrient elements of the food;
wherein the RPN, Fast R-CNN and Volume R-CNN share a convolutional layer;
the Volume R-CNN maps the predicted frame parameters to feature maps of pictures extracted by RPN, performs pooling operation on corresponding mapping areas to obtain sample areas with the same size, performs multilayer full-connection network operation on each sample area, and calculates a Volume intermediate variable v of each food object in the graphi,viIs a positive number; then the volume intermediate variable is converted into the corresponding volume proportion fiThe calculation formula is as follows:
Figure DEST_PATH_IMAGE002
wherein i =1,2 … … n, n being the number of food objects in the image;
the loss function Volume loss in the Volume R-CNN is in the form of
Figure DEST_PATH_IMAGE004
Wherein f isiFor the predicted volume fraction of each food object, fi *And the actual value is the label data input in training.
2. The method according to claim 1, wherein when determining the recommended region, the RPN performs a multi-layer convolution operation on the input picture to extract the feature mapping of the picture, performs a convolution operation on the feature mapping using a sliding window, and then calculates the region classification and the region regression using two branches of a classification loss function and a bounding box regression loss function to obtain the recommended region.
3. The method according to claim 1, wherein the Fast R-CNN maps recommended regions to the feature maps to obtain RoIs, performs pooling operation on each RoI to convert into feature maps of the same size, and then performs two full-connection network operations on the pooled RoIs respectively to calculate food object categories in each recommended region and accurately predict borders.
4. The method of claim 1, wherein the predicted bounding box parameters are mapped onto the feature map by: each coordinate data is multiplied by the ratio of the size of the feature map and the original image.
5. The method of claim 1, wherein the output of the neural network in the prediction process comprises: an n-dimensional vector, computed by Volume R-CNN, representing the volume proportion of each food object in the image, wherein each element lies in the interval (0, 1) and the elements sum to 1; an n × m matrix, computed by Fast R-CNN, representing the category of each food object in the image, wherein m is the number of identifiable food object categories, each row of the matrix has exactly one element equal to 1 and the remaining m − 1 elements equal to 0, and the column of the element 1 indicates the category of the food object; and an n × 4 two-dimensional array representing the bounding box of each food object.
6. The method of claim 1, wherein the method further comprises multiplying an n-dimensional vector representing the volume fraction of each food object by an n x m two-dimensional array representing the category to which each food object belongs to obtain a volume fraction vector of each category, wherein the volume fraction vector is an m-dimensional vector, each dimension of the m-dimensional vector corresponds to one category of food, and the value in each dimension represents the volume fraction of the corresponding category of food.
7. The method of claim 1, further comprising calculating an m-dimensional vector representing the volume fraction of each food category for each image, and then adding all m-dimensional vectors and dividing by the number of images to find the average vector as the final volume fraction vector for each food category.
8. The method of claim 1, wherein the method further comprises an adaptive training step comprising:
step one, the RPN network initializes its parameters and computes a class label and region adjustment parameters for each detection region by forward propagation of the input image information; relevant parameters of the RPN, comprising the RPN-specific parameters and the parameters of the shared convolution part, are updated by back propagation using a stochastic gradient descent algorithm or the Adam algorithm, training until convergence;
step two, Fast R-CNN initializes its convolutional layer parameters with the shared convolutional layer parameters trained in step one, trains using the recommended regions obtained in step one as the recommended regions in the neural network computation, and updates the network parameters, including the shared convolutional network, until the network converges;
step three, using the shared convolutional network obtained in step two, the RPN continues training and updates only its own parameters, excluding the shared convolutional layer parameters;
step four, the Fast R-CNN network trains with the recommended regions obtained in step three, updating only its own part while the shared convolutional layer parameters remain unchanged;
step five, the Volume R-CNN network maps the food object bounding boxes obtained in step four onto the last feature map of the shared convolutional network, and trains and updates its own parameters until the network converges;
the training operation of each step forward-computes the input data through the network to obtain the loss function of each part, then back-propagates and updates the network parameters using stochastic gradient descent or the Adam algorithm;
wherein the above five-step training process can be executed in a loop.
9. An apparatus for performing fast R-CNN neural network operations, comprising
The information input part is used for acquiring a plurality of images of the same food in different angles, the total mass of the food, the density of different types of food in the food and the content of nutrient elements;
an information processing section for processing and calculating the image;
wherein the information processing section includes:
the storage unit is used for storing the image, the total mass, the density and the content of the nutrient elements;
a recommended region generation unit which determines a recommended region for sample detection using the RPN;
a category and frame prediction unit that predicts categories and frames of food objects in the recommended area using Fast R-CNN;
a food object Volume ratio prediction unit which predicts the Volume ratio of each food object in the image by using Volume R-CNN according to the predicted frame of the food object;
the food category Volume ratio prediction unit is used for calculating the Volume ratio of each category of food according to the food object category predicted by Fast R-CNN and the food object Volume ratio predicted by Volume R-CNN, and averaging the calculation results of different images;
the mass ratio prediction unit is used for multiplying the calculated volume ratio of different types of food and the density of different types of food respectively to obtain the mass ratio of different types of food;
the food quality prediction unit multiplies the mass proportion of different types of food by the total mass of the food to obtain the mass of the different types of food; and
the nutrition content prediction unit multiplies the quality of each food by the corresponding nutrition content to obtain the content of the food nutrient elements;
wherein the RPN, Fast R-CNN and Volume R-CNN share a convolutional layer;
the Volume R-CNN maps the predicted frame parameters to feature maps of pictures extracted by RPN, performs pooling operation on corresponding mapping areas to obtain sample areas with the same size, performs multilayer full-connection network operation on each sample area, and calculates a Volume intermediate variable v of each food object in the graphi,viIs a positive number; then the volume intermediate variable is converted into the corresponding volume proportion fiThe calculation formula is as follows:
Figure 922970DEST_PATH_IMAGE002
wherein i =1,2 … … n, n being the number of food objects in the image;
the loss function Volume loss in the Volume R-CNN takes the form

$$L_{volume} = \sum_{i=1}^{n} \left(f_i - f_i^*\right)^2$$

where $f_i$ is the predicted volume proportion of each food object and $f_i^*$ is the ground-truth value, i.e. the label data input during training.
10. The apparatus of claim 9, wherein the information input section includes an image input device and a mass input device.
11. The apparatus of claim 9, wherein the information processing part further comprises a data conversion unit for converting the q-dimensional nutrient content vector output by the processing unit into a corresponding output.
12. The apparatus of claim 9, wherein the apparatus further comprises an information output section for receiving output information from the information processing section and displaying the information.
13. The apparatus of claim 9, wherein the apparatus further comprises a networking component for uploading the measurement data to a database in real time, while the latest parametric model is also updated from a cloud.
14. The apparatus according to claim 9, wherein the information processing section is a neural network chip.
CN201810352111.7A 2018-04-18 2018-04-18 Method and device for executing fast R-CNN neural network operation Active CN108597582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810352111.7A CN108597582B (en) 2018-04-18 2018-04-18 Method and device for executing fast R-CNN neural network operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810352111.7A CN108597582B (en) 2018-04-18 2018-04-18 Method and device for executing fast R-CNN neural network operation

Publications (2)

Publication Number Publication Date
CN108597582A CN108597582A (en) 2018-09-28
CN108597582B true CN108597582B (en) 2021-02-12

Family

ID=63613739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810352111.7A Active CN108597582B (en) 2018-04-18 2018-04-18 Method and device for executing fast R-CNN neural network operation

Country Status (1)

Country Link
CN (1) CN108597582B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10303981B1 (en) * 2018-10-04 2019-05-28 StradVision, Inc. Learning method and testing method for R-CNN based object detector, and learning device and testing device using the same
CN109846303A (en) * 2018-11-30 2019-06-07 广州富港万嘉智能科技有限公司 Service plate surplus automatic testing method, system, electronic equipment and storage medium
CN111696151A (en) * 2019-03-15 2020-09-22 青岛海尔智能技术研发有限公司 Method and device for identifying volume of food material in oven and computer readable storage medium
CN110174399A (en) * 2019-04-10 2019-08-27 晋江双龙制罐有限公司 Solid content qualification detection method and its detection system in a kind of transparent can
CN110569759B (en) * 2019-08-26 2020-11-03 王睿琪 Method, system, server and front end for acquiring individual eating data
CN113539427A (en) * 2020-04-22 2021-10-22 深圳市前海高新国际医疗管理有限公司 Convolutional neural network-based nutrition intervention analysis system and analysis method
CN111564200A (en) * 2020-05-08 2020-08-21 深圳市万佳安人工智能数据技术有限公司 Old people diet feature extraction device and method based on rapid random gradient descent
CN114556444A (en) * 2020-09-11 2022-05-27 京东方科技集团股份有限公司 Training method of combined model and object information processing method, device and system
CN112257761A (en) * 2020-10-10 2021-01-22 天津大学 Method for analyzing food nutrient components in image based on machine learning
WO2022133985A1 (en) * 2020-12-25 2022-06-30 京东方科技集团股份有限公司 Food product recommendation method and apparatus, and storage medium and electronic device
CN113111925A (en) * 2021-03-29 2021-07-13 宁夏新大众机械有限公司 Feed qualification classification method based on deep learning


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103162627A (en) * 2013-03-28 2013-06-19 广西工学院鹿山学院 Method for estimating fruit size by citrus fruit peel mirror reflection
CN106709525A (en) * 2017-01-05 2017-05-24 北京大学 Method for measuring food nutritional component by means of camera

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Computer Vision-Based Food Calorie Estimation: Dataset, Method, and Experiment; Yanchao Liang et al.; Computer Vision and Pattern Recognition; 2017-05-24; see Section 3, Fig. 3 *
Estimating Food Calories for Multiple-dish Food Photos; Takumi Ege et al.; 2017 4th IAPR Asian Conference on Pattern Recognition (ACPR); 2017-11-30; see Section 2, Fig. 2 *
Estimating Fruit Volume from Digital Images; K. A. Forbes et al.; 1999 IEEE Africon, 5th Africon Conference in Africa (Cat. No. 99CH36342); 1999-10-01; see p. 109, Section 3 *
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; Advances in Neural Information Processing Systems 28 (NIPS 2015); 2015-12-12; see Section 3: Sharing Convolutional Features for Region Proposal and Object Detection *

Also Published As

Publication number Publication date
CN108597582A (en) 2018-09-28

Similar Documents

Publication Publication Date Title
CN108597582B (en) Method and device for executing fast R-CNN neural network operation
CN111353542B (en) Training method and device for image classification model, computer equipment and storage medium
WO2021000423A1 (en) Pig weight measurement method and apparatus
CN108921057B (en) Convolutional neural network-based prawn form measuring method, medium, terminal equipment and device
CN110532970B (en) Age and gender attribute analysis method, system, equipment and medium for 2D images of human faces
US20220351501A1 (en) Three-dimensional target detection and model training method and device, and storage medium
CN110490252B (en) Indoor people number detection method and system based on deep learning
CN108537329B (en) Method and device for performing operation by using Volume R-CNN neural network
WO2021242368A1 (en) Analysis and sorting in aquaculture
CN108766528B (en) Diet management system, construction method thereof and food material management method
CN110610149B (en) Information processing method and device and computer storage medium
CN115661943A (en) Fall detection method based on lightweight attitude assessment network
CN114331985A (en) Electronic component scratch defect detection method and device and computer equipment
CN114429459A (en) Training method of target detection model and corresponding detection method
CN115131783A (en) User diet nutrient component information autonomous perception method based on machine vision
Deshmukh et al. Caloriemeter: Food calorie estimation using machine learning
CN104657987B (en) Evaluation method and system based on the objective algorithm of PET/CT picture qualities
CN116863341B (en) Crop classification and identification method and system based on time sequence satellite remote sensing image
CN116662593B (en) FPGA-based full-pipeline medical hyperspectral image neural network classification method
CN114360690B (en) Method and system for managing diet nutrition of chronic disease patient
Patel et al. Deep Learning-Based Plant Organ Segmentation and Phenotyping of Sorghum Plants Using LiDAR Point Cloud
CN114359299A (en) Diet segmentation method and diet nutrition management method for chronic disease patients
CN113674205A (en) Method and system for measuring human body based on monocular depth camera
Liu et al. Research and application of dairy cows body condition score based on attention mechanism
CN109558791B (en) Bamboo shoot searching device and method based on image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant