CN105512676A - Food recognition method at intelligent terminal - Google Patents
Food recognition method at intelligent terminal
- Publication number
- CN105512676A (Application CN201510862931.7A)
- Authority
- CN
- China
- Prior art keywords
- layer
- training
- network
- intelligent terminal
- stage
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/285—Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/086—Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Physiology (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a food recognition method for an intelligent terminal. The method is based on a convolutional neural network and comprises a training process and an automatic classification process. The training process comprises the following steps: first, building a training sample set; second, constructing a network structure model based on the classic AlexNet network; third, training the network with the open-source Caffe framework, continually adjusting the initial conditions to obtain an optimal network structure model and parameter configuration. The automatic classification process comprises: taking an image photographed by the user as the network input; configuring the network on the terminal according to the parameters of the network structure trained on the computer, and classifying the input image; and finally displaying the top 10 classification results to the user. The method achieves automatic recognition of food types on an intelligent terminal, and is fast, small in storage footprint, high in accuracy, and good in user experience.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a food recognition method on an intelligent terminal.
Background technology
Image classification is a key area of computer vision that has seen significant progress and development; automatic image classification can make daily life more convenient and improve quality of life. Many patented inventions address automatic image classification, but most adopt traditional feature-extraction pipelines: features such as SIFT, Dense SIFT, or HOG are extracted; a dictionary is built, for example by k-means clustering, to encode the features; multiple support vector machines (SVMs) are then trained; and finally these SVMs classify the test pictures to obtain the result. For example, the image classification method of patent CN104077597A uses image segmentation to separate each object in the image, extracts features of the object from the segmentation result, encodes and trains on them to obtain a classifier for that object, and then applies the classifier to test images. In such traditional feature-extraction methods, the hand-crafted features determine the final recognition accuracy, yet good features require strong prior knowledge and design experience, and it is difficult to design optimally discriminative features in real system development.
The convolutional neural network is a recently developed deep learning method that has been widely applied to many image classification problems, because it avoids the complicated feature-extraction step of traditional recognition methods. Patent CN103544506A proposes an image classification method and device based on a convolutional neural network: the method receives image samples of multiple categories as input and computes the neural network weights corresponding to the images of each category; the weights of the multiple category images are assigned in a hierarchical structure, each layer forming a corresponding learning database; the input test image samples are processed into corresponding one-dimensional feature descriptions and fed forward through the neural network weights in the learning database, so as to judge whether the test category belongs to the learned image categories.
With the wide adoption of intelligent terminals, they have become an important tool for people to obtain data. Moreover, more and more people are paying attention to their own health and want software that can quickly and accurately record food. Traditional methods lack a food recognition algorithm for this specific demand, so a food image classification system that can run on an intelligent terminal needs to be designed. Because of the hardware limitations of intelligent terminals, such an application must run in real time while maintaining high recognition accuracy.
Summary of the invention
To overcome the above shortcomings and deficiencies of the prior art, the object of the present invention is to provide a food recognition method on an intelligent terminal that can complete the food recognition task quickly on a mobile terminal.
The object of the present invention is achieved through the following technical solution:
A food recognition method on an intelligent terminal comprises the following steps:
S1 Training process, performed on a computer:
S1.1 Collect food pictures for training to obtain the training pictures; according to the food categories, add a class label to the training pictures of each category;
S1.2 Build the convolutional neural network model:
The convolutional neural network model comprises eight stages: the first, second, third, fourth, and fifth convolutional-layer stages, and the sixth, seventh, and eighth fully connected-layer stages;
S1.3 Train the convolutional neural network model using the open-source Caffe framework;
S1.4 Feed the training data in batches into the Caffe network of step S1.3 for training; training yields the best classification accuracy on the current training set, and the network parameters at that point are those of the final optimal network structure model;
S1.5 Put the configuration file holding the network parameters of the final optimal network structure model obtained in S1.4 into the installation package of the intelligent terminal software;
S1.6 For each food category, choose sample pictures from that category's training pictures and put them into the installation package of the intelligent terminal software;
S2 Automatic classification process, performed on the intelligent terminal:
S2.2 Take the picture to be recognized as the input image, configure the test convolutional neural network according to the network structure model parameters obtained in S1.4, and classify the input image.
The following step is also carried out after step S2.2:
S2.3 Sort the classification results of step S2.2 in ascending order, display the several best results on the user interface, and illustrate each result with the sample pictures chosen in S1.6.
The first through fifth convolutional-layer stages each comprise, connected in sequence, a convolutional layer, an activation layer, a pooling layer, and a normalization layer. The convolutional layers all use Gaussian filters, the activation layers use the ReLU function, the pooling layers use max pooling, and the normalization layers use the LRN method.
The convolutional layer is a Gaussian convolution layer that convolves the input image signal with a Gaussian filter, whose function is defined as:

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where x and y are the coordinates of the Gaussian filter function in the two-dimensional image plane, σ is the standard deviation of the Gaussian, which controls its smoothness (the larger the value, the smoother the filter), and G(x, y, σ) is the corresponding filter value.
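As an illustration, the filter defined above can be sampled on a discrete grid. The following minimal sketch (NumPy assumed; the function name is chosen here for illustration) builds the 11×11, σ=1 kernel used later in the embodiment:

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Sample G(x, y, sigma) on a size x size grid centered at the origin,
    then normalize so the kernel sums to 1 (a common convention)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()

k = gaussian_kernel(11, 1.0)  # the 11x11, sigma=1 filter of the embodiment
```

The peak sits at the center of the kernel and the response decays symmetrically with distance, as the definition of G implies.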
The activation layer applies the ReLU function to the output of the convolutional layer:

f(IN) = max(0, IN)

where IN is the input signal and f(IN) is the output signal.
The pooling layer subsamples the activated output of the convolutional layer by max pooling; the pooling windows do not overlap, the window size is P×P, and the stride s between adjacent windows is taken as 2.
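A minimal sketch of this pooling step (NumPy assumed; note that with the embodiment's P=3 and s=2, adjacent windows in fact share one row or column, as in AlexNet, even though the text describes the extraction as non-overlapping):

```python
import numpy as np

def max_pool(feature_map, p, s):
    """Max pooling with a p x p window slid with stride s over a 2-D map."""
    h, w = feature_map.shape
    out_h = (h - p) // s + 1
    out_w = (w - p) // s + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature_map[i * s:i * s + p, j * s:j * s + p].max()
    return out
```

With p=3 and s=2, a 55×55 map pools down to 27×27, matching the sizes quoted in the embodiment.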
The normalization layer applies the LRN method to normalize local regions of the pooling layer's output, using the function

b(i, x, y) = a(i, x, y) / ( k + α · Σ_{j = max(0, i − n/2)}^{min(N−1, i + n/2)} a(j, x, y)² )^β

where N is the total number of feature maps in the layer, n is the number of neighboring feature maps used, k, α, and β are constants, a(i, x, y) is the activation unit at position (x, y) of the i-th feature map, b(i, x, y) is that activation unit after normalization, and i is a positive integer.
The sixth and seventh fully connected-layer stages each comprise a fully connected layer, an activation layer, and a dropout layer; the eighth fully connected-layer stage comprises a fully connected layer, an output layer, and a loss layer.
During training, the dropout layer selects a fraction of the hidden-layer nodes at random and uses only their weights for computation; the weights of the unselected hidden-layer nodes do not take part in the network update, but their values are retained.
The fully connected layer unfolds its input into a one-dimensional vector X_c and then computes

Y_c = w · X_c + b

where X_c is the input vector, Y_c is the output vector, w is the weight, and b is the bias.
The output layer uses an accuracy layer to estimate accuracy; the accuracy layer compares the predicted class label of the output vector with its true class label and accumulates the comparisons to obtain the final accuracy.
The loss layer computes the error with the Softmax function and back-propagates it to seek the optimal weights and biases of each network layer.
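What the Softmax loss layer computes for a single sample can be sketched as follows (NumPy assumed; the 101-way score vector mirrors the Food-101 setup of the embodiment, and the function name is illustrative):

```python
import numpy as np

def softmax_loss(scores, label):
    """Softmax cross-entropy loss for one sample: convert raw scores to
    class probabilities, then take -log of the true class's probability."""
    shifted = scores - scores.max()  # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -np.log(probs[label]), probs
```

For uniform scores over 101 classes the loss is log(101), the expected value for an uninformed classifier; training drives it down by raising the true class's probability.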
Training the convolutional neural network model in step S1.3 specifically comprises:
a. converting the training set into the data format that Caffe specifies;
b. computing the mean of the training pictures;
c. defining the network parameters and training parameters;
d. training the network: iterating over the network model built in S1.2 by stochastic gradient descent, evaluating a gradient at each iteration to seek the optimal weights and biases of each network layer, finally obtaining the optimal convolutional neural network structure model of this training run;
e. testing the network: using the network structure model trained in step d to classify the food pictures in the test set and obtain the classification accuracy.
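Step d's gradient-descent iteration can be illustrated on a toy one-dimensional problem (the function and learning rate here are illustrative only; Caffe performs the real per-batch parameter updates):

```python
# Minimize f(w) = (w - 3)^2 by repeated gradient steps, a toy stand-in
# for the per-iteration update w <- w - lr * gradient used in step d.
w = 0.0
lr = 0.1  # illustrative learning rate, not the embodiment's value
for _ in range(100):
    grad = 2.0 * (w - 3.0)  # df/dw evaluated at the current w
    w -= lr * grad
```

Each iteration moves w a fraction of the way toward the minimizer w = 3, just as each SGD step in training nudges the network weights against the batch gradient.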
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The present invention trains on data with a method based on convolutional neural networks, which avoids the complicated hand-crafted feature design of traditional approaches and has automatic feature-learning ability.
(2) The present invention adopts the open-source Caffe framework, which is modular and fast, among other advantages, and makes it easy for users to optimize the network structure model.
(3) The training process of the present invention is performed offline on a server, so powerful server resources can be used to save training time; only the trained network configuration file is stored on the intelligent terminal, and the terminal's recognition program loads it to recognize food quickly.
Brief description of the drawings
Fig. 1 is a flowchart of the training process, performed on a computer, of the food recognition method on an intelligent terminal according to an embodiment of the invention.
Fig. 2 is the overall structure diagram of the convolutional neural network of an embodiment of the invention.
Fig. 3 is a flowchart of the recognition process, performed on the intelligent terminal, of the food recognition method according to an embodiment of the invention.
Detailed description of the embodiments
The present invention is described in further detail below in conjunction with an embodiment, but the embodiments of the present invention are not limited thereto.
Embodiment
The food recognition method on an intelligent terminal of this embodiment comprises the following steps:
S1 Training process, performed on a computer, as shown in Fig. 1:
S1.1 Collect food pictures for training to obtain the training pictures; according to the food categories, add a class label to the training pictures of each category.
This embodiment downloads the Food-101 food database and its class labels from the ETH Zurich Computer Vision Laboratory (http://www.vision.ee.ethz.ch/datasets/food-101/). Food-101 contains the 101 most popular foods from the website foodspotting.com, with 1000 pictures per food, split into a training set and a test set. Following the grouping provided by the ETH Zurich Computer Vision Laboratory, each food group uses 750 training pictures and 250 test pictures, so the final training set has 75,750 pictures and the test set has 25,250 pictures.
S1.2 Build the convolutional neural network model, as shown in Fig. 2:
The convolutional neural network model comprises 8 stages: convolutional-layer stages 1 through 5, and fully connected-layer stages 6, 7, and 8.
Convolutional-layer stage 1 comprises, connected in sequence, the 1st convolutional layer, 1st activation layer, 1st pooling layer, and 1st normalization layer; stages 2 through 5 are built the same way from their respective convolutional, activation, pooling, and normalization layers. The 1st convolutional layer is a Gaussian convolution layer that convolves the input image signal with a Gaussian filter, whose function is defined as:

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where x and y are the coordinates in the two-dimensional image plane, σ is the standard deviation, which controls the smoothness of the filter (the larger the value, the smoother the filter), and G(x, y, σ) is the corresponding filter value. In this embodiment the Gaussian filter size is 11×11, so the convolution kernel is 11×11; the standard deviation σ is 1; there are 96 filters in total with a stride of 4 pixels; and the bias of this layer is set to 0. The output of the 1st convolutional layer therefore has 96 feature maps; since the input data size is 227×227×3, the output of the 1st convolutional layer is 55×55×96.
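The 227 → 55 figure follows from the standard convolution output-size formula, which can be checked directly (the helper name is chosen here for illustration):

```python
def conv_output_size(w, k, stride, pad=0):
    """Spatial output size of a convolution: floor((w - k + 2*pad)/stride) + 1."""
    return (w - k + 2 * pad) // stride + 1

conv1_out = conv_output_size(227, 11, 4)  # matches the 55x55x96 output above
```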
The 1st activation layer applies the ReLU function to the output of the convolutional layer:

f(IN) = max(0, IN)

where IN is the input signal and f(IN) is the output signal. This function gives the network a certain sparsity and accelerates training; see "Rectified linear units improve restricted Boltzmann machines", V. Nair and G. E. Hinton, 2010.
The 1st pooling layer subsamples the activated convolutional output by max pooling; the pooling windows do not overlap, the window size is P×P (P > 1; in the present invention the pooling layers use P = 3), and the stride s between adjacent windows is taken as 2. Max pooling the 55×55×96 input with s = 2 gives a 1st-pooling-layer output of 27×27×96, still 96 feature maps.
The 1st normalization layer applies the LRN method to normalize local regions of the pooling layer's output, using the function

b(i, x, y) = a(i, x, y) / ( k + α · Σ_{j = max(0, i − n/2)}^{min(N−1, i + n/2)} a(j, x, y)² )^β

where N is the total number of feature maps in the layer, n is the number of neighboring feature maps used, k, α, and β are constants, a(i, x, y) is the activation unit at position (x, y) of the i-th feature map, b(i, x, y) is that activation unit after normalization, and i is a positive integer. (All normalization layers in this embodiment use the LRN method, with k = 2, n = 5, α = 0.0001, β = 0.75.) The output of convolutional-layer stage 1 is thus 27×27×96.
In this embodiment, the 2nd convolutional layer uses 256 Gaussian filters of size 5×5; each side of the input is padded by 2 pixels (pad = 2), the bias is 1, and the stride is set to 1, so the output of the 2nd convolutional layer is 27×27×256, i.e. 256 feature maps in total; the final output of convolutional-layer stage 2 is 13×13×256.
The 3rd convolutional layer uses 384 Gaussian filters of size 3×3 with pad = 1, stride 1, and bias = 0, giving 384 feature maps in total; the final output of convolutional-layer stage 3 is 13×13×384.
The parameter settings of the 4th convolutional layer are identical to those of the 3rd, so the final output of convolutional-layer stage 4 is 13×13×384.
The 5th convolutional layer uses 256 Gaussian filters of size 3×3 with pad = 1, stride 1, and bias = 1, giving 256 feature maps in total; the final output of convolutional-layer stage 5 is therefore 6×6×256.
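The stage-by-stage spatial sizes quoted above (55 → 27 → 13 → 13 → 13 → 6) can be verified with the standard output-size formula (a sketch; the helper name is illustrative):

```python
def out_size(w, k, stride, pad=0):
    # floor((w - k + 2*pad) / stride) + 1
    return (w - k + 2 * pad) // stride + 1

s = out_size(227, 11, 4)       # conv1 -> 55
s = out_size(s, 3, 2)          # pool1 -> 27
s = out_size(s, 5, 1, pad=2)   # conv2 -> 27
s = out_size(s, 3, 2)          # pool2 -> 13
s = out_size(s, 3, 1, pad=1)   # conv3 -> 13
s = out_size(s, 3, 1, pad=1)   # conv4 -> 13
s = out_size(s, 3, 1, pad=1)   # conv5 -> 13
s = out_size(s, 3, 2)          # pool5 -> 6
```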
Fully connected-layer stage 6 comprises, connected in sequence, the 6th fully connected layer, the 6th activation layer, and the 6th dropout layer.
The 6th fully connected layer unfolds its input into a one-dimensional vector X_c and then computes

Y_c = w · X_c + b

where X_c is the input vector, Y_c is the output vector, w is the weight, and b is the bias. In this embodiment the 6th fully connected layer uses 4096 Gaussian filters, so its final output is a 4096-dimensional vector.
During model training the dropout layer uses the weights of only part of the hidden-layer nodes, selected with a certain probability; the weights of the remaining hidden-layer nodes are retained but not used in the computation, which effectively prevents overfitting. In this embodiment the probability is set to 0.5; see "Improving neural networks by preventing co-adaptation of feature detectors", G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, 2012.
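Training-time dropout as described can be sketched as follows (NumPy assumed; the function name and mask-based formulation are choices made here for illustration):

```python
import numpy as np

def dropout(x, keep_prob=0.5, rng=None):
    """Training-time dropout: each unit is independently kept with
    probability keep_prob; dropped units output 0 for this forward pass
    (their weights are merely skipped this step, not discarded)."""
    rng = np.random.default_rng(0) if rng is None else rng
    mask = rng.random(x.shape) < keep_prob
    return x * mask
```

At test time all units are used, so the dropout step simply passes activations through.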
Fully connected-layer stage 7 comprises, connected in sequence, the 7th fully connected layer, the 7th activation layer, and the 7th dropout layer; in this embodiment its layer parameters are the same as those of stage 6, so its final output is again a 4096-dimensional vector.
Fully connected-layer stage 8 comprises the 8th fully connected layer, the output layer, and the loss layer. The output of this stage corresponds to the class labels of the food pictures. In this embodiment the output layer uses an accuracy layer to estimate accuracy, and the loss layer computes the error with the Softmax function and back-propagates it to seek the optimal weights and biases of each network layer.
S1.3 Train the convolutional neural network model with the open-source Caffe framework, specifically:
a. convert the training set into the data format that Caffe specifies;
b. compute the mean of the training pictures;
c. define the network parameters and training parameters;
d. train the network: iterate over the network model built in S1.2 by stochastic gradient descent, evaluating a gradient at each iteration to seek the optimal weights and biases of each network layer, finally obtaining the optimal convolutional neural network structure model of this training run;
e. test the network: use the network structure model trained in step d to classify the food pictures in the test set and obtain the classification accuracy. In this embodiment, experiments were set up with initial learning rates of 0.01, 0.008, and 0.006 and maximum iteration counts of 450,000, 400,000, and 350,000; the final choice is an initial learning rate of 0.008 and a maximum of 450,000 training iterations.
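These settings might be expressed in a Caffe solver file roughly as follows; only `base_lr` and `max_iter` come from the embodiment, while the remaining field values and file names are placeholders shown for illustration:

```
net: "food_train_val.prototxt"   # placeholder path to the network definition
base_lr: 0.008                   # initial learning rate chosen above
max_iter: 450000                 # maximum training iterations chosen above
lr_policy: "step"                # assumption; the patent does not fix a decay policy
momentum: 0.9                    # assumption
snapshot_prefix: "food_net"      # placeholder
solver_mode: GPU
```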
S1.4 Feed the training data in batches into the Caffe network of step S1.3 for training; training yields the best classification accuracy on the current training set, and the network parameters at that point are those of the final optimal network structure model.
S1.5 Put the configuration file holding the network parameters of the final optimal network structure model obtained in S1.4 into the installation package of the intelligent terminal software.
S1.6 For each food category, choose 3 sample pictures from that category's training pictures and put them into the installation package of the intelligent terminal software.
S2 Automatic classification process, performed on the intelligent terminal, as shown in Fig. 3:
S2.2 Take the picture to be recognized as the input image, configure the test convolutional neural network according to the network structure model parameters obtained in S1.4, classify the input image, and sort the classification results in ascending order.
S2.3 From the classification results of S2.2, display the 10 best results on the user interface, illustrating each result with the 3 sample pictures chosen in S1.6.
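Selecting the top-10 labels from the sorted scores can be sketched as follows (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def top_k_labels(probs, k=10):
    """Indices of the k highest-scoring classes, best first, obtained
    from an ascending sort as described in S2.2."""
    ascending = np.argsort(probs)   # ascending order of scores
    return ascending[::-1][:k].tolist()
```

The terminal then looks up the 3 stored sample pictures for each returned label to build the display.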
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.
Claims (10)
1. A food recognition method on an intelligent terminal, characterized by comprising the following steps:
S1 Training process, performed on a computer:
S1.1 collecting food pictures for training to obtain training pictures; according to the food categories, adding a class label to the training pictures of each category;
S1.2 building a convolutional neural network model:
the convolutional neural network model comprising eight stages: the first, second, third, fourth, and fifth convolutional-layer stages, and the sixth, seventh, and eighth fully connected-layer stages;
S1.3 training the convolutional neural network model using the open-source Caffe framework;
S1.4 feeding the training data in batches into the Caffe network of step S1.3 for training, training yielding the best classification accuracy on the current training set, the network parameters at that point being those of the final optimal network structure model;
S1.5 putting the configuration file holding the network parameters of the final optimal network structure model obtained in S1.4 into the installation package of the intelligent terminal software;
S1.6 for each food category, choosing sample pictures from that category's training pictures and putting them into the installation package of the intelligent terminal software;
S2 Automatic classification process, performed on the intelligent terminal:
S2.2 taking the picture to be recognized as the input image, configuring the test convolutional neural network according to the network structure model parameters obtained in S1.4, and classifying the input image.
2. The food recognition method on an intelligent terminal according to claim 1, characterized in that the following step is also carried out after step S2.2:
S2.3 sorting the classification results of step S2.2 in ascending order, displaying the several best results on the user interface, and illustrating each result with the sample pictures chosen in S1.6.
3. The food recognition method on an intelligent terminal according to claim 1, characterized in that the first through fifth convolutional-layer stages each comprise, connected in sequence, a convolutional layer, an activation layer, a pooling layer, and a normalization layer; the convolutional layers all use Gaussian filters, the activation layers use the ReLU function, the pooling layers use max pooling, and the normalization layers use the LRN method.
4. The food recognition method on an intelligent terminal according to claim 3, characterized in that the convolutional layer is a Gaussian convolution layer that convolves the input image signal with a Gaussian filter, whose function is defined as:

G(x, y, σ) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))

where x and y are the coordinates of the Gaussian filter function in the two-dimensional image plane, σ is the standard deviation of the Gaussian, which controls its smoothness (the larger the value, the smoother the filter), and G(x, y, σ) is the corresponding filter value.
5. The food recognition method on an intelligent terminal according to claim 3, characterized in that the activation layer applies the ReLU function to the output of the convolutional layer:

f(IN) = max(0, IN)

where IN is the input signal and f(IN) is the output signal.
6. The food recognition method on an intelligent terminal according to claim 3, characterized in that the pooling layer subsamples the activated output of the convolutional layer by max pooling; the pooling windows do not overlap, the window size is P×P, and the stride s between adjacent windows is taken as 2.
7. The food recognition method on an intelligent terminal according to claim 3, characterized in that the normalization layer applies the LRN method to normalize local regions of the pooling layer's output, using the function

b(i, x, y) = a(i, x, y) / ( k + α · Σ_{j = max(0, i − n/2)}^{min(N−1, i + n/2)} a(j, x, y)² )^β

where N is the total number of feature maps in the layer, n is the number of neighboring feature maps used, k, α, and β are constants, a(i, x, y) is the activation unit at position (x, y) of the i-th feature map, b(i, x, y) is that activation unit after normalization, and i is a positive integer.
8. The food recognition method on an intelligent terminal according to claim 1, characterized in that the sixth and seventh fully connected-layer stages each comprise a fully connected layer, an activation layer, and a dropout layer; the eighth fully connected-layer stage comprises a fully connected layer, an output layer, and a loss layer.
9. The food recognition method on an intelligent terminal according to claim 8, characterized in that during training the dropout layer selects a fraction of the hidden-layer nodes at random and uses only their weights for computation, while the weights of the unselected hidden-layer nodes do not take part in the network update but are retained;
the fully connected layer unfolds its input into a one-dimensional vector X_c and then computes

Y_c = w · X_c + b

where X_c is the input vector, Y_c is the output vector, w is the weight, and b is the bias;
the output layer uses an accuracy layer to estimate accuracy, the accuracy layer comparing the predicted class label of the output vector with its true class label and accumulating the comparisons to obtain the final accuracy;
the loss layer computes the error with the Softmax function and back-propagates it to seek the optimal weights and biases of each network layer.
10. The food recognition method on an intelligent terminal according to claim 1, characterized in that training the convolutional neural network model in step S1.3 specifically comprises:

a. converting the training set into the data format specified by Caffe;

b. computing the mean of the training images;

c. defining the network parameters and training parameters;

d. training the network: iterating on the network model built in step S1.2 using stochastic gradient descent, computing one gradient per iteration, to seek the optimal weights and biases of each network layer and finally obtain the optimal convolutional neural network model of this training run;

e. testing the network: using the network model trained in step d to classify the food images in the test set and obtain the classification accuracy.
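Steps a–e can be sketched as a minimal training loop in NumPy. This is a hypothetical stand-in for the Caffe solver, not the patent's pipeline: a toy linear softmax classifier on random data replaces the CNN, and all shapes, the learning rate, and the iteration count are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# step a (stand-in): a toy "training set" of 100 samples, 8 features, 3 classes
X = rng.normal(size=(100, 8))
labels = rng.integers(0, 3, size=100)

# step b: compute the mean of the training samples and subtract it
mean = X.mean(axis=0)
X = X - mean

# step c: define network parameters and training parameters
W = np.zeros((3, 8))
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# step d: stochastic gradient descent, one gradient computed per iteration
for _ in range(200):
    i = rng.integers(0, len(X))
    p = softmax(W @ X[i])
    grad = np.outer(p, X[i])      # d(cross-entropy)/dW = (p - onehot) x^T
    grad[labels[i]] -= X[i]
    W -= lr * grad

# step e: evaluate the trained model and report classification accuracy
pred = softmax(X @ W.T).argmax(axis=1)
accuracy = (pred == labels).mean()
```

The mean subtraction in step b mirrors Caffe's per-dataset image mean; in the real pipeline steps a–c are expressed as LMDB conversion, a mean binaryproto, and prototxt definitions rather than in-process arrays.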
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510862931.7A CN105512676A (en) | 2015-11-30 | 2015-11-30 | Food recognition method at intelligent terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510862931.7A CN105512676A (en) | 2015-11-30 | 2015-11-30 | Food recognition method at intelligent terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105512676A (en) | 2016-04-20 |
Family
ID=55720642
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510862931.7A Pending CN105512676A (en) | 2015-11-30 | 2015-11-30 | Food recognition method at intelligent terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105512676A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544506A (en) * | 2013-10-12 | 2014-01-29 | Tcl集团股份有限公司 | Method and device for classifying images on basis of convolutional neural network |
CN104090871A (en) * | 2014-07-18 | 2014-10-08 | 百度在线网络技术(北京)有限公司 | Picture translation method and system |
CN104346622A (en) * | 2013-07-31 | 2015-02-11 | 富士通株式会社 | Convolutional neural network classifier, and classifying method and training method thereof |
CN104636757A (en) * | 2015-02-06 | 2015-05-20 | 中国石油大学(华东) | Deep learning-based food image identifying method |
Non-Patent Citations (2)
Title |
---|
KRIZHEVSKY, A. et al.: "ImageNet classification with deep convolutional neural networks", International Conference on Neural Information Processing Systems, Curran Associates Inc. * |
WANG, Zhen et al.: "Design and Implementation of an Image Recognition Algorithm Based on Convolutional Neural Networks", Graphics & Image * |
Cited By (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106228177A (en) * | 2016-06-30 | 2016-12-14 | 浙江大学 | Daily life subject image recognition methods based on convolutional neural networks |
CN106203330A (en) * | 2016-07-08 | 2016-12-07 | 西安理工大学 | A kind of vehicle classification method based on convolutional neural networks |
CN106295707A (en) * | 2016-08-17 | 2017-01-04 | 北京小米移动软件有限公司 | Image-recognizing method and device |
CN106295707B (en) * | 2016-08-17 | 2019-07-02 | 北京小米移动软件有限公司 | Image-recognizing method and device |
CN107798277A (en) * | 2016-09-05 | 2018-03-13 | 合肥美的智能科技有限公司 | Food materials identifying system and method, food materials model training method, refrigerator and server |
WO2018040105A1 (en) * | 2016-09-05 | 2018-03-08 | 合肥华凌股份有限公司 | System and method for food recognition, food model training method, refrigerator and server |
CN106529564A (en) * | 2016-09-26 | 2017-03-22 | 浙江工业大学 | Food image automatic classification method based on convolutional neural networks |
CN106529564B (en) * | 2016-09-26 | 2019-05-31 | 浙江工业大学 | A kind of food image automatic classification method based on convolutional neural networks |
CN106874913A (en) * | 2016-12-29 | 2017-06-20 | 南京江南博睿高新技术研究院有限公司 | A kind of vegetable detection method |
CN106845527A (en) * | 2016-12-29 | 2017-06-13 | 南京江南博睿高新技术研究院有限公司 | A kind of vegetable recognition methods |
WO2018121690A1 (en) * | 2016-12-29 | 2018-07-05 | 北京市商汤科技开发有限公司 | Object attribute detection method and device, neural network training method and device, and regional detection method and device |
CN108507270B (en) * | 2017-02-24 | 2020-07-10 | 九阳股份有限公司 | Food material determining method and food material determining device for refrigerator |
CN108507270A (en) * | 2017-02-24 | 2018-09-07 | 九阳股份有限公司 | A kind of food materials of refrigerator determine method and food materials determining device |
CN106980817A (en) * | 2017-02-27 | 2017-07-25 | 南京邮电大学 | A kind of terrified video frequency identifying method based on Caffe frameworks |
CN107024073A (en) * | 2017-04-26 | 2017-08-08 | 中国石油大学(华东) | Multi-sensor intelligent controlling method for refrigerator and intelligent refrigerator based on deep learning |
CN107067043B (en) * | 2017-05-25 | 2020-07-24 | 哈尔滨工业大学 | Crop disease and insect pest detection method |
CN107067043A (en) * | 2017-05-25 | 2017-08-18 | 哈尔滨工业大学 | A kind of diseases and pests of agronomic crop detection method |
US11107232B2 (en) | 2017-07-14 | 2021-08-31 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for determining object posture in image, device, and storage medium |
WO2019011249A1 (en) * | 2017-07-14 | 2019-01-17 | 腾讯科技(深圳)有限公司 | Method, apparatus, and device for determining pose of object in image, and storage medium |
CN107480773B (en) * | 2017-08-09 | 2020-11-13 | 北京小米移动软件有限公司 | Method and device for training convolutional neural network model and storage medium |
CN107480773A (en) * | 2017-08-09 | 2017-12-15 | 北京小米移动软件有限公司 | The method, apparatus and storage medium of training convolutional neural networks model |
CN107886108A (en) * | 2017-10-11 | 2018-04-06 | 中国农业大学 | Fruit and vegetable classification method and device based on AlexNet network models |
CN107742128A (en) * | 2017-10-20 | 2018-02-27 | 百度在线网络技术(北京)有限公司 | Method and apparatus for output information |
US11922132B2 (en) | 2017-10-30 | 2024-03-05 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and terminal device |
CN107832804A (en) * | 2017-10-30 | 2018-03-23 | 上海寒武纪信息科技有限公司 | A kind of information processing method and Related product |
CN108171318A (en) * | 2017-11-30 | 2018-06-15 | 河南大学 | One kind is based on the convolutional neural networks integrated approach of simulated annealing-Gaussian function |
CN108009525A (en) * | 2017-12-25 | 2018-05-08 | 北京航空航天大学 | A kind of specific objective recognition methods over the ground of the unmanned plane based on convolutional neural networks |
CN108009525B (en) * | 2017-12-25 | 2018-10-12 | 北京航空航天大学 | A kind of specific objective recognition methods over the ground of the unmanned plane based on convolutional neural networks |
CN108280474A (en) * | 2018-01-19 | 2018-07-13 | 广州市派客朴食信息科技有限责任公司 | A kind of food recognition methods based on neural network |
CN108229979A (en) * | 2018-01-24 | 2018-06-29 | 亳州职业技术学院 | Based on wechat small routine and the American Ginseng of depth learning technology tasting assistant |
CN108647734A (en) * | 2018-05-15 | 2018-10-12 | 上海达显智能科技有限公司 | A kind of food image big data acquisition method, acquisition system and food recognition methods |
CN108647734B (en) * | 2018-05-15 | 2022-03-11 | 上海达显智能科技有限公司 | Food image big data acquisition method, acquisition system and food identification method |
CN108875959A (en) * | 2018-05-24 | 2018-11-23 | 四川斐讯信息技术有限公司 | A kind of intelligence sees the method and system of object knowledge name |
CN108846419A (en) * | 2018-05-25 | 2018-11-20 | 平安科技(深圳)有限公司 | Single page high load image-recognizing method, device, computer equipment and storage medium |
CN109214409A (en) * | 2018-07-10 | 2019-01-15 | 上海斐讯数据通信技术有限公司 | A kind of vegetable recognition methods and system |
CN109214272A (en) * | 2018-07-17 | 2019-01-15 | 北京陌上花科技有限公司 | A kind of image-recognizing method and device |
US11990137B2 (en) | 2018-09-13 | 2024-05-21 | Shanghai Cambricon Information Technology Co., Ltd. | Image retouching method and terminal device |
US11996105B2 (en) | 2018-09-13 | 2024-05-28 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and terminal device |
WO2020057455A1 (en) * | 2018-09-17 | 2020-03-26 | 鲁班嫡系机器人(深圳)有限公司 | Food material processing robot control method, apparatus, system, storage medium and device |
CN111797719A (en) * | 2020-06-17 | 2020-10-20 | 武汉大学 | Food component identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105512676A (en) | Food recognition method at intelligent terminal | |
CN108875674B (en) | Driver behavior identification method based on multi-column fusion convolutional neural network | |
CN108388927B (en) | Small sample polarization SAR terrain classification method based on deep convolution twin network | |
Hossain et al. | Improving consumer satisfaction in smart cities using edge computing and caching: A case study of date fruits classification | |
CN102314614B (en) | Image semantics classification method based on class-shared multiple kernel learning (MKL) | |
CN110598598A (en) | Double-current convolution neural network human behavior identification method based on finite sample set | |
CN105320961A (en) | Handwriting numeral recognition method based on convolutional neural network and support vector machine | |
CN106372648A (en) | Multi-feature-fusion-convolutional-neural-network-based plankton image classification method | |
CN109344884A (en) | The method and device of media information classification method, training picture classification model | |
Zhang et al. | Unsupervised difference representation learning for detecting multiple types of changes in multitemporal remote sensing images | |
CN104182772A (en) | Gesture recognition method based on deep learning | |
CN104537647A (en) | Target detection method and device | |
CN112883839A (en) | Remote sensing image interpretation method based on adaptive sample set construction and deep learning | |
CN108537119A (en) | A kind of small sample video frequency identifying method | |
WO2022062419A1 (en) | Target re-identification method and system based on non-supervised pyramid similarity learning | |
CN107169106A (en) | Video retrieval method, device, storage medium and processor | |
CN108764084A (en) | Video classification methods based on spatial domain sorter network and the time domain network integration | |
CN109492596A (en) | A kind of pedestrian detection method and system based on K-means cluster and region recommendation network | |
CN113569895A (en) | Image processing model training method, processing method, device, equipment and medium | |
CN106778910A (en) | Deep learning system and method based on local training | |
CN109241392A (en) | Recognition methods, device, system and the storage medium of target word | |
CN110503082A (en) | A kind of model training method and relevant apparatus based on deep learning | |
CN109978074A (en) | Image aesthetic feeling and emotion joint classification method and system based on depth multi-task learning | |
CN113109782A (en) | Novel classification method directly applied to radar radiation source amplitude sequence | |
CN113822130A (en) | Model training method, scene recognition method, computing device, and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20160420 |