CN113887410A - Deep learning-based multi-category food material identification system and method - Google Patents

Deep learning-based multi-category food material identification system and method

Info

Publication number
CN113887410A
Authority
CN
China
Prior art keywords
layer
food material
attention
category
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111158365.3A
Other languages
Chinese (zh)
Inventor
陈石
李文钧
岳克强
李瑞雪
李懿霖
王超
李宇航
张汝林
沈皓哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202111158365.3A priority Critical patent/CN113887410A/en
Publication of CN113887410A publication Critical patent/CN113887410A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a deep learning-based multi-category food material identification system and method. The system comprises an initialization layer, a dense connection layer, a transition layer, an attention module and a classification layer. The method comprises the following steps: S1, sampling pictures of food materials; S2, numbering the food material pictures and constructing a multi-category food material data set; S3, dividing the food material data set into a training set, a verification set and a test set in proportion; S4, enriching the samples through preprocessing and picture enhancement; S5, constructing the deep learning-based multi-category food material identification system; S6, adjusting the model for the food materials of different units and different times by modifying the category labels and training with loaded weights; and S7, obtaining food material pictures, transmitting them to a local and/or cloud server, analyzing them with the system constructed in S5, and obtaining and displaying the identification results.

Description

Deep learning-based multi-category food material identification system and method
Technical Field
The invention relates to the technical field of image recognition, in particular to a deep learning-based multi-category food material recognition system and method.
Background
In recent years, with the rapid development of artificial intelligence and deep learning, many links in the catering industry have been upgraded to some degree and automation has permeated many of them. However, in the inspection that precedes food material processing, automatic classification and identification technology covering many food material categories is still immature. In many restaurants and enterprise canteens, for example, quality inspection of the raw food materials provided by suppliers is completed by manual sorting and judgment, which not only increases labor cost but can also lead to management loopholes and abuse of authority by the personnel involved. With the wide application of computer vision to image classification tasks, it has become possible to classify and identify many kinds of food materials with the related technology.
Research on and applications of food material identification and classification do exist at present, but they target a few common vegetables and fruits, of which only a small number of kinds appear in a trading scene. Research on the problem this project addresses, detecting and identifying a large number of food material categories (more than 600), has barely started, and no usable data set even exists. Moreover, once the number of categories reaches a certain order of magnitude, the difficulty of classification and identification increases greatly, and directly applying existing algorithms can hardly reach an ideal accuracy.
In view of the above, it is necessary to design and build a deep learning-based multi-category food material identification method that can be deployed on kitchen equipment to quickly and accurately identify the food materials delivered by suppliers.
Disclosure of Invention
In order to solve the defects of the prior art and realize the purpose of identifying the multi-class food materials, the invention adopts the following technical scheme:
a deep learning-based multi-category food material recognition system comprises: the convolutional neural network is simple in structure, can comprehensively utilize shallow layer features with low complexity to amplify information quantity to obtain a decision function with good generalization performance, improves the overfitting resistance of the model, and adds an attention module after a convolution layer of last feature extraction to ensure that a system can learn various features in an input picture with great emphasis, and considers the overall features and local features to realize more effective feature extraction so as to improve the identification accuracy;
the initialization layer is used for initializing an input picture to obtain an initialized feature map;
the dense connection layer uses a plurality of convolution modules with the same feature map size to perform deep feature extraction on the input feature map; every earlier convolution module is connected to every later one, so the features of different layers are retained and propagated forward together. Each convolution module comprises a batch normalization layer, a linear rectification layer and a convolution layer. Batch normalization standardizes the data to follow a standard normal distribution and reduces the time needed to compute gradients over the whole training set. The linear rectification layer adopts the ReLU activation function in place of activation functions such as Sigmoid and Tanh, making gradient descent and error back-propagation more efficient and largely avoiding the vanishing-gradient problem. When the input is negative the neuron is not activated, i.e. only part of the neurons are active at any one time, so the neurons of the network exhibit sparse activation, which speeds up computation;
the transition layer is used for carrying out dimensionality reduction treatment on the characteristic graph obtained by the previous dense connection layer;
the classification layer classifies the features and maps the features to the space where the classification label is located;
the attention module strengthens important features and weakens unnecessary ones along the two dimensions of channel and space so as to improve the feature extraction efficiency of the network.
Further, the attention module extracts features according to the following formulas:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

where $F$ is a C-dimensional input feature of size H × W, $M_C$ is a C-dimensional channel attention map of size 1 × 1, $M_S$ is an S-dimensional spatial attention map of size H × W, H and W denote height and width respectively, and C and S denote dimensions;
the channel attention calculation formula is as follows:

$$M_C(F) = h\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

the spatial attention calculation formula is as follows:

$$M_S(F) = h\big(f^{7\times 7}([\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)])\big)$$

where F is the input, AvgPool denotes average pooling, MaxPool denotes max pooling, h denotes the activation function, MLP denotes a multi-layer perceptron, and $f^{7\times 7}$ denotes a convolution operation with a 7 × 7 convolution kernel.
Further, the attention mechanism adopts a soft attention mechanism: for N pieces of input information, the selection criterion is determined by computing a weighted average over all inputs. Under this mechanism the attention weight α_i serves as a probability, and the probability of selecting the i-th piece of information is expressed as:

$$\alpha_i = p(z = i \mid X, q) = \mathrm{softmax}\big(s(x_i, q)\big) = \frac{\exp\big(s(x_i, q)\big)}{\sum_{j=1}^{N} \exp\big(s(x_j, q)\big)}$$

where p denotes taking a probability, x_i denotes the i-th piece of input information, q is the information to be queried, z is the attention variable, and s(x_i, q) denotes the attention scoring function; the probability results are normalized by the softmax function.

After the attention weight α_i is obtained, and before the extraction of the important information is completed, the weights and feature vectors are fused as follows:

$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i x_i$$

The computed value is the total amount of attention to be allocated for this batch of inputs.
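By way of illustration only, the soft attention computation above can be sketched in PyTorch as follows; the dot-product scoring function s(x_i, q) is an assumption, since the text does not fix a particular scoring function:

    import torch

    def soft_attention(X: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # X: (N, D) holds the N pieces of input information x_i as rows;
        # q: (D,) is the query. s(x_i, q) is assumed to be a dot product.
        scores = X @ q                         # attention scores s(x_i, q)
        alpha = torch.softmax(scores, dim=0)   # attention weights alpha_i
        return alpha @ X                       # sum_i alpha_i * x_i

    X = torch.randn(5, 8)                      # N = 5 inputs of dimension 8
    q = torch.randn(8)
    print(soft_attention(X, q).shape)          # torch.Size([8])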
Further, in the batch normalization layer, each node has m outputs during forward propagation, and batch normalization normalizes and re-outputs the m outputs of every node in the layer according to the following formula:

$$\hat{x}_i = \gamma \cdot \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $x_i^{(b)}$ denotes the value of the i-th input node of the layer when the b-th sample of the batch is input, $x_i$ is the row vector $(x_i^{(1)}, \ldots, x_i^{(m)})$ whose length is the batch size m, μ and σ denote the mean and standard deviation respectively, ε is a small constant introduced for numerical stability, γ and β are the scale and shift parameters of the row, and $\hat{x}_i$ is the normalized result.
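By way of illustration only, a from-scratch sketch of this normalization in PyTorch follows; the ε value and the γ and β initializations are placeholders chosen for the example:

    import torch

    def batch_norm(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor,
                   eps: float = 1e-5) -> torch.Tensor:
        # x: (m, n_nodes) -- row b holds the layer's outputs for sample b.
        mu = x.mean(dim=0)                      # per-node mean over the batch
        sigma = x.std(dim=0, unbiased=False)    # per-node standard deviation
        x_hat = (x - mu) / torch.sqrt(sigma ** 2 + eps)
        return gamma * x_hat + beta             # scale and shift

    x = torch.randn(32, 64)                     # batch size m = 32, 64 nodes
    out = batch_norm(x, torch.ones(64), torch.zeros(64))
    print(round(out.mean().item(), 3), round(out.std().item(), 3))  # ~0.0, ~1.0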
Further, the transition layer comprises a convolution operation and an average pooling operation, the convolution operation is used for reducing the number of the feature maps, and the average pooling operation is used for reducing the size of the feature maps and reducing the calculation load.
Further, the classification layer performs global average pooling operation on the feature map obtained by the last dense connection layer, performs dimensionality reduction on data to fuse the learned features, and then normalizes the result obtained by global average pooling by using a Softmax function to obtain a probability vector of the input picture belonging to a certain class.
Furthermore, any two layers of the system are directly connected: the output information of all layers before each layer is gathered into the current layer through a merging operation, and the feature map information learned by the current layer is likewise passed on to all later layers:

$$X_l = H_l([X_0, X_1, \ldots, X_{l-1}])$$

where l denotes the layer index, $X_l$ denotes the output of the l-th layer, and $H_l$ denotes a non-linear transformation.
A multi-category food material identification method based on deep learning comprises the following steps:
S1, sampling pictures of the food materials;
S2, dividing all the collected food material pictures by category, numbering them with a unified naming standard, and constructing a multi-category food material data set;
S3, dividing the food material data set into a training set, a verification set and a test set in proportion;
S4, enriching the samples through preprocessing and picture enhancement;
S5, constructing the deep learning-based multi-category food material recognition system and obtaining the trained recognition system and its weights by feeding in the food material data set;
S6, adjusting the model for the food materials of different units and different times by means including modifying the category labels and training with loaded weights, so as to reach higher recognition accuracy;
and S7, obtaining food material pictures, transmitting them to a local and/or cloud server, analyzing them with the system constructed in S5, and obtaining and displaying the identification results.
Further, the sampling in S1 includes collection by application-scene equipment and collection in other scenes. For application-scene collection, the food materials are placed on a conveyor belt under a camera and photographed to simulate the actual application scene; the food materials are arranged at different angles and relative positions, and their quantity is increased gradually from few to many. Other-scene collection includes real shooting with a terminal and screening of network pictures.
In S2, the food materials are classified hierarchically using a three-letter-plus-number naming scheme. The first level uses two letters and comprises four major categories: non-staple food, meat and poultry, vegetables and fruits, and aquatic products. The second level adds one more letter to subdivide the four major categories further. The third level appends a numeric code that pins the name down to a specific category. All food material pictures are named and numbered in this way to complete the construction of the food material data set.
In S4, the pictures are preprocessed to 600 × 600; this size was chosen by weighing accuracy against computational cost, and repeated tests show that this resolution strikes a good balance between the two. One or more operations among center cropping, flipping, rotation, brightness adjustment, contrast adjustment and saturation adjustment are then selected at random and applied in random order to enhance the pictures. Because the effective food material region of a photographed picture usually lies in the middle, center cropping is performed with the CenterCrop method in PyTorch to obtain the middle 450 × 450 region, which avoids background interference as much as possible and makes the network strengthen its learning of the subject's features; together these operations increase sample capacity and diversity.
Further, with a strategy of adjusting while using, the system is updated continuously: the category labels and the food material picture database are refreshed, food material categories absent from the scene are removed, and pictures taken during use are added to the food material data set for training. The training process does not start from scratch; based on the initial model parameters obtained before, the parameters of the corresponding layers can be fixed by setting the frozen attribute to True so that they do not participate in training, and only the parameters of the last few network layers and the classifier are trained. This update mode carries the earlier training results forward, learns the features of the newly added pictures, and is highly efficient at the same time.
The invention has the advantages and beneficial effects that:
the method adopts a classification algorithm based on deep learning, utilizes a designed and built convolutional neural network to automatically extract the characteristics of the picture, avoids extracting complex physical characteristics by various manual algorithms, and has higher efficiency and accuracy; the method comprises the steps of collecting food material picture data from various sources, various categories and various different conditions, constructing a complete data set with more than 600 categories, covering all categories which can appear in most of domestic enterprise canteens, and obtaining a classification algorithm trained by using the data set, wherein the classification algorithm can guarantee accuracy and also give consideration to universality, so that the method has the potential of large-scale popularization; designing a fine-grained classification algorithm introducing an attention mechanism, building a classification network model corresponding to the fine-grained classification algorithm, improving a network structure while keeping the advantage of high feature utilization rate of a Densenet network model, further simplifying the network structure, achieving the optimal balance point of accuracy and computation through multiple tests, accelerating the identification speed of actual use, weakening the requirement on hardware and effectively reducing the cost; the method is not a method which is put into use, namely is fixed and unchanged, but can be continuously updated according to actual conditions to ensure that the accuracy is always at a high level, the method comprises the steps of adding pictures taken in the using process into training, updating a gallery, labels and the like, each updating training is based on the previous training weight, and the training speed is high.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic structural diagram of the Densenet121 network in the present invention.
FIG. 3 is a schematic structural diagram of the present invention with the attention module added.
FIG. 4 is a schematic diagram of an attention module of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in FIG. 1, a food material identification method based on deep learning includes the following steps:
1. For different application scenes, photograph and sample the food materials delivered by suppliers using the equipment cameras in different catering-industry kitchens, and collect food material pictures of the same categories from various other channels to increase sample diversity;
2. divide all the collected food material pictures by category, number them with a unified naming standard, and construct a multi-category food material data set;
3. divide the data into a training set, a verification set and a test set in a suitable proportion;
4. combine various preprocessing and image enhancement means to enrich the samples further;
5. design and build a convolutional neural network improved on the basis of Densenet121 with an attention module added, input the pictures, and train to obtain the classification model and weights;
6. for the specific food material conditions of different units and different times, adjust the model by means such as modifying the category labels and training with loaded weights, so as to reach higher identification accuracy;
7. in actual use, after the camera photographs the food materials, the pictures are transmitted to a local or cloud server and analyzed with the trained model, and the recognition result is obtained and fed back to the display screen.
in this embodiment, the collection of food material picture data is mainly divided into application scene device collection and other scene collection. The application scene acquisition is specifically operated in such a way that the food materials entering a supplier are photographed and sampled by using equipment cameras in different catering industry kitchens, and the food materials are placed on a conveyor belt below the cameras for photographing so as to simulate an actual application scene. When the food materials are shot specifically, the food materials are placed in baskets with different colors, the food materials are placed in different angles and relative position relations, the number of the food materials is gradually increased from small to large, bagging shooting is added for some food materials with bags, and food material pictures under various conditions are collected and stored as much as possible. The specific collection standard is that the number of effective pictures of each food material reaches 500. Other channel collection mainly comprises two modes of mobile phone real shooting and network picture screening. Shooting food materials of required categories in the market by using a mobile phone, wherein the specific requirement is that only required main categories appear in each picture as far as possible, and different angles and distances are properly selected during shooting; the specific operation standard for obtaining pictures from the network public picture library is to select food material pictures with high resolution and quality, and requires a clear and single main body.
In this embodiment, a unified naming standard is adopted for the food material pictures, operated as follows. The food materials are classified hierarchically using a three-letter-plus-number scheme. The first level comprises the four major categories of non-staple food (NF), meat and poultry (NR), vegetables and fruits (NS) and aquatic products (NX). The second level subdivides the four major categories further, for example vegetables and fruits into root and stem vegetables (NSJ), leafy vegetables (NSY) and so on. The third level, built on the previous one, is precise to the specific category and indicated by a numeric code, such as NSY001 for green vegetables and NSY019 for caraway within leafy vegetables (NSY). By analogy, all food materials are named and numbered to complete the construction of the database. A newly added category is named according to the same standard and added to the database.
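By way of illustration only, a hypothetical helper that parses identifiers of this scheme might look as follows; the label tables cover only the examples named above, not the full taxonomy:

    import re

    LEVEL1 = {"NF": "non-staple food", "NR": "meat and poultry",
              "NS": "vegetables and fruits", "NX": "aquatic products"}
    LEVEL2 = {"NSJ": "root and stem vegetables", "NSY": "leafy vegetables"}

    def parse_food_id(food_id: str) -> dict:
        # Two letters (major class) + one letter (subclass) + three digits.
        m = re.fullmatch(r"([A-Z]{2})([A-Z])(\d{3})", food_id)
        if not m:
            raise ValueError(f"bad food material id: {food_id}")
        sub = m.group(1) + m.group(2)
        return {"level1": LEVEL1.get(m.group(1), m.group(1)),
                "level2": LEVEL2.get(sub, sub),
                "index": int(m.group(3))}

    print(parse_food_id("NSY001"))  # green vegetables, in leafy vegetables (NSY)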
In this embodiment, the proportion of food material pictures from the different sources is fixed: actual tests show that with equipment-collected pictures as the main body, supplemented by mobile phone shots and network pictures at a ratio of roughly 3:1:1, a model trained on this data composition reaches high accuracy in the application scene, does not overfit, and generalizes well.
In this embodiment, preprocessing and picture enhancement mainly combine several image enhancement modes integrated in the PyTorch framework. Since the pictures from the three sources differ in resolution and size, the Resize method of PyTorch is used uniformly to process them to 600 × 600. This size was chosen by weighing accuracy against computational cost; repeated experiments show that this resolution strikes a good balance between the two. Because the effective food material region of a photographed picture usually lies in the middle, center cropping is performed with the CenterCrop method in PyTorch to obtain the middle 450 × 450 region, avoiding background interference as much as possible and making the network strengthen its learning of the subject's features. The picture is flipped vertically at random with the RandomVerticalFlip method in PyTorch, with the probability parameter set to 0.5, increasing the number of samples; it is rotated at random with the RandomRotation method in PyTorch, the rotation angle being a random value between plus and minus 30 degrees; and the ColorJitter method in PyTorch adjusts brightness, contrast and saturation, further increasing sample diversity. Finally the RandomChoice and RandomOrder methods are used, the former to select among the above methods at random and the latter to randomize the order of the selected operations; these two steps again increase sample capacity and diversity.
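By way of illustration only, this pipeline can be sketched with the torchvision transforms named above; RandomOrder is shown here (RandomChoice composes the same way), and the ColorJitter magnitudes are assumptions, since the text specifies only the flip probability and the rotation range:

    from torchvision import transforms

    augment = transforms.RandomOrder([           # apply in a random order
        transforms.RandomVerticalFlip(p=0.5),    # random vertical flip
        transforms.RandomRotation(degrees=30),   # random angle in [-30, +30] degrees
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    ])

    train_transform = transforms.Compose([
        transforms.Resize((600, 600)),  # unify resolution across the three sources
        transforms.CenterCrop(450),     # keep the middle 450 x 450 food region
        augment,
        transforms.ToTensor(),
    ])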
In this embodiment, a convolutional neural network improved on the basis of the Densenet121 model is constructed; the main improvements are an appropriate reduction of the number of times the Dense Block module is stacked and the introduction of an attention module.
Densenet121 is a densely connected convolutional neural network; the standard Densenet121 network consists of an initialization layer, dense connection layers (Dense Block), transition layers (Transition Layer) and a classification layer. The basic structure is shown in FIG. 2. A Dense Block is a densely connected module, and a Transition Layer is the connecting region between two adjacent Dense Block modules. Each Dense Block module in Densenet fuses the feature information of all previous layers, which greatly improves the utilization of feature resources. The Densenet model is simple in structure and can comprehensively exploit the low-complexity shallow features of many layers to amplify the information content, yielding a decision function with good generalization and improving the model's resistance to overfitting.
In the Densenet121 neural network model, any two layers are directly connected: the output information of all layers before each layer is gathered into that layer through a merging operation, and the feature map information learned by that layer is passed on to all later layers. The formula describing this property of Densenet is:

$$X_l = H_l([X_0, X_1, \ldots, X_{l-1}])$$

where l denotes the layer index, $X_l$ denotes the output of the l-th layer, and $H_l$ denotes a non-linear transformation.
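By way of illustration only, the dense connectivity expressed by this formula can be sketched as follows, with the non-linear transforms H_l left abstract:

    import torch

    def dense_forward(x0, conv_modules):
        features = [x0]                            # X_0, the block input
        for H in conv_modules:                     # H_1 ... H_L
            x_l = H(torch.cat(features, dim=1))    # X_l = H_l([X_0, ..., X_{l-1}])
            features.append(x_l)                   # retain every layer's features
        return torch.cat(features, dim=1)          # all features flow on together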
Specifically, the initialization layer initializes the data of the input picture; it comprises a convolution operation with a 7 × 7 kernel and stride 2 and a max pooling operation with a 3 × 3 sampling kernel and stride 2, which reduce the feature map size of the input picture while increasing the number of feature maps.
The dense connection layer (Dense Block) performs deep feature extraction on the input feature map using a plurality of convolution modules with the same feature map size. The convolution module is a combination of 3 operations, batch normalization (BN), the linear rectification function (ReLU) and convolution (Conv); its overall structure is BN + ReLU + (1 × 1) Conv + BN + ReLU + (3 × 3) Conv, and it can be regarded as a non-linear transformation function.
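By way of illustration only, this convolution module can be sketched in PyTorch as follows; the 1 × 1 bottleneck width of four times the growth rate follows the common DenseNet convention and is an assumption here:

    import torch.nn as nn

    def conv_module(in_channels: int, growth_rate: int) -> nn.Sequential:
        inter = 4 * growth_rate                   # 1x1 bottleneck width (assumed)
        return nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter, growth_rate, kernel_size=3, padding=1, bias=False),
        )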
Batch normalization (BN) standardizes the data to follow a standard normal distribution while reducing the time needed to compute gradients over the whole training set. Specifically, if the batch size of the network is m, then during forward propagation each node has m outputs, and batch normalization normalizes and re-outputs the m outputs of every node in the layer according to the following formula:

$$\hat{x}_i = \gamma \cdot \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $x_i^{(b)}$ denotes the value of the i-th input node of the layer when the b-th sample of the current batch is input, $x_i$ is the row vector $(x_i^{(1)}, \ldots, x_i^{(m)})$ whose length is the batch size m, μ and σ denote the mean and standard deviation respectively, ε is a small constant introduced for numerical stability, γ and β are the scale and shift parameters of the row, and $\hat{x}_i$ is the normalized result.
The linear rectification function ReLU replaces activation functions such as Sigmoid and Tanh, making gradient descent and error back-propagation more efficient and largely avoiding the vanishing-gradient problem. The ReLU function is a piecewise linear function that sets all negative values to 0 and leaves positive values unchanged, i.e. one-sided suppression. When the input is negative the neuron is not activated, i.e. only part of the neurons are active at any one time, so the neurons of the network exhibit sparse activation, which speeds up computation.
The formula is as follows:
F(x)=max(0,x)
where x is the input value.
The convolution operation extracts features from the data that has passed through batch normalization and linear rectification. The dense connection layer connects all earlier convolution modules to the later ones, so the features of different layers are retained and propagated forward together.
When building a classification network with Densenet, an appropriate number of dense connection layers, and of convolution modules within each dense connection layer, is chosen according to the input feature map and the specific requirements of the classification task, ensuring high classification accuracy while staying efficient and minimizing redundancy. In this embodiment, one dense connection block is removed from the standard Densenet121 according to the actual situation, and the number of repeated convolutions within each module is reduced appropriately.
The Transition layer sits between two dense connection layers and reduces the dimensionality of the feature maps produced by the preceding dense connection layer. It comprises a convolution operation and an average pooling operation, with the overall structure BN + ReLU + (1 × 1) Conv + (2 × 2) AvgPooling. The convolution operation reduces the number of feature maps: if the preceding dense connection layer outputs Z feature maps, the convolution in the Transition layer produces kZ output feature maps, where k denotes the compression coefficient. The average pooling operation likewise reduces the feature map size and the computational load.
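By way of illustration only, such a transition layer can be sketched in PyTorch as follows; the compression coefficient k = 0.5 is an assumed example value:

    import torch.nn as nn

    def transition_layer(in_channels: int, k: float = 0.5) -> nn.Sequential:
        out_channels = int(in_channels * k)         # Z feature maps in, kZ out
        return nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),  # halve the spatial size
        )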
The classification layer classifies the learned features and maps them to the space where the labels live. First it performs a global average pooling operation on the feature map produced by the last dense connection layer, reducing the dimensionality of the data to fuse the learned features. Then the result of the global average pooling is normalized with a Softmax function to obtain the probability vector of the input data belonging to each class.
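By way of illustration only, the classification layer can be sketched as follows; the class count of 600 reflects the more-than-600 categories mentioned in the text and is otherwise a placeholder:

    import torch
    import torch.nn as nn

    class ClassificationLayer(nn.Module):
        def __init__(self, in_channels: int, num_classes: int = 600):
            super().__init__()
            self.fc = nn.Linear(in_channels, num_classes)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = torch.mean(x, dim=(2, 3))             # global average pooling
            return torch.softmax(self.fc(x), dim=1)   # class-probability vector

During training one would usually feed the pre-softmax logits to a cross-entropy loss; the explicit softmax here mirrors the probability vector described in the text.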
In this embodiment, an attention module is added to the network model to achieve more efficient feature extraction.
The attention module is placed after the last convolution layer of the Densenet121 network; the overall structure of the augmented network is shown in FIG. 3. The attention module applies attention in both the channel and the spatial dimension; its specific structure is shown in FIG. 4.
The extraction process can be expressed as follows:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

where $F$ is a C-dimensional input feature of size H × W, $M_C$ is a C-dimensional channel attention map of size 1 × 1, and $M_S$ is an S-dimensional spatial attention map of size H × W. H and W denote height and width respectively, and C and S denote dimensions.
The channel attention is calculated as:

$$M_C(F) = h\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

The spatial attention is calculated as:

$$M_S(F) = h\big(f^{7\times 7}([\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)])\big)$$

where F is the input, AvgPool denotes average pooling, MaxPool denotes max pooling, h denotes the activation function, MLP denotes a multi-layer perceptron, and $f^{7\times 7}$ denotes a convolution operation with a 7 × 7 convolution kernel.
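By way of illustration only, the channel and spatial attention described by these formulas can be sketched in PyTorch as follows; the reduction ratio of the shared MLP and the use of sigmoid as the activation h are assumptions, not taken from the text:

    import torch
    import torch.nn as nn

    class ChannelAttention(nn.Module):
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Shared MLP applied to the average-pooled and max-pooled vectors.
            self.mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            avg = self.mlp(torch.mean(x, dim=(2, 3)))        # AvgPool branch
            mx = self.mlp(torch.amax(x, dim=(2, 3)))         # MaxPool branch
            return torch.sigmoid(avg + mx).view(b, c, 1, 1)  # M_C(F), size 1x1

    class SpatialAttention(nn.Module):
        def __init__(self, kernel_size: int = 7):
            super().__init__()
            self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            avg = torch.mean(x, dim=1, keepdim=True)         # AvgPool over channels
            mx, _ = torch.max(x, dim=1, keepdim=True)        # MaxPool over channels
            return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_S

    class AttentionModule(nn.Module):
        # F' = M_C(F) * F, then F'' = M_S(F') * F'.
        def __init__(self, channels: int):
            super().__init__()
            self.ca = ChannelAttention(channels)
            self.sa = SpatialAttention()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.ca(x) * x
            return self.sa(x) * x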
The essence of the attention mechanism is to add a weight mask on top of the original feature map to express how important each feature is, strengthening important features and weakening unnecessary ones, thereby improving the feature extraction efficiency of the network. The calculation generally uses a soft attention mechanism: for N pieces of input information, the selection criterion is determined by computing a weighted average over all inputs. Under this mechanism the attention weight α_i serves as a probability, and the probability of selecting the i-th piece of information is expressed as:

$$\alpha_i = p(z = i \mid X, q) = \mathrm{softmax}\big(s(x_i, q)\big) = \frac{\exp\big(s(x_i, q)\big)}{\sum_{j=1}^{N} \exp\big(s(x_j, q)\big)}$$

where p denotes taking a probability, x_i denotes the i-th piece of input information, q is the information to be queried, z is the attention variable, and s(x_i, q) is the attention scoring function; the probability results are normalized by the softmax function for later use.

After the attention weight α_i is obtained, and before the extraction of the important information is completed, the weights and feature vectors are fused as follows:

$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i x_i$$

where α_i is the attention probability computed above, x_i is the i-th piece of input information, and q is the information to be queried; the resulting value is the total amount of attention to be allocated for this batch of inputs.
In practical application, the introduced attention mechanism module is placed after the last convolution block, i.e. the feature extraction module, in the Densenet network model, so that the model learns the various features of an input picture with appropriate emphasis and takes both global and local features into account, which improves identification accuracy.
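By way of illustration only, the overall arrangement — backbone features, attention after the last convolution block, then classification — can be sketched as follows, reusing the AttentionModule sketched above; the torchvision Densenet121 backbone is used for brevity and does not reflect the embodiment's removal of one Dense Block:

    import torch.nn as nn
    from torchvision import models

    class FoodRecognizer(nn.Module):
        def __init__(self, num_classes: int = 600):
            super().__init__()
            backbone = models.densenet121(weights=None)
            self.features = backbone.features       # init, dense, transition layers
            self.attention = AttentionModule(1024)  # after the last conv block
            self.classifier = nn.Linear(1024, num_classes)

        def forward(self, x):
            x = self.features(x)        # (B, 1024, h, w) feature maps
            x = self.attention(x)       # channel attention, then spatial attention
            x = x.mean(dim=(2, 3))      # global average pooling
            return self.classifier(x)   # logits; softmax gives probabilities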
In this embodiment, the algorithm model is updated continuously with a strategy of adjusting while using. Specifically, the category labels and the database are refreshed regularly, food material categories absent from the scene are eliminated, and pictures taken during use are added to the sample set for training. The training process does not start from scratch; based on the initial model parameters obtained before, the parameters of the corresponding layers can be fixed by setting the frozen attribute to True so that they do not participate in training, and only the parameters of the last few network layers and the classifier need to be trained. This update mode carries the earlier training results forward, learns the features of the newly added pictures, and is highly efficient at the same time.
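By way of illustration only, in PyTorch this fixing of parameters is done through the requires_grad flag of each parameter (the "frozen" attribute above is the text's own terminology); which tail layers to leave trainable is a choice, shown here as the last two child modules of the backbone:

    import torch

    # model: a FoodRecognizer as sketched above, loaded with the previous weights.
    for p in model.features.parameters():
        p.requires_grad = False                    # fixed: excluded from training

    for p in model.features[-2:].parameters():     # unfreeze the last blocks
        p.requires_grad = True
    for p in model.classifier.parameters():        # the classifier is always trained
        p.requires_grad = True

    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3, momentum=0.9)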
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A deep learning-based multi-category food material recognition system, comprising: an initialization layer, a dense connection layer, a transition layer and a classification layer, wherein the dense connection layer fuses the feature information of all layers before it and is connected to the initialization layer and the classification layer respectively, and the transition layer is a transition region between adjacent dense connection layers, characterized in that an attention module is added after the last feature-extraction convolution layer;
the initialization layer is used for initializing an input picture to obtain an initialized feature map;
the dense connection layer uses a plurality of convolution modules with the same feature map size to perform deep feature extraction on the input feature map, establishing connections between all earlier convolution modules and the later ones so that the features of different layers are retained and propagated forward together; each convolution module comprises a batch normalization layer, a linear rectification layer and a convolution layer, wherein the batch normalization layer standardizes the data to follow a standard normal distribution, the linear rectification layer adopts the ReLU activation function, and the convolution layer extracts features from the data that has passed through the batch normalization layer and the linear rectification layer;
the transition layer is used for carrying out dimensionality reduction treatment on the characteristic graph obtained by the previous dense connection layer;
the classification layer classifies the features and maps the features to the space where the classification label is located;
the attention module strengthens important features and weakens unnecessary ones along the two dimensions of channel and space.
2. The deep learning-based multi-category food material identification system according to claim 1, wherein the attention module extracts features according to the following formulas:

$$F' = M_C(F) \otimes F$$

$$F'' = M_S(F') \otimes F'$$

where $F$ is a C-dimensional input feature of size H × W, $M_C$ is a C-dimensional channel attention map of size 1 × 1, $M_S$ is an S-dimensional spatial attention map of size H × W, H and W denote height and width respectively, and C and S denote dimensions;

the channel attention calculation formula is as follows:

$$M_C(F) = h\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$

the spatial attention calculation formula is as follows:

$$M_S(F) = h\big(f^{7\times 7}([\mathrm{AvgPool}(F), \mathrm{MaxPool}(F)])\big)$$

where F is the input, AvgPool denotes average pooling, MaxPool denotes max pooling, h denotes the activation function, MLP denotes a multi-layer perceptron, and $f^{7\times 7}$ denotes a convolution operation with a 7 × 7 convolution kernel.
3. The deep learning-based multi-category food material identification system according to claim 1, wherein the attention mechanism is a soft attention mechanism: for N pieces of input information, the selection criterion is determined by computing a weighted average over all inputs; under this mechanism the attention weight α_i serves as a probability, and the probability of selecting the i-th piece of information is expressed as:

$$\alpha_i = p(z = i \mid X, q) = \mathrm{softmax}\big(s(x_i, q)\big) = \frac{\exp\big(s(x_i, q)\big)}{\sum_{j=1}^{N} \exp\big(s(x_j, q)\big)}$$

where p denotes taking a probability, x_i denotes the i-th piece of input information, q is the information to be queried, z is the attention variable, and s(x_i, q) denotes the attention scoring function; the probability results are normalized by the softmax function;

after the attention weight α_i is obtained, and before the extraction of the important information is completed, the weights and feature vectors are fused as follows:

$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i x_i$$

the computed value being the total amount of attention to be allocated for this batch of inputs.
4. The deep learning-based multi-category food material identification system according to claim 1, wherein in the batch normalization layer each node has m outputs during forward propagation, and batch normalization normalizes and re-outputs the m outputs of every node in the layer according to the following formula:

$$\hat{x}_i = \gamma \cdot \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta$$

where $x_i^{(b)}$ denotes the value of the i-th input node of the layer when the b-th sample of the batch is input, $x_i$ is the row vector $(x_i^{(1)}, \ldots, x_i^{(m)})$ whose length is the batch size m, μ and σ denote the mean and standard deviation respectively, ε is a small constant introduced for numerical stability, γ and β are the scale and shift parameters of the row, and $\hat{x}_i$ is the normalized result.
5. The deep learning-based multi-category food material identification system as claimed in claim 1, wherein the transition layer comprises a convolution operation and an average pooling operation, the convolution operation is used for reducing the number of feature maps, and the average pooling operation is used for reducing the size of the feature maps and reducing the computational burden.
6. The deep learning-based multi-category food material identification system as claimed in claim 1, wherein the classification layer performs a global average pooling operation on the feature map obtained from the last dense connection layer, performs dimension reduction on the data to fuse the learned features, and then normalizes the result of the global average pooling operation using a Softmax function to obtain a probability vector that the input picture belongs to a certain category.
7. The deep learning-based multi-category food material identification system according to claim 1, wherein any two layers of the system are directly connected, the output information of all layers before each layer is collected to the current layer through a merging operation, and the feature map information learned by the current layer is also transmitted to all layers after the current layer:
$$X_l = H_l([X_0, X_1, \ldots, X_{l-1}])$$

where l denotes the layer index, $X_l$ denotes the output of the l-th layer, and $H_l$ denotes a non-linear transformation.
8. The recognition method of the deep learning based multi-category food material recognition system according to claim 1, characterized by comprising the following steps:
S1, sampling pictures of the food materials;
S2, dividing all the collected food material pictures by category, numbering them with a unified naming standard, and constructing a multi-category food material data set;
S3, dividing the food material data set into a training set, a verification set and a test set in proportion;
S4, enriching the samples through preprocessing and picture enhancement;
S5, constructing the deep learning-based multi-category food material recognition system and obtaining the trained recognition system and its weights by feeding in the food material data set;
S6, adjusting the model for the food materials of different units and different times by modifying the category labels and training with loaded weights;
and S7, obtaining food material pictures, transmitting them to a local and/or cloud server, analyzing them with the system constructed in S5, and obtaining and displaying the identification results.
9. The method of claim 8, wherein:
the sampling of S1 comprises collection by application-scene equipment and collection in other scenes, wherein for application-scene collection the food materials are placed on a conveyor belt under a camera and photographed, the food materials being arranged at different angles and relative positions and their quantity increased gradually from few to many, and other-scene collection comprises real shooting with a terminal and screening of network pictures;
in S2, the food materials are classified hierarchically using a three-letter-plus-number naming scheme, wherein the first level uses two letters and comprises the four major categories of non-staple food, meat and poultry, vegetables and fruits, and aquatic products, the second level adds one more letter to subdivide the four major categories further, and the third level appends a numeric code precise to the specific category; all food material pictures are named and numbered to complete the construction of the food material data set;
in S4, the pictures are preprocessed to 600 × 600, one or more operations among center cropping, flipping, rotation, brightness adjustment, contrast adjustment and saturation adjustment are selected at random and applied in random order to enhance the pictures, and center cropping yields the image of the middle 450 × 450 region.
10. The method as claimed in claim 8, wherein the system is continuously updated with a strategy of adjusting while using: the category labels and the food material picture database are refreshed, food material categories absent from the scene are eliminated, and pictures taken during use are added to the food material data set for training; the training process is based on the initial model parameters obtained before, the parameters of the corresponding layers are fixed so that they do not participate in training, and only the parameters of the last few network layers and the classifier are trained.
CN202111158365.3A 2021-09-30 2021-09-30 Deep learning-based multi-category food material identification system and method Pending CN113887410A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111158365.3A CN113887410A (en) 2021-09-30 2021-09-30 Deep learning-based multi-category food material identification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111158365.3A CN113887410A (en) 2021-09-30 2021-09-30 Deep learning-based multi-category food material identification system and method

Publications (1)

Publication Number Publication Date
CN113887410A true CN113887410A (en) 2022-01-04

Family

ID=79004532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111158365.3A Pending CN113887410A (en) 2021-09-30 2021-09-30 Deep learning-based multi-category food material identification system and method

Country Status (1)

Country Link
CN (1) CN113887410A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114485037A (en) * 2022-02-11 2022-05-13 海信(山东)冰箱有限公司 Refrigerator and food material positioning method thereof
CN114485037B (en) * 2022-02-11 2024-04-05 海信冰箱有限公司 Refrigerator and food material positioning method thereof
CN114898360A (en) * 2022-03-31 2022-08-12 中南林业科技大学 Food material image classification model establishing method based on attention and depth feature fusion
CN114898360B (en) * 2022-03-31 2024-04-26 中南林业科技大学 Food material image classification model establishment method based on attention and depth feature fusion
WO2023222089A1 (en) * 2022-05-20 2023-11-23 青岛海尔电冰箱有限公司 Item classification method and apparatus based on deep learning
CN116400028A (en) * 2023-05-29 2023-07-07 湖南汇湘轩生物科技股份有限公司 Essence quality detection method, system and medium based on smell sensor
CN116400028B (en) * 2023-05-29 2023-08-22 湖南汇湘轩生物科技股份有限公司 Essence quality detection method, system and medium based on smell sensor
CN117407751A (en) * 2023-10-20 2024-01-16 深圳市华美绿生态环境集团有限公司 Data acquisition system for forest biodiversity statistics

Similar Documents

Publication Publication Date Title
CN113887410A (en) Deep learning-based multi-category food material identification system and method
US20210042580A1 (en) Model training method and apparatus for image recognition, network device, and storage medium
CN108764072B (en) Blood cell subtype image classification method based on multi-scale fusion
CN112446388A (en) Multi-category vegetable seedling identification method and system based on lightweight two-stage detection model
CN111898406B (en) Face detection method based on focus loss and multitask cascade
CN107016413B (en) A kind of online stage division of tobacco leaf based on deep learning algorithm
WO2018052587A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN109035260A (en) A kind of sky areas dividing method, device and convolutional neural networks
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN110929610A (en) Plant disease identification method and system based on CNN model and transfer learning
CN110717553A (en) Traffic contraband identification method based on self-attenuation weight and multiple local constraints
CN112734775A (en) Image annotation, image semantic segmentation and model training method and device
CN112529146B (en) Neural network model training method and device
CN108009560B (en) Commodity image similarity category judgment method and device
CN107665352A (en) A kind of pearl sorting technique based on multichannel residual error network
CN110569780A (en) high-precision face recognition method based on deep transfer learning
CN109919246A (en) Pedestrian's recognition methods again based on self-adaptive features cluster and multiple risks fusion
CN110717058A (en) Information recommendation method and device and storage medium
Tang et al. Pest-YOLO: Deep image mining and multi-feature fusion for real-time agriculture pest detection
CN111340019A (en) Grain bin pest detection method based on Faster R-CNN
CN113420794A (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN109508640A (en) A kind of crowd's sentiment analysis method, apparatus and storage medium
CN113627240B (en) Unmanned aerial vehicle tree species identification method based on improved SSD learning model
Uddin et al. Traditional bengali food classification using convolutional neural network
CN117372881B (en) Intelligent identification method, medium and system for tobacco plant diseases and insect pests

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination