CN115641576A - Refrigerator food material adaptive scale recognition method and device and storage medium - Google Patents

Refrigerator food material adaptive scale recognition method and device and storage medium

Info

Publication number
CN115641576A
Authority
CN
China
Prior art keywords
scale
network
regression
target candidate
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211312647.9A
Other languages
Chinese (zh)
Inventor
高洪波
孔令磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Refrigerator Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Refrigerator Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Refrigerator Co Ltd, Haier Smart Home Co Ltd filed Critical Qingdao Haier Refrigerator Co Ltd
Priority to CN202211312647.9A priority Critical patent/CN115641576A/en
Publication of CN115641576A publication Critical patent/CN115641576A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to a refrigerator food material adaptive scale identification method, device, and storage medium. The method comprises the following steps: constructing a deconvolution regression network framework comprising a regression network and a classification network; the regression network uses a scale predictor to perform adaptive scale assignment on a plurality of target candidate boxes, generating multi-scale target candidate boxes, and then applies bounding-box regression to those boxes to obtain their regressed positions; the classification network extracts the feature maps inside the multi-scale target candidate boxes generated by the regression network and concatenates and fuses the feature maps of corresponding scales to obtain the final classification result; a deconvolution regression network is trained according to this method; and the image to be identified is input into the network to obtain the target type and the corresponding storage position. Separating regression from classification improves the flexibility and accuracy of regressing positions from features extracted on feature maps of different resolutions.

Description

Refrigerator food material adaptive scale identification method, device, and storage medium
Technical Field
The invention relates to the field of household appliances, and in particular to a refrigerator food material adaptive scale identification method, device, and storage medium.
Background
In a refrigerator scene, food materials and objects placed on different shelves of the cabinet usually have different scales, and food materials of different scales usually differ in appearance and even in posture; a single shared feature description is therefore far from sufficient. Especially for small-scale food materials at the bottom of the cabinet, a generic feature representation often leads to insufficient classification capability. Designing a scale-adaptive feature description is therefore crucial to improving food material detection and identification performance.
Among existing target detectors based on convolutional neural networks, the two-stage Faster R-CNN family and the single-stage YOLO family are currently the most popular. Although these methods handle targets of different scales by introducing multi-scale feature processing, in a refrigerator video scene the imaged scale of the same object changes greatly between the uppermost shelf (closest to the camera) and the lowermost shelf (farthest from the camera). Conventional detectors tend to match such scale-varying targets onto feature maps by brute force, so target information is lost or missed during matching, the feature representation is weak, and the target category is easily misidentified.
Disclosure of Invention
The invention aims to provide a refrigerator food material adaptive scale identification method, device, and storage medium in which multi-scale target regression and classification are separated from each other, enhancing the target identification effect.
To achieve this purpose, the invention provides a refrigerator food material adaptive scale identification method comprising the following steps: acquiring food material image information and building a target training data set; constructing a deconvolution regression network framework comprising a regression network and a classification network; the regression network uses a scale predictor to perform adaptive scale assignment on a plurality of target candidate boxes, generating multi-scale target candidate boxes, and then applies bounding-box regression to the multi-scale target candidate boxes according to their scale to obtain the regression of the multi-scale target candidate boxes; the classification network extracts the feature maps inside the multi-scale target candidate boxes generated by the regression network and concatenates and fuses the feature maps of corresponding scales to obtain the final classification result; a deconvolution regression network is trained according to the method for constructing the regression network and the classification network; and the food material image to be identified is input into the trained deconvolution regression network to obtain the type of the target food material and the corresponding storage position.
As a further improvement of the present invention, the method further comprises: the scale predictor performs adaptive scale assignment according to the size and feature distribution of the target candidate boxes.
As a further improvement of the present invention, the method further comprises: the deconvolution regression network framework adopts a pre-trained convolutional neural network as the basic feature extractor.
As a further improvement of the present invention, the method further comprises: the pre-trained convolutional neural network comprises a VGG16 base network.
As a further improvement of the present invention, the method further comprises: in the training process of constructing the deconvolution regression network, the basic feature extractor extracts feature maps by upsampling or unpooling.
As a further improvement of the present invention, the step of performing bounding-box regression on the multi-scale target candidate boxes according to their scale specifically includes: applying non-maximum suppression to the target candidate boxes to remove duplicates and obtain the de-duplicated target candidate boxes; and establishing a regression equation between the de-duplicated target candidate boxes and preset target values, and adjusting the coordinates of the de-duplicated target candidate boxes to obtain a final position result.
As a further improvement of the present invention, the step of "extracting, by the classification network, the feature maps in the multi-scale target candidate boxes generated in the regression network" specifically includes: concatenating and fusing the multi-scale feature maps along the channel dimension according to their scale to obtain a plurality of fused feature maps; performing a convolution operation on the fused feature maps to reduce them to 1 channel, obtaining a plurality of dimension-reduced fused feature maps; and performing weighted fusion on the dimension-reduced fused feature maps to obtain targets of various scales.
As a further improvement of the present invention, before the step of "concatenating and fusing the multi-scale feature maps along the channel dimension according to their scale", the method further comprises: performing an ROI Align pooling operation on each of the multi-scale feature maps to obtain a plurality of fixed-size feature maps.
The invention also provides a refrigeration device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor, when executing the program, implements the steps of the above refrigerator food material adaptive scale identification method.
The invention also provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the above refrigerator food material adaptive scale identification method.
Compared with the prior art, the invention has the following beneficial effects. The invention trains a deconvolution regression network on top of a pre-trained convolutional neural network; the network separates target regression from target type identification, improving robustness and flexibility. A scale predictor is introduced during regression network construction: the extracted target candidate boxes receive adaptive scale assignment and are regressed according to their scale, so that targets of different scales are predicted on feature maps of different resolutions, which improves the flexibility and accuracy of regressing target positions at different scales.
Meanwhile, during classification network construction, feature maps are extracted from the multi-scale candidate boxes generated by the regression network using upsampling or unpooling, so that the extracted feature maps have high resolution and provide a good data source for subsequent position regression and type identification. The multi-scale feature maps are then concatenated and fused by scale to generate fused features at different scales, and weighted fusion is applied to them; the fused features form a multi-scale feature expression.
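As a minimal illustration of this upsampling / unpooling idea, the sketch below restores a coarse backbone feature map to a higher resolution with a transposed convolution ("deconvolution") in PyTorch. The channel counts and the use of a single upsampling stage are assumptions made for the example, not the exact structure of the patented network.

```python
# A sketch, under assumed channel sizes, of recovering a higher-resolution
# feature map from a coarse backbone map via transposed convolution.
import torch
import torch.nn as nn

class FeatureUpsampler(nn.Module):
    def __init__(self, in_channels=512, out_channels=256):
        super().__init__()
        # kernel 4, stride 2, padding 1 exactly doubles the spatial resolution
        self.deconv = nn.ConvTranspose2d(in_channels, out_channels,
                                         kernel_size=4, stride=2, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, coarse_map):
        return self.relu(self.deconv(coarse_map))

if __name__ == "__main__":
    coarse = torch.randn(1, 512, 14, 14)   # e.g. a late VGG16 conv block output
    fine = FeatureUpsampler()(coarse)
    print(fine.shape)                      # torch.Size([1, 256, 28, 28])
```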
Drawings
Fig. 1 is a schematic flow chart of a refrigerator food material adaptive scale identification method according to an embodiment of the present invention.
Fig. 2 is a block diagram of an exemplary deconvolution regression network.
Fig. 3 is a schematic flow chart of the candidate box regression processing in the regression network construction process according to an embodiment of the present invention.
Fig. 4 is a schematic flow chart of the fusion processing performed on the feature maps of candidate boxes extracted in the process of constructing the classification network according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application clearer, the technical solutions of the present application will be described clearly and completely below with reference to specific embodiments and the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The application discloses an embodiment of a refrigerator food material adaptive scale identification method. Although operation steps are given in the following embodiment and in Fig. 1, the steps have no necessary logical causal relationship beyond convention, and their execution order is not limited to the order provided in this embodiment. As shown in Fig. 1, an embodiment of the present invention provides a refrigerator food material adaptive scale identification method comprising the following steps, described below:
S1, acquiring food material image information and building a target training data set.
S2, constructing a deconvolution regression network framework, wherein the framework comprises a regression network and a classification network.
S3, the regression network uses a scale predictor to perform adaptive scale assignment on the multi-layer, multi-scale candidate boxes and obtains the regression of candidate boxes at multiple scales.
S4, the classification network extracts the feature maps in the multi-scale candidate boxes generated by the regression network, generates multiple multi-scale feature maps, and performs weighted fusion of corresponding scales on them to obtain the final classification result.
S5, training according to the methods in steps S3 and S4 to obtain a deconvolution regression network.
S6, inputting the food material image to be identified into the network to obtain the type and the storage position of the food material.
In one embodiment of the invention, food material images in the refrigerator can be collected by a high-definition camera or an infrared camera, and a target training data set is built by screening the large number of collected food material images. The target training data set provides training samples for the subsequent construction of the deconvolution regression network.
In the preferred embodiment of the invention, a VGG16 network is adopted. VGG16 is a convolutional neural network whose structure contains 16 parameter-bearing layers in total, namely 13 convolutional layers and 3 fully connected layers, followed by a softmax output layer; the convolutional blocks are separated by 5 max-pooling layers, and the activation units of all hidden layers use the ReLU function.
For the VGG16 convolutional neural network, taking the pooling layers as boundaries, the network has 6 block structures, and the number of channels within each block is the same. The convolutional and fully connected layers carry weight coefficients and are therefore called weight layers; there are 13 convolutional layers and 3 fully connected layers, 16 weight layers in total, while the pooling layers carry no weights. Thus the 13 convolutional layers and 5 pooling layers in the network are responsible for feature extraction, and the last 3 fully connected layers together with the softmax output layer complete the classification task.
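For illustration, the following PyTorch sketch loads a pre-trained VGG16 and splits its convolutional part at the pooling-layer boundaries into the blocks described above, returning one feature map per block. The class name and the multi-level output format are assumptions made for this example, not the patent's implementation.

```python
# A sketch of using pre-trained VGG16 as the basic multi-level feature extractor.
import torch
import torchvision

class VGG16Backbone(torch.nn.Module):
    def __init__(self):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1")
        # vgg.features holds the 13 conv layers and 5 max-pooling layers;
        # split it at each pooling layer so every block yields one feature map.
        blocks, current = [], []
        for layer in vgg.features:
            current.append(layer)
            if isinstance(layer, torch.nn.MaxPool2d):
                blocks.append(torch.nn.Sequential(*current))
                current = []
        self.blocks = torch.nn.ModuleList(blocks)

    def forward(self, x):
        # Return the feature map of every block so later stages can pick
        # the resolution that matches a candidate box's scale.
        feature_maps = []
        for block in self.blocks:
            x = block(x)
            feature_maps.append(x)
        return feature_maps

if __name__ == "__main__":
    backbone = VGG16Backbone()
    maps = backbone(torch.randn(1, 3, 224, 224))
    print([m.shape for m in maps])  # five maps, halving in spatial resolution
```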
In one embodiment of the invention, a deconvolution regression network framework is constructed on the basis of the VGG16 network; the framework comprises a regression network and a classification network. The regression network adopts any convolutional neural network pre-trained on ImageNet as the basic feature extractor. According to the target scales in the input image, a number of target candidate boxes of different scales are extracted at each network layer using a selective search algorithm, and a scale predictor is introduced to perform adaptive scale assignment on these multi-scale candidate boxes as they are extracted. The scale predictor assigns scales adaptively according to the size and feature distribution of the candidate boxes and processes candidate boxes of different scales separately, for example the large-scale and small-scale candidate boxes extracted from each layer, to generate regressions of candidate boxes at the corresponding scales; handling targets of different scales separately with the scale predictor improves the regression accuracy for each scale. Bounding-box regression is then applied to the candidate boxes of each scale, yielding candidate-box regressions at multiple scales, such as large-scale and small-scale box regressions.
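The patent does not fix a concrete rule for how the scale predictor assigns candidate boxes to scales, so the sketch below only illustrates one plausible choice: each candidate box is routed to a feature-map level according to its area, using an FPN-style log2 heuristic. The function names, the canonical size, and the heuristic itself are assumptions made for this example.

```python
# A sketch of adaptive scale assignment: large boxes go to coarse levels,
# small boxes to fine levels, so each is regressed on a matching resolution.
import math

def assign_scale_level(box, num_levels=5, canonical_size=224, canonical_level=4):
    """box = (x1, y1, x2, y2) in input-image pixels; returns a level index."""
    w, h = box[2] - box[0], box[3] - box[1]
    area = max(w * h, 1.0)
    level = canonical_level + math.log2(math.sqrt(area) / canonical_size)
    return int(min(max(round(level), 0), num_levels - 1))

def group_boxes_by_scale(boxes, num_levels=5):
    """Split candidate boxes into per-level groups for separate regression."""
    groups = {level: [] for level in range(num_levels)}
    for box in boxes:
        groups[assign_scale_level(box, num_levels)].append(box)
    return groups

if __name__ == "__main__":
    candidates = [(10, 10, 40, 40), (0, 0, 300, 420), (50, 60, 120, 150)]
    print(group_boxes_by_scale(candidates))
```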
In an embodiment of the present invention, the classification network extracts the feature maps inside the multi-scale target candidate boxes generated by the regression network, producing multiple multi-scale feature maps; it concatenates and fuses the feature maps of corresponding scales to generate multi-scale fused feature maps, and applies weighted fusion to them to obtain the final target classification result. Specifically, the convolutional neural network has multiple layers, each layer extracts several target candidate boxes of different scales with a selective search algorithm, and the scale predictor groups these candidate boxes by scale, so candidate boxes at multiple scales are generated. These boxes are passed to the classification network, features are extracted for the candidate boxes of each scale (since each layer extracts boxes at its corresponding scale, multiple layers yield multiple boxes), multiple feature maps are produced, and the feature maps are concatenated and fused per scale. This avoids losing target information when extracting feature maps at each layer and improves the accuracy of later image type identification.
According to the construction method of the regression network and the classification network, a deconvolution regression network is obtained by training on the target training data set; the food material image to be identified is input into the network, and the types of the targets at different scales in the image and the positions where they are stored in the refrigerator are identified automatically. The position of a target in the image is usually represented by a bounding box (x, y, w, h), where x, y are the coordinates of the target center and w, h are the width and height of the candidate box; the target type indicates what the object specifically is, for example whether it is an apple or a tomato.
Fig. 2 is a schematic diagram of the framework of the deconvolution regression network. In the embodiment of the invention, on the one hand, VGG16 is preferably adopted as the pre-trained convolutional neural network. The network contains a multi-layer feature map extraction structure; before feature map extraction, each layer uses a selective search algorithm to extract target candidate boxes, and a scale predictor is introduced on top of the algorithm to assign scales adaptively to the generated candidate boxes and produce candidate boxes at multiple scales. In this way the positions of both large-scale and small-scale targets in the picture can be detected, that is, targets of different scales are predicted on feature maps of different resolutions: large targets on low-resolution feature maps, medium targets on medium-resolution feature maps, and small targets on high-resolution feature maps. This realizes the regression of candidate boxes for targets of different scales, that is, the determination of the positions of targets at different scales.
On the other hand, as shown in Fig. 2, the feature maps extracted at each layer are also fused at corresponding positions according to target scale. First, the large-scale and small-scale target candidate boxes of each layer in the network are extracted; then the large-scale candidate boxes at corresponding positions across the layers are taken, and the resulting large-scale feature maps are concatenated and fused to generate a fused feature map for large-scale targets. The same fusion operation is applied to the small-scale candidate boxes to generate a fused feature map for small-scale targets. Finally, all fused feature maps of different scales are combined by weighting to identify the types of the targets at the various scales in the image.
Fig. 3 is a schematic flow chart of the candidate box regression processing in the process of constructing the regression network in the embodiment of the present invention, which specifically includes the following steps:
S31, removing duplicate candidate boxes with non-maximum suppression to obtain the de-duplicated candidate boxes.
S32, establishing a regression equation and adjusting the coordinates of the de-duplicated candidate boxes to obtain the final position result.
Multi-level, multi-scale target candidate boxes are obtained with the selective search algorithm, and these boxes may be redundant; a picture may contain on the order of hundreds of candidate boxes. Non-Maximum Suppression (NMS) is used to keep the candidate box with the highest score and suppress overlapping boxes with lower scores, yielding the de-duplicated candidate boxes. A regression equation is then established between the selected candidate boxes and preset target values: the regression seeks a mapping under which an input candidate box moves closer to the true target box, so that the new candidate box better matches the real target and prediction accuracy improves. That is, by establishing the mapping relationship between the input candidate boxes and the target values, the positions of the input boxes are adjusted to obtain the final target positions.
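The following sketch illustrates these two steps with torchvision's non-maximum suppression and the common R-CNN box-delta parameterisation. The delta form of the "regression equation" is an assumption; the patent only states that a regression equation between candidate boxes and target values is established.

```python
# A sketch of candidate-box de-duplication (NMS) followed by box regression.
import torch
from torchvision.ops import nms

def deduplicate(boxes, scores, iou_threshold=0.5):
    """boxes: (N, 4) tensor in (x1, y1, x2, y2); scores: (N,) tensor."""
    keep = nms(boxes, scores, iou_threshold)
    return boxes[keep], scores[keep]

def apply_regression(boxes, deltas):
    """Adjust de-duplicated boxes with predicted (dx, dy, dw, dh) offsets."""
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    new_ctr_x = ctr_x + deltas[:, 0] * widths
    new_ctr_y = ctr_y + deltas[:, 1] * heights
    new_w = widths * torch.exp(deltas[:, 2])
    new_h = heights * torch.exp(deltas[:, 3])

    return torch.stack([new_ctr_x - 0.5 * new_w, new_ctr_y - 0.5 * new_h,
                        new_ctr_x + 0.5 * new_w, new_ctr_y + 0.5 * new_h], dim=1)

if __name__ == "__main__":
    boxes = torch.tensor([[10., 10., 60., 60.],
                          [12., 11., 61., 59.],
                          [100., 100., 150., 160.]])
    scores = torch.tensor([0.9, 0.7, 0.8])
    kept, _ = deduplicate(boxes, scores)          # drops the overlapping low score
    refined = apply_regression(kept, torch.zeros(len(kept), 4))
    print(kept.shape, refined)
```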
Fig. 4 is a schematic flow chart of the fusion processing of the feature maps of the extracted candidate boxes in the process of constructing the classification network in the embodiment of the present invention, which specifically includes the following steps:
S41, obtaining a plurality of fixed-size candidate boxes using an ROI Align pooling operation.
S42, extracting the feature maps inside the candidate boxes, and concatenating and fusing them to generate a plurality of fused feature maps.
S43, weighting the fused feature maps to obtain the type of the target.
The classification network receives candidate boxes at multiple scales from the regression network; their sizes are not fixed, so an ROI (Region of Interest) Align pooling operation is applied to the candidate boxes to fix their size, which helps the subsequent target type identification.
After the ROI Align pooling operation on the candidate boxes of each scale, the feature maps inside the target candidate boxes of each layer are extracted and concatenated along the channel dimension to obtain fused feature maps, which are reduced to one channel by a convolution operation; the resulting fused feature map matches the input feature map in spatial size. The same procedure is applied to the candidate boxes of every scale to obtain multiple fused feature maps, which are then combined by weighted fusion to identify the target type. The weights chosen for the weighted fusion depend on the shooting scene: for example, if the target is close to the camera and therefore imaged at a large size, the large-scale feature map receives a larger weight, which strengthens the target's feature information and improves the accuracy of type identification. The result of the weighted feature fusion is passed through the fully connected layers of VGG16, the resulting feature map is flattened into a vector whose dimensionality is reduced by a convolution operation, and the vector is fed into the output layer, i.e. the softmax layer, to obtain the confidence of each class for the targets of different scales in the image, thereby identifying the types of the multi-scale targets.
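The sketch below illustrates this classification branch: ROI Align crops a fixed-size feature window for each candidate box on each scale's feature map, a 1x1 convolution reduces the channels to one, the per-scale results are combined by weighted fusion, and a small fully connected head with softmax outputs class confidences. The crop size, channel counts, fixed fusion weights, and the collapse of each scale's multi-layer concatenation into a single map are assumptions made to keep the example short.

```python
# A sketch of ROI Align cropping, 1x1 channel reduction, weighted fusion,
# and softmax classification for multi-scale candidate boxes.
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class ClassificationHead(nn.Module):
    def __init__(self, channels_per_level=(256, 512), num_classes=20, crop=7):
        super().__init__()
        self.crop = crop
        # One 1x1 reduction per scale: that scale's channels -> 1 channel.
        self.reducers = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=1) for c in channels_per_level])
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(crop * crop, 128), nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, feature_maps, boxes, strides, weights=(0.5, 0.5)):
        # feature_maps: one map per scale; boxes: (N, 5) with batch index first.
        fused = 0.0
        for fmap, stride, reducer, w in zip(feature_maps, strides,
                                            self.reducers, weights):
            crops = roi_align(fmap, boxes, output_size=self.crop,
                              spatial_scale=1.0 / stride, aligned=True)
            fused = fused + w * reducer(crops)   # weighted fusion, 1 channel
        logits = self.fc(fused)
        return torch.softmax(logits, dim=1)      # per-box class confidences

if __name__ == "__main__":
    maps = [torch.randn(1, 256, 56, 56), torch.randn(1, 512, 28, 28)]
    boxes = torch.tensor([[0., 20., 20., 120., 120.]])  # (batch_idx, x1, y1, x2, y2)
    head = ClassificationHead()
    print(head(maps, boxes, strides=(4, 8)).shape)      # torch.Size([1, 20])
```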
The embodiment of the invention also provides a refrigeration device comprising a memory and a processor; the memory stores instructions, and when the processor calls and executes the instructions, the above refrigerator food material adaptive scale identification method is realized.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the method for identifying the self-adaptive scale of the refrigerator food material is realized.
In summary, in the refrigerator food material adaptive scale identification method, device, and storage medium provided by the invention, on the one hand, in order to handle targets of different scales effectively, a scale predictor adaptively extracts the features of each target candidate box, on the basis of a pre-trained base network and according to the box's size and feature distribution, from feature maps of different resolutions in order to regress its position. Traditional methods regress targets of all scales on the final convolutional layer; scale-adaptive regression instead extracts more accurate feature expressions and thus better represents the regression of targets at different scales.
On the other hand, in order to further improve the discriminative ability of the target type classifier, multi-layer deep features are fused to predict the target type. The fused feature maps form a multi-scale feature expression, which reduces the loss of target information during the ROI pooling operation and clearly strengthens the target feature expression. Compared with the traditional approach of classifying only on the final features, the method captures finer target information and enhances the type identification effect.
It should be understood that although the specification describes embodiments, not every embodiment includes only a single embodiment, and such description is for clarity purposes only, and it will be appreciated by those skilled in the art that the specification as a whole may be appropriately combined to form other embodiments as will be apparent to those skilled in the art.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention and is not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention are included in the scope of the present invention.

Claims (10)

1. A food material image adaptive identification method, characterized in that
the method comprises the following steps:
acquiring food material image information and building a target training data set;
constructing a deconvolution regression network framework, wherein the deconvolution regression network framework comprises a regression network and a classification network;
the regression network uses a scale predictor to perform adaptive scale assignment on a plurality of target candidate boxes to generate a plurality of multi-scale target candidate boxes, and then performs bounding-box regression on the multi-scale target candidate boxes according to their scale to obtain the regression of the multi-scale target candidate boxes;
the classification network extracts the feature maps in the multi-scale target candidate boxes generated in the regression network, and performs concatenation and fusion of corresponding scales on the generated multi-scale feature maps to obtain a final classification result;
training according to the method for constructing the regression network and the classification network to obtain a deconvolution regression network;
and inputting the food material image to be identified into the trained deconvolution regression network to obtain the type of the target food material and the corresponding storage position.
2. The food material image adaptive identification method according to claim 1,
the method further comprises the following steps:
the scale predictor implements adaptive scale allocation according to the size and feature distribution of the target candidate box.
3. The food material image adaptive identification method according to claim 1,
the method further comprises the following steps:
the deconvolution regression network framework adopts a pretrained convolution neural network as a basic feature extractor.
4. The food material image adaptive identification method according to claim 3,
the method further comprises the following steps:
the pre-trained convolutional neural network comprises a VGG16 base network.
5. The food material image adaptive identification method according to claim 3,
the method comprises the following steps:
in the training process of constructing the deconvolution regression network, the basic feature extractor adopts an upper sampling method or an inverse pooling method to extract a feature map.
6. The food material image adaptive identification method according to claim 1, characterized in that,
the step of performing frame regression processing on the multiple multi-scale target candidate frames according to the scale sizes specifically includes:
inhibiting and removing repeated target candidate frames by using a non-maximum value for the target candidate frames to obtain the target candidate frames after duplication removal;
and establishing a regression equation between the target candidate frame after the duplication removal and a preset target value, and adjusting the coordinates of the target candidate frame after the duplication removal to obtain a final position result.
7. The food material image adaptive identification method according to claim 1, characterized in that,
the step of extracting feature maps in a plurality of multi-scale target candidate boxes generated in the regression network by the classification network specifically comprises the following steps:
splicing and fusing the multi-scale feature maps according to the scale size and the channel dimension respectively to obtain a plurality of fused feature maps;
performing convolution operation on the multiple fusion feature maps to reduce the dimension into 1 channel, so that the multiple dimension-reduced fusion feature maps are obtained;
and carrying out weighted fusion processing on the plurality of dimension reduction fusion characteristic graphs to obtain target identification types with various scales.
8. The food material image adaptive identification method according to claim 7,
before the step of concatenating and fusing the multi-scale feature maps along the channel dimension according to their scale, the method further comprises:
performing an ROI Align pooling operation on each of the multi-scale feature maps to obtain a plurality of fixed-size feature maps.
9. A refrigeration device comprising a memory and a processor, characterized in that
the memory stores a computer program operable on the processor, and the processor, when executing the program, implements the food material image adaptive identification method according to any one of claims 1 to 8.
10. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the food material image adaptive identification method according to any one of claims 1 to 8.
CN202211312647.9A 2022-10-25 2022-10-25 Refrigerator food material adaptive scale recognition method and device and storage medium Pending CN115641576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211312647.9A CN115641576A (en) 2022-10-25 2022-10-25 Refrigerator food material adaptive scale recognition method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211312647.9A CN115641576A (en) 2022-10-25 2022-10-25 Refrigerator food material adaptive scale recognition method and device and storage medium

Publications (1)

Publication Number Publication Date
CN115641576A true CN115641576A (en) 2023-01-24

Family

ID=84946103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211312647.9A Pending CN115641576A (en) 2022-10-25 2022-10-25 Refrigerator food material adaptive scale recognition method and device and storage medium

Country Status (1)

Country Link
CN (1) CN115641576A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746304A (en) * 2024-02-21 2024-03-22 浪潮软件科技有限公司 Refrigerator food material identification and positioning method and system based on computer vision
CN117746304B (en) * 2024-02-21 2024-05-14 浪潮软件科技有限公司 Refrigerator food material identification and positioning method and system based on computer vision

Similar Documents

Publication Publication Date Title
Hossain et al. Crowd counting using scale-aware attention networks
JP7058669B2 (en) Vehicle appearance feature identification and vehicle search methods, devices, storage media, electronic devices
CN108229397B (en) Method for detecting text in image based on Faster R-CNN
CN109583340B (en) Video target detection method based on deep learning
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN111401177A (en) End-to-end behavior recognition method and system based on adaptive space-time attention mechanism
WO2016183766A1 (en) Method and apparatus for generating predictive models
KR101479387B1 (en) Methods and apparatuses for face detection
KR20180135898A (en) Systems and methods for training object classifiers by machine learning
CN110263712B (en) Coarse and fine pedestrian detection method based on region candidates
JP7093427B2 (en) Object tracking methods and equipment, electronic equipment and storage media
CN111126278B (en) Method for optimizing and accelerating target detection model for few-class scene
CN109934216B (en) Image processing method, device and computer readable storage medium
Kanojia et al. Attentive spatio-temporal representation learning for diving classification
Wang et al. Multiscale deep alternative neural network for large-scale video classification
CN111696136B (en) Target tracking method based on coding and decoding structure
CN113065645A (en) Twin attention network, image processing method and device
CN112131944B (en) Video behavior recognition method and system
CN110751195A (en) Fine-grained image classification method based on improved YOLOv3
CN115661720A (en) Target tracking and identifying method and system for shielded vehicle
CN115147745A (en) Small target detection method based on urban unmanned aerial vehicle image
CN115641576A (en) Refrigerator food material adaptive scale recognition method and device and storage medium
CN117037004A (en) Unmanned aerial vehicle image detection method based on multi-scale feature fusion and context enhancement
CN113569981A (en) Power inspection bird nest detection method based on single-stage target detection network
KR101833943B1 (en) Method and system for extracting and searching highlight image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination