CN116259052A - Food ripeness recognition method and device and cooking equipment

Food ripeness recognition method and device and cooking equipment

Info

Publication number
CN116259052A
CN116259052A (application CN202111511741.2A)
Authority
CN
China
Prior art keywords: layer, doneness, feature, target, food
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111511741.2A
Other languages
Chinese (zh)
Inventor
陈磊
陈蔚
魏中科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Shunde Midea Washing Appliances Manufacturing Co Ltd
Original Assignee
Foshan Shunde Midea Washing Appliances Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Shunde Midea Washing Appliances Manufacturing Co Ltd filed Critical Foshan Shunde Midea Washing Appliances Manufacturing Co Ltd
Priority to CN202111511741.2A priority Critical patent/CN116259052A/en
Publication of CN116259052A publication Critical patent/CN116259052A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to the field of cooking equipment and provides a food doneness recognition method and device and cooking equipment. The method comprises: acquiring an infrared image and a visible light image of a target food; and inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the model. The doneness recognition model is trained with sample infrared images and sample visible light images of food as samples, and with predetermined doneness information corresponding to those sample images as labels. By correcting the recognition of the visible light image with information carried by the infrared image, such as the food position, and by locating the food doneness directly from the temperature information reflected in the infrared image, the method effectively improves the accuracy of food doneness recognition.

Description

Food ripeness recognition method and device and cooking equipment
Technical Field
The invention relates to the technical field of cooking equipment, in particular to a method and a device for identifying food ripeness and the cooking equipment.
Background
As technology advances, cooking equipment has become increasingly intelligent: during cooking, it can detect the doneness of food from changes in surface color and shape. However, because of cooking-environment factors such as oil smoke and lighting, doneness detection based on pictures alone has low accuracy.
Disclosure of Invention
The present invention aims to solve at least one of the technical problems existing in the prior art. Therefore, the invention provides a food doneness recognition method, which improves the accuracy of doneness detection.
The invention also provides cooking equipment that regulates its cooking power according to the recognized food doneness, thereby improving the cooking effect and reducing cooking energy consumption.
The food ripeness recognition method according to the embodiment of the invention comprises the following steps:
acquiring an infrared image and a visible light image of a target food;
inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model;
wherein the doneness recognition model is trained with a sample infrared image and a sample visible light image of food as samples, and with predetermined doneness information corresponding to the sample infrared image and sample visible light image as sample labels.
According to the food doneness recognition method of the embodiment of the invention, the recognition of the visible light image is corrected by information carried by the infrared image, such as the food position, and the food doneness is located precisely from the temperature information reflected in the infrared image, so the accuracy of food doneness recognition is effectively improved.
According to one embodiment of the present invention, the inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model includes:
inputting the infrared image and the visible light image into an input layer of the doneness recognition model to obtain a fusion feature map output by the input layer;
inputting the fusion feature map to a feature fusion layer of the doneness recognition model to obtain a plurality of target feature vectors of different dimensions output by the feature fusion layer;
and inputting the target feature vectors with the different dimensions to an output layer of the doneness recognition model to obtain the target doneness information output by the output layer.
According to one embodiment of the present invention, the inputting the target feature vectors of the multiple different dimensions into the output layer of the doneness recognition model, to obtain the target doneness information output by the output layer, includes:
inputting the target feature vectors with the different dimensions to the output layer, and carrying out convolution and classification to obtain the category information, the position information, the category confidence and the doneness type of the target food;
and obtaining the target doneness information output by the output layer based on the category information, the position information, the category confidence and the doneness type.
According to one embodiment of the present invention, the inputting the target feature vectors with the multiple dimensions to the output layer, performing convolution and classification to obtain category information, location information, category confidence and doneness type of the target food, includes:
inputting the target feature vectors with the different dimensions into a food category convolution network of the output layer to obtain the category information output by the food category convolution network;
inputting the target feature vectors with the different dimensions into a position convolution network of the output layer to obtain the position information and the category confidence degree output by the position convolution network;
and inputting the target feature vectors with the different dimensions into a doneness convolution network of the output layer to obtain the doneness type output by the doneness convolution network.
According to one embodiment of the invention, the classification identification of the output layer is non-anchor frame identification.
According to one embodiment of the present invention, the inputting the infrared image and the visible light image to the input layer of the doneness recognition model, obtaining a fusion feature map output by the input layer, includes:
inputting the infrared image and the visible light image into the input layer, and splicing the infrared image and the visible light image to obtain a target fusion image;
and extracting the characteristics of the target fusion image to obtain the fusion characteristic image output by the input layer.
According to one embodiment of the present invention, the inputting the infrared image and the visible light image to the input layer of the doneness recognition model, obtaining a fusion feature map output by the input layer, includes:
inputting the infrared image and the visible light image into the input layer, and respectively extracting the characteristics of the infrared image and the visible light image to obtain an infrared characteristic image and a visible light characteristic image;
and splicing the infrared characteristic diagram and the visible light characteristic diagram to obtain the fusion characteristic diagram output by the input layer.
According to an embodiment of the present invention, the inputting the fused feature map to a feature fusion layer of the doneness recognition model, to obtain a plurality of target feature vectors with different dimensions output by the feature fusion layer, includes:
Inputting the fusion feature map to the feature fusion layer, and performing multi-layer convolution processing on the fusion feature map to obtain N first feature layers with different dimensions;
carrying out a 1x1 convolution on the i-th first feature layer to obtain the (i-1)-th third feature layer;
downsampling the 1st first feature layer to obtain the 1st second feature layer;
starting from the 1st second feature layer, fusing the i-th second feature layer with the i-th third feature layer to obtain the (i+1)-th fourth feature layer, wherein, for i greater than 1, the i-th second feature layer is the downsampling result of the i-th fourth feature layer;
carrying out convolution processing on the (N-M)-th to N-th fourth feature layers to obtain M target feature vectors of different dimensions;
wherein i and M are integers greater than 1 and N is an integer greater than 2.
According to an embodiment of the present invention, the inputting the fused feature map to the feature fusion layer and performing multi-layer convolution processing on the fused feature map to obtain N first feature layers of different dimensions includes:
inputting the fusion feature map to the feature fusion layer, and performing multi-layer convolution processing on the fusion feature map to obtain N fifth feature layers with different dimensions;
carrying out a 1x1 convolution on each of the 1st to (N-1)-th fifth feature layers to obtain the 1st to (N-1)-th sixth feature layers;
upsampling the N-th fifth feature layer to obtain the (N-1)-th seventh feature layer;
and, starting from the (N-1)-th seventh feature layer, fusing each seventh feature layer with the corresponding sixth feature layer from top to bottom to obtain the 1st to (N-1)-th first feature layers, and taking the N-th fifth feature layer as the N-th first feature layer.
According to a second aspect of the present invention, a food doneness recognition apparatus includes:
the acquisition module is used for acquiring an infrared image and a visible light image of the target food;
the processing module is used for inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model;
wherein the doneness recognition model is trained with a sample infrared image and a sample visible light image of food as samples, and with predetermined doneness information corresponding to the sample images as sample labels.
According to an embodiment of the third aspect of the present invention, a cooking apparatus includes:
the image acquisition device is used for acquiring infrared images and visible light images of the target food;
and a controller electrically connected with the image acquisition device, configured to output the target doneness information of the target food based on the infrared image and the visible light image.
According to an embodiment of the present invention, the controller is further configured to output a target control instruction for controlling cooking power of the cooking apparatus based on the target doneness information.
An electronic device according to an embodiment of the fourth aspect of the present invention comprises a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the food doneness recognition method as described in any of the above when executing the computer program.
A non-transitory computer readable storage medium according to an embodiment of the fifth aspect of the present invention has stored thereon a computer program which, when executed by a processor, implements the steps of the food doneness recognition method as described in any of the above.
A computer program product according to an embodiment of the sixth aspect of the invention comprises a computer program which, when executed by a processor, implements the steps of the method for identifying food doneness as described in any of the preceding.
The above technical solutions in the embodiments of the present invention have at least one of the following technical effects:
The recognition of the visible light image is corrected by information carried by the infrared image, such as the food position; the food doneness is located precisely from the temperature information reflected in the infrared image; and the accuracy of food doneness recognition is effectively improved.
Further, since cooking is a process of heating food until it is done, the temperature information reflected in the infrared image can locate the doneness of the food directly; the approach is not limited by the type of food material and can accurately recognize different degrees of doneness.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for identifying food ripeness according to an embodiment of the invention;
fig. 2 is a schematic structural view of a cooking apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a doneness recognition model according to an embodiment of the present invention;
FIG. 4 is a second schematic diagram of a doneness recognition model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing flow of a doneness recognition model according to an embodiment of the present invention;
FIG. 6 is a second schematic diagram of a processing flow of a doneness recognition model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a food doneness recognition device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Reference numerals:
21: range hood; 22: cooktop; 23: infrared camera; 24: visible light camera.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are illustrative of the invention but are not intended to limit the scope of the invention.
In the description of the embodiments of the present invention, it should be noted that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the embodiments of the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In describing embodiments of the present invention, it should be noted that, unless explicitly stated and limited otherwise, the terms "coupled" and "connected" should be construed broadly, and may refer, for example, to a fixed connection, a removable connection, or an integral connection; to a mechanical or electrical connection; and to a direct connection or an indirect connection through an intermediate medium. The specific meaning of the above terms in embodiments of the present invention will be understood by those of ordinary skill in the art.
In embodiments of the invention, unless expressly specified and limited otherwise, a first feature "up" or "down" on a second feature may be that the first and second features are in direct contact, or that the first and second features are in indirect contact via an intervening medium. Moreover, a first feature being "above," "over" and "on" a second feature may be a first feature being directly above or obliquely above the second feature, or simply indicating that the first feature is level higher than the second feature. The first feature being "under", "below" and "beneath" the second feature may be the first feature being directly under or obliquely below the second feature, or simply indicating that the first feature is less level than the second feature.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
The intelligent kitchen is an important part of the smart home, and intelligent cooking is central to the intelligent kitchen. Detecting food doneness by analyzing food image information during cooking is an important way of realizing intelligent cooking.
There are, however, several difficulties. First, external factors in the cooking environment, such as lighting and oil smoke, mean that the quality of the images used for doneness analysis cannot be guaranteed. Second, doneness analysis places high demands on image texture detail, which raises the cost of the image acquisition device and makes it hard to popularize. Finally, because of the properties of some foods, their doneness cannot be judged from the color or shape obtained by image analysis.
The following describes a food doneness recognition method according to an embodiment of the present invention with reference to figs. 1 to 6. The method complementarily fuses different kinds of information from a visible light image and an infrared image, thereby improving the accuracy of food doneness detection.
As shown in fig. 1, the food doneness recognition method of the present invention includes step 110 and step 120; the method may be executed by a controller of a device terminal, by a cloud server, or by an edge server.
Step 110, obtaining a visible light image and an infrared image of the target food.
Wherein the target food is food of which the doneness is to be detected.
In this step, the visible light image and the infrared image of the target food may be acquired by installing an image acquisition device in the cooking apparatus.
In embodiments of the invention, the cooking equipment includes, but is not limited to, a range hood, a cooktop, a microwave oven, an air fryer, a rice cooker, an induction cooker, and the like.
The visible light image is acquired by the visible light image acquisition equipment and can reflect detailed information such as color, form, texture and the like of the target food and the environment where the target food is located.
For example, as shown in fig. 2, the visible light image may be an RGB image, collected by a visible light camera 24 mounted on the range hood 21, of a target food being cooked on the cooktop 22 below the range hood 21.
The infrared image is acquired by the infrared image acquisition equipment and can reflect the heat radiation information of the target food and the environment where the target food is located.
In practice, the infrared image is obtained by detecting, on the principle of infrared thermal imaging, infrared signals in a specific waveband emitted by the target food and its environment and converting those signals into a graph or image, from which thermal information such as the temperature of the target food and its environment can be derived.
For example, as shown in fig. 2, the infrared image may be a thermal radiation image, acquired by an infrared camera 23 mounted on the range hood 21, of a target food being cooked on the cooktop 22 below the range hood 21.
In practice, the visible light image and the infrared image of the target food may be acquired from the same viewing angle, which makes it easier to establish the correspondence between positions in the visible light image and positions in the infrared image.
And 120, inputting the visible light image and the infrared image into a doneness recognition model to obtain target doneness information corresponding to the target food.
In this embodiment, the input of the doneness recognition model is two images of a visible light image and an infrared image, and the output of the doneness recognition model is target doneness information corresponding to the target food.
The target doneness information output by the doneness recognition model may be a doneness level of the target food; for example, the doneness level may be raw, half-cooked, fully cooked, overcooked, and the like.
It is understood that the target food may be a food comprising a plurality of different food materials, and the target doneness information output by the doneness recognition model comprises doneness levels of all food materials in the target food.
For example, the target food is a potato-and-meat dish containing two food materials, potato and meat. The doneness recognition model analyzes the visible light image and the infrared image and outputs the corresponding target doneness information: the doneness level of the potato is half-cooked, and the doneness level of the meat is fully cooked.
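Purely as an illustration (the field names and box format below are assumptions, not a format defined in this disclosure), the target doneness information for such a dish could be organized as follows:

```python
# Hypothetical example of target doneness information for a dish with two food
# materials; keys, box format (x, y, w, h) and confidence values are illustrative only.
target_doneness_info = {
    "potato": {"doneness": "half-cooked", "box": (120, 80, 60, 40), "confidence": 0.91},
    "meat":   {"doneness": "fully-cooked", "box": (210, 95, 70, 45), "confidence": 0.88},
}
```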
It will be understood that the doneness recognition model needs to be trained before it can recognize the doneness of the target food. During training, sample visible light images and sample infrared images are used as samples, and the predetermined doneness information corresponding to those images is used as the sample labels.
The sample visible light images and sample infrared images can be collected by the image acquisition device while different foods are cooked on different cooking equipment and in different cooking environments.
By recognizing the visible light image, the doneness recognition model makes a preliminary judgment of the food doneness from detail information such as the color, shape and texture of the target food; by recognizing the infrared image, it corrects the visible light recognition using information such as the food position carried by the infrared image and locates the food doneness precisely from the temperature information reflected in the infrared image.
In the embodiment, the doneness recognition model fuses the information of the visible light image and the infrared image, so that the accuracy and the intelligent level of doneness recognition of target food can be effectively improved.
The doneness recognition model is a detection model based on a deep convolutional neural network. By analyzing a visible light image and an infrared image together, it improves the accuracy of food doneness recognition, and it can recognize doneness accurately even for foods whose color or texture changes little during cooking.
In the related art, techniques have appeared that detect doneness by combining the physical and chemical structure of the food material, supplementing food color information with the density or volume of the food material. For some root vegetables and liquid food materials, however, the density or volume does not change much during cooking, so such techniques are limited by the type of food material and their accuracy is low.
The doneness recognition model of this embodiment identifies the doneness level of the food by recognizing both the visible light image and the infrared image. Because cooking is a process of heating food until it is done, the temperature information reflected in the infrared image can be used to locate the food doneness directly; the approach is not limited by food material type and can accurately recognize different degrees of doneness.
It will be understood that, when the doneness recognition model analyzes the temperature information reflected in the infrared image, the position of the target food in the image can be determined from the visible light image, so that the influence of heat sources other than the target food, such as the heating element of the cooking device, is removed from the infrared image, improving the accuracy with which the doneness is determined from the infrared image.
According to the food doneness recognition method provided by the invention, the recognition of the visible light image is corrected by information carried by the infrared image, such as the food position, and the food doneness is located precisely from the temperature information reflected in the infrared image, so the accuracy of food doneness recognition is effectively improved.
In some embodiments, as shown in FIG. 3, the doneness recognition model includes an input layer, a feature fusion layer, and an output layer.
The inputs to the input layer are the visible light image and the infrared image, and the fusion feature map output by the input layer is obtained by extracting features from the visible light image and the infrared image.
The fusion feature map contains detail information such as the color, shape and texture of the target food reflected by the visible light image, as well as the thermal radiation information of the target food reflected by the infrared image.
The layer after the input layer is the feature fusion layer: the input layer outputs the fusion feature map to the feature fusion layer, which performs further feature extraction and feature fusion on it.
In practice, the feature extraction part of the feature fusion layer may be a classical network such as VGG, ResNet or Inception, or a lightweight network such as MobileNet, ShuffleNet or GhostNet.
The network architecture of the feature extraction part of the feature fusion layer is very flexible: suitable modules can be selected and swapped in according to the deployment requirements of the practical application, so as to meet the needs of the scenario.
The feature extraction part of the feature fusion layer works on the fused information from the visible light image and the infrared image; by performing feature extraction on the fusion feature map, more accurate semantic information can be extracted.
The feature fusion layer then fuses the features extracted by the feature extraction part and outputs a plurality of target feature vectors of different dimensions, providing richer feature information for detection in the subsequent output layer.
And the output layer of the doneness recognition model outputs target doneness information corresponding to the target food by detecting a plurality of target feature vectors with different dimensions.
In some embodiments, as shown in fig. 4, the output layer of the doneness recognition model takes as input a plurality of target feature vectors with different dimensions, and can obtain the position information, the doneness type, the category information and the category confidence corresponding to the target food through convolution operation and classification processing.
In this embodiment, the output layer outputs the target doneness information according to the position information, doneness type, category information, and category confidence corresponding to the target food.
The category information refers to the categories of the objects in the visible light and infrared images, such as the target food and the tableware, cookware and other items in its environment. The target food may contain a plurality of different food materials, and the category information includes the category of every food material in the target food.
The dimension of the category information is H x W x C, where C is the number of output channels, W is the feature width, and H is the feature height.
The position information refers to the position coordinates, in the visible light image and the infrared image, of the target food and of items such as tableware and cookware in its environment.
For example, the position information obtained by the output layer may include the coordinates (x1, y1, w1, h1) of the target food in the visible and infrared images and the coordinates (x2, y2, w2, h2) of the tableware.
Here x and y represent the position of the item relative to the reference point of the image, and w and h represent the width and height of its bounding box.
The dimension of the position information is H x W x 4, where W is the feature width, H is the feature height, and 4 corresponds to the box parameters x, y, w and h.
Confidence, also called reliability, confidence level or confidence coefficient, is the probability that an estimated value falls within a given allowed error range of the true population parameter.
In this embodiment, the category confidence is the probability that the classification result for the target food, and for items such as tableware and cookware in its environment, is reliable in the visible light and infrared images.
The dimension of the category confidence is H x W x 1, where W is the feature width and H is the feature height.
The doneness type (quality) refers to the doneness information corresponding to the target food in the visible light and infrared images; the target food may include a plurality of different food materials, and the doneness type includes the doneness information of every food material in the target food.
The dimension of the doneness type is H x W x M, where M is the number of output doneness levels, W is the feature width, and H is the feature height.
In this embodiment, the output layer may take the output of the feature fusion layer as input according to the requirement of the output task, and obtain and output the position information, the doneness type, the category information and the category confidence through convolution classification.
In some embodiments, the output layer includes a food category convolution network, a location convolution network, and a doneness convolution network.
The input of the food category convolution network is the plurality of target feature vectors of different dimensions, and its output is the category information; the input of the position convolution network is the plurality of target feature vectors of different dimensions, and its output is the position information and the category confidence; the input of the doneness convolution network is the plurality of target feature vectors of different dimensions, and its output is the doneness type.
In this embodiment, three classification networks are established at the output layer, each detecting separately: the food category convolution network detects the category information of the target food, the position convolution network detects the position information and category confidence of the target food, and the doneness convolution network detects the doneness type of the target food. The output layer is therefore a design that does not share weights.
In the doneness recognition model, three fully convolutional networks are added after the feature fusion layer to output the different classification detection results; that is, a decoupling (decouple) operation is applied to the different kinds of information, such as position information, doneness type, category information and category confidence, which are classified and output separately, improving both output speed and accuracy.
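As a hedged illustration of such a decoupled output layer, the following PyTorch sketch (the framework, channel sizes and branch depths are assumptions, not taken from the disclosure) shows parallel convolutional branches producing the category map (H x W x C), the position map plus category confidence (H x W x (4+1)), and the doneness map (H x W x M):

```python
import torch
import torch.nn as nn

class DecoupledDonenessHead(nn.Module):
    """Sketch of a decoupled (non-weight-sharing) output layer: one branch per task."""
    def __init__(self, in_channels: int, num_classes: int, num_doneness_levels: int):
        super().__init__()
        self.category_net = nn.Conv2d(in_channels, num_classes, 1)          # H x W x C category map
        self.position_net = nn.Conv2d(in_channels, 4 + 1, 1)                # H x W x (4 + 1): box + confidence
        self.doneness_net = nn.Conv2d(in_channels, num_doneness_levels, 1)  # H x W x M doneness map

    def forward(self, feat: torch.Tensor) -> dict:
        pos = self.position_net(feat)
        return {
            "category": self.category_net(feat),
            "position": pos[:, :4],               # (x, y, w, h) per feature-map point
            "confidence": pos[:, 4:5].sigmoid(),  # category confidence per point
            "doneness": self.doneness_net(feat),
        }

# One head per scale, applied to feature maps such as P3, P4 and P5.
head = DecoupledDonenessHead(in_channels=128, num_classes=20, num_doneness_levels=4)
outputs = head(torch.randn(1, 128, 20, 20))
```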
It should be noted that the food category convolution network, the location convolution network, and the doneness convolution network in the output layer need to calculate the output loss respectively.
In some embodiments, the classification performed by the output layer for the position information, doneness type, category information and category confidence is based on anchor-free (non-anchor-box) recognition.
In this embodiment, the position information, doneness type, category information and category confidence are predicted directly from the target feature vectors of different dimensions, i.e., directly from the points on each feature map.
An anchor box is a prior box, centered on an anchor point, used in object detection algorithms; the algorithm predefines boxes with several different aspect ratios and detects objects by selecting among them.
The aspect ratio of an anchor box is a prior value derived from an existing dataset; it is a hyper-parameter that must be set in the configuration file of the detection model.
In this embodiment, the anchor-free recognition method is used to recognize the position information, doneness type, category information and category confidence, which reduces the number of hyper-parameters that must be set in the output layer.
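One common way to decode such point-based predictions into boxes, given here only as a sketch under assumed conventions borrowed from typical anchor-free detectors (offsets and sizes scaled by the feature-map stride; not a parameterization stated in this disclosure), is:

```python
import torch

def decode_boxes(position_map: torch.Tensor, stride: int) -> torch.Tensor:
    """position_map: (B, 4, H, W) with per-point (dx, dy, w, h) predictions."""
    b, _, h, w = position_map.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float()        # (2, H, W) feature-map point coordinates
    centers = (grid + position_map[:, :2]) * stride    # predicted box centers in image pixels
    sizes = position_map[:, 2:].exp() * stride         # predicted box width and height in pixels
    return torch.cat([centers, sizes], dim=1)          # (B, 4, H, W) boxes as (cx, cy, w, h)
```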
In some embodiments, for the input layer of the doneness recognition model, it is necessary to process two images of the visible light image and the infrared image at a time, and the fusion of the multiple input images can be achieved in the following two ways.
1. Image stitching fusion.
In this embodiment, the input layer performs stitching processing on the visible light image and the infrared image to obtain a target fusion image, and then performs feature extraction processing on the target fusion image to obtain a fusion feature map output to the feature fusion layer.
The stitching processing refers to directly stitching the visible light image and the infrared image in the image channel dimension.
For example, if the visible light image and the infrared image input to the input layer each have size 640x480x3, the target fusion image obtained after stitching is a tensor of size 640x480x6; a series of convolution operations is then performed on this tensor to obtain the fusion feature map.
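A minimal sketch of this image-level fusion, written in PyTorch purely as an assumed implementation (the disclosure does not name a framework, and the stem below stands in for the unspecified "series of convolution operations"), is:

```python
import torch
import torch.nn as nn

class EarlyFusionInput(nn.Module):
    """Concatenate the visible and infrared images on the channel axis, then convolve."""
    def __init__(self, out_channels: int = 32):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(6, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, visible: torch.Tensor, infrared: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([visible, infrared], dim=1)  # two 3-channel images -> 6-channel target fusion image
        return self.stem(fused)                        # fusion feature map passed to the feature fusion layer

# Example: two 3-channel 640x480 images -> one fusion feature map.
vis = torch.randn(1, 3, 480, 640)
ir = torch.randn(1, 3, 480, 640)
feature_map = EarlyFusionInput()(vis, ir)              # shape (1, 32, 240, 320)
```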
2. Feature stitching fusion.
In this embodiment, the input layer extracts features of the visible light image and the infrared image respectively to obtain a corresponding visible light feature map and an infrared feature map, and then splices the visible light feature map and the infrared feature map together to obtain a fusion feature map output to the feature fusion layer.
The splicing processing refers to splicing of the visible light characteristic diagram and the infrared characteristic diagram in the channel dimension.
For example, the input layer receives a visible light image and an infrared image; features are extracted from each image by successive convolutions, producing a visible light feature map and an infrared feature map; the two feature maps are spliced in the channel dimension, and a convolution operation is performed on the spliced result to obtain the fusion feature map.
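The feature-level fusion variant can be sketched in the same assumed PyTorch style (branch depths and channel counts are illustrative, not specified in the disclosure):

```python
import torch
import torch.nn as nn

class FeatureFusionInput(nn.Module):
    """Extract features from each modality separately, then splice them on the channel axis."""
    def __init__(self, branch_channels: int = 16, out_channels: int = 32):
        super().__init__()
        def branch() -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(3, branch_channels, 3, stride=2, padding=1),
                nn.ReLU(inplace=True),
            )
        self.visible_branch = branch()    # produces the visible light feature map
        self.infrared_branch = branch()   # produces the infrared feature map
        self.merge = nn.Conv2d(2 * branch_channels, out_channels, 3, padding=1)

    def forward(self, visible: torch.Tensor, infrared: torch.Tensor) -> torch.Tensor:
        vis_feat = self.visible_branch(visible)
        ir_feat = self.infrared_branch(infrared)
        fused = torch.cat([vis_feat, ir_feat], dim=1)  # splice along the channel dimension
        return self.merge(fused)                       # fusion feature map for the feature fusion layer
```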
In some embodiments, feature fusion in the feature fusion layer provides richer feature information for the subsequent output layer; this feature fusion can be implemented in the following two ways.
In the feature fusion layer, the feature fusion operation is performed after the feature extraction operation.
1. The features are fused from bottom to top.
The fusion feature map is subjected to multi-layer convolution processing to obtain N first feature layers of different dimensions. For example, as shown in fig. 5, the feature fusion layer applies a multi-layer convolution to the fusion feature map to obtain 5 first feature layers C1, C2, C3, C4 and C5.
Where N is an integer greater than 2, N also characterizes the number of layers of the multilayer convolution process.
The i-th first feature layer is convolved by 1x1 to obtain the (i-1)-th third feature layer, where i is an integer greater than 1; for example, as shown in fig. 5, the 4 first feature layers C2, C3, C4 and C5 are each convolved to obtain 4 third feature layers.
The 1st first feature layer is downsampled to obtain the corresponding 1st second feature layer; then, starting from the 1st second feature layer, the i-th second feature layer and the i-th third feature layer are fused from bottom to top to obtain the (i+1)-th fourth feature layer.
In the bottom-to-top fusion process, the (i+1)-th fourth feature layer is downsampled and the result is taken as the (i+1)-th second feature layer, which is used in the next fusion step.
For example, as shown in fig. 5, C1 is downsampled, and the resulting 1st second feature layer is added to the 1st third feature layer obtained by convolving C2 with 1x1; the fusion result is downsampled to give the 2nd second feature layer, which is added to the 2nd third feature layer obtained by convolving C3 with 1x1, and so on, so that feature fusion proceeds from bottom to top.
After the bottom-to-top feature fusion, convolution processing is applied to the last M fourth feature layers to obtain M corresponding target feature vectors of different dimensions, where M is an integer greater than 1 and N is an integer greater than 2, and these target feature vectors are output.
For example, as shown in fig. 5, N is 5 and M is 3; after the bottom-to-top feature fusion, a 3x3 convolution is applied to the 3rd, 4th and 5th fourth feature layers to output 3 target feature vectors P3, P4 and P5.
Where C1 is the bottom-most feature layer in the multi-layer convolution and C5 is the top-most feature layer in the multi-layer convolution.
In this embodiment, the features obtained by the multi-layer convolution are downsampled so that they become the same size as the features of the layer above, and the downsampled features are then added to the features of that layer, with the number of channels and feature size unchanged.
After the downsampled features have been fused from bottom to top, a plurality of target feature vectors of different dimensions is obtained through convolution processing.
In practice, the bottom feature layer C1 is first brought to the same number of channels as the layer above by a 1x1 convolution, then reduced to the same size as the layer above by downsampling or a stride-2 convolution, and then added to the layer above (itself passed through a 1x1 convolution).
This is propagated upward in turn until the result is finally added to the topmost feature layer; after a 3x3 convolution, P3, P4 and P5 are taken as the final target feature vectors.
Each of the target feature vectors P3, P4 and P5 contains information from all the feature layers, and outputting several target feature vectors of different dimensions allows targets of different sizes to be detected, making the classification detection of the output layer more accurate.
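As a simplified sketch of this bottom-up fusion (Fig. 5), the following PyTorch module is offered under assumptions: all levels are projected to a common channel count, stride-2 convolutions stand in for the downsampling, and the channel sizes are illustrative rather than taken from the disclosure.

```python
import torch
import torch.nn as nn

class BottomUpFusion(nn.Module):
    """Bottom-up fusion: C2..C5 get 1x1 convs; C1 and each fusion result are downsampled and added."""
    def __init__(self, in_channels: list[int], mid_channels: int = 128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, mid_channels, 1) for c in in_channels[1:])
        self.down_c1 = nn.Conv2d(in_channels[0], mid_channels, 3, stride=2, padding=1)
        self.down = nn.ModuleList(
            nn.Conv2d(mid_channels, mid_channels, 3, stride=2, padding=1) for _ in in_channels[2:]
        )
        self.out = nn.ModuleList(nn.Conv2d(mid_channels, mid_channels, 3, padding=1) for _ in range(3))

    def forward(self, c_layers):                               # c_layers = [C1, ..., C5], finest first
        laterals = [conv(c) for conv, c in zip(self.lateral, c_layers[1:])]
        fused = [self.down_c1(c_layers[0]) + laterals[0]]      # downsample C1, add the C2 lateral
        for down, lat in zip(self.down, laterals[1:]):
            fused.append(down(fused[-1]) + lat)                # downsample the previous fusion, add the next lateral
        return [conv(f) for conv, f in zip(self.out, fused[-3:])]   # 3x3 convs give P3, P4, P5

# Example: backbone features at strides 2, 4, 8, 16, 32.
cs = [torch.randn(1, c, s, s) for c, s in zip([16, 32, 64, 128, 256], [128, 64, 32, 16, 8])]
p3, p4, p5 = BottomUpFusion([16, 32, 64, 128, 256])(cs)
```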
2. The features are fused twice, from the top down and then from the bottom up. In this embodiment, the process includes top-down feature fusion followed by bottom-up feature fusion, where the bottom-up fusion is the same as the bottom-up fusion process described above.
And carrying out multi-layer convolution processing on the fusion feature map to obtain N fifth feature layers with different dimensions, wherein N is an integer greater than 2, and N also represents the number of layers of the multi-layer convolution processing.
For example, as shown in fig. 6, the feature fusion layer performs multi-layer convolution on the fused feature map to obtain 5 fifth feature layers such as C1, C2, C3, C4, and C5.
The 1st to (N-1)-th fifth feature layers are each convolved by 1x1 to obtain the 1st to (N-1)-th sixth feature layers; that is, every feature layer except the topmost, N-th fifth feature layer needs a 1x1 convolution. For example, as shown in fig. 6, C1, C2, C3 and C4 are processed in this way, while C5 is not.
The N-th fifth feature layer is upsampled to obtain the (N-1)-th seventh feature layer; for example, as shown in fig. 6, C5 is upsampled to obtain the 4th seventh feature layer.
Then, starting from the (N-1)-th seventh feature layer, the seventh feature layers are fused (added) with the corresponding sixth feature layers from top to bottom to obtain the 1st to (N-1)-th first feature layers.
In the top-down fusion process, the (N-1)-th first feature layer is upsampled, the result is taken as the (N-2)-th seventh feature layer, and it is fused (added) with the (N-2)-th sixth feature layer.
For example, as shown in fig. 6, C5 is upsampled, and the resulting 4th seventh feature layer is added to the 4th sixth feature layer obtained by convolving C4 with 1x1; the result is the 4th first feature layer, which is in turn upsampled and added to the 3rd sixth feature layer obtained by convolving C3 with 1x1, and so on, so that feature fusion proceeds from top to bottom.
The N-th fifth feature layer is used directly as the N-th first feature layer; after the second, bottom-to-top feature fusion is performed on the N first feature layers, the plurality of target feature vectors is output.
The features obtained by the multi-layer convolution are upsampled so that they become the same size as the features of the layer below, and the upsampled features are added to the features of the layer below with the number of channels and feature size unchanged; that is, top-down feature fusion is performed on the upsampled features.
The features fused from top to bottom are then convolved and downsampled so that they become the same size as the features of the layer above, and the downsampled features are added to those of the layer above with the number of channels and feature size unchanged; that is, bottom-up feature fusion is performed on the downsampled features.
After the two fusion passes, top-down and then bottom-up, convolution processing is performed to obtain the plurality of target feature vectors of different dimensions.
In practice, the topmost feature layer C5 is brought to the same number of channels as the layer below by a 1x1 convolution, enlarged to the same size as the layer below by deconvolution (upsampling), and then added to the layer below (itself passed through a 1x1 convolution).
After a 3x3 convolution, the output is used as the new bottom-layer feature; bottom-up feature fusion is then performed, and P3, P4 and P5 are output as the final target feature vectors.
It will be understood that the two fusion passes, top-down and then bottom-up, greatly enrich the bottom-layer feature information, making detection in the subsequent output layer more accurate; and because outputs of different dimensions are taken from different layers, the doneness recognition model can adapt to detection targets of different sizes.
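A compact sketch of this two-pass (top-down, then bottom-up) fusion (Fig. 6), again in PyTorch and under the simplifying assumption that every level is first projected to a common channel count by a 1x1 convolution, is given below; it illustrates the idea rather than the disclosed implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoPassFusion(nn.Module):
    """Top-down pass (upsample and add), then bottom-up pass (downsample and add)."""
    def __init__(self, in_channels: list[int], mid: int = 128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, mid, 1) for c in in_channels)
        self.down = nn.ModuleList(
            nn.Conv2d(mid, mid, 3, stride=2, padding=1) for _ in in_channels[:-1]
        )
        self.out = nn.ModuleList(nn.Conv2d(mid, mid, 3, padding=1) for _ in range(3))

    def forward(self, c_layers):                                   # [C1, ..., C5], coarsest last
        feats = [conv(c) for conv, c in zip(self.lateral, c_layers)]
        for i in range(len(feats) - 2, -1, -1):                    # top-down: enrich finer levels
            feats[i] = feats[i] + F.interpolate(feats[i + 1], size=feats[i].shape[-2:], mode="nearest")
        for i in range(1, len(feats)):                             # bottom-up: enrich coarser levels
            feats[i] = feats[i] + self.down[i - 1](feats[i - 1])
        return [conv(f) for conv, f in zip(self.out, feats[-3:])]  # P3, P4, P5
```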
The invention also provides cooking equipment which comprises the image acquisition device and a controller.
In embodiments of the invention, the cooking equipment includes, but is not limited to, a range hood, a cooktop, a microwave oven, an air fryer, a rice cooker, an induction cooker, and the like.
The image acquisition device acquires visible light images and infrared images of the target food; the controller is electrically connected with the image acquisition device and outputs the target doneness information of the target food according to the visible light images and infrared images acquired by it.
In practice, the doneness recognition model described above may be deployed in the controller so that the controller can obtain the target doneness information from the visible light image and the infrared image.
In this embodiment, the image capturing apparatus includes a visible light image capturing device and an infrared image capturing device.
The visible light image is acquired by the visible light image acquisition equipment and can reflect detailed information such as color, form, texture and the like of the target food and the environment where the target food is located.
For example, the visible light image may be an RGB image, acquired by a visible light camera mounted on the range hood, of a target food being cooked on the cooktop below the range hood.
The infrared image is acquired by the infrared image acquisition equipment and can reflect the heat radiation information of the target food and the environment where the target food is located.
For example, the infrared image may be a thermal radiation image, acquired by an infrared camera mounted on the range hood, of a target food being cooked on the cooktop below the range hood.
In this embodiment, the output of the target doneness information may be expressed as at least one of the following:
first, the output of the target doneness information may be represented as a voice output.
In this embodiment, after the target doneness information is obtained by analyzing the visible light image and the infrared image, the target doneness information may be output by playing a voice through a speaker mounted on the cooking apparatus.
Second, the output of the target doneness information may be represented as a display output.
In this embodiment, a display screen is provided on the cooking device, or the cooking device is connected to a terminal device such as a mobile phone used by a user, and after obtaining the target doneness information according to the analysis of the visible light image and the infrared image, the target doneness information is displayed on the display screen of the cooking device or the terminal device in a text or image manner.
Third, the output of the target doneness information may be represented as a vibration output.
In this embodiment, after the target doneness information is obtained from the visible light image and the infrared image analysis, it may be output through a vibration device on the cooking apparatus or the terminal apparatus.
Of course, in other embodiments, the output of the target doneness information may also take other forms, including but not limited to flashing of a warning light, buzzing of a buzzer, etc., which may be specifically determined according to actual needs, and the embodiment of the present invention is not limited thereto.
A specific embodiment of outputting the target doneness information is described below.
A visible light camera and an infrared camera are deployed on the range hood; once the user's ignition is detected, visible light images and infrared images of the area below the range hood are continuously acquired and analyzed, and the target doneness information is output by voice broadcast.
For example, a pot on the cooktop below the range hood contains a potato-and-meat dish being cooked, which includes two food materials, potato and meat. The visible light image and the infrared image are analyzed to obtain the target doneness information: the doneness level of the potato is half-cooked and the doneness level of the meat is fully cooked.
A speaker mounted on the range hood then broadcasts, for the target food potato-and-meat, that the meat is fully cooked, the potato is still half-cooked, and cooking should continue, prompting the user that the target food is not yet edible and needs further cooking.
In another example, the target food is steak; the visible light image and the infrared image are analyzed, and the speaker broadcasts that the steak is fully cooked and edible, prompting the user that the target food can now be eaten and that cooking should be stopped.
According to the cooking equipment provided by the invention, the recognition of the visible light image is corrected by information carried by the infrared image, such as the food position, and the food doneness is located precisely from the temperature information reflected in the infrared image; the accuracy of food doneness recognition is effectively improved, the control of the cooking equipment is assisted, and the user experience is improved.
In some embodiments, the controller of the cooking device may output a target control command for controlling the cooking power according to the target doneness information, adjusting the energy output of the cooking device in real time according to the judged doneness, so as to improve the cooking effect and save energy.
For example, a visible light camera and an infrared camera are deployed on the range hood; once the user's ignition is detected, visible light and infrared images are continuously acquired and analyzed to obtain the target doneness information, and the cooking power of the cooktop is adjusted according to it.
Suppose the target food in the pot heated by the cooktop is a potato-and-meat dish containing two food materials, potato and meat. The visible light image and the infrared image are analyzed and the corresponding target doneness information is output: the doneness level of the potato is half-cooked and the doneness level of the meat is fully cooked.
To ensure that the potatoes are cooked through while avoiding overcooking the meat, the heat of the cooktop can be increased so that both food materials in the dish finish cooking quickly, ensuring a good cooking result.
For another example, if the target food in the pot heated by the cooktop is steak and the analysis of the visible light and infrared images outputs a doneness level of fully cooked, the heat of the cooktop can be turned off, avoiding overcooking the steak and ensuring a good cooking result.
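The fire-power adjustment described in these examples can be sketched, purely illustratively, as the following decision rule; the doneness labels and the set_fire_level callback are hypothetical placeholders for the cooker's control interface.

```python
def adjust_fire(doneness_by_food, set_fire_level):
    """doneness_by_food: e.g. {"potato": "half-cooked", "meat": "fully-cooked"}.
    set_fire_level: hypothetical callback into the cooktop controller."""
    order = {"raw": 0, "half-cooked": 1, "fully-cooked": 2, "overcooked": 3}
    if all(order[d] >= order["fully-cooked"] for d in doneness_by_food.values()):
        set_fire_level("off")    # every food material is done: stop heating
    else:
        set_fire_level("high")   # something is still undercooked: raise the heat to finish it quickly

# Example: potatoes still half-cooked, meat already done -> raise the heat.
adjust_fire({"potato": "half-cooked", "meat": "fully-cooked"}, print)
```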
A specific embodiment of a cooking apparatus as a microwave oven will be described.
Step 1: and closing the door of the microwave oven, activating the trigger, and enabling the visible light camera and the infrared camera to be turned on.
Step 2: and simultaneously, a visible light image and an infrared image of the interior of the microwave oven are photographed.
Step 3: and (3) inputting two pictures of the visible light image and the infrared image as a group into the doneness recognition model to obtain position information, doneness type, category information and category confidence level which are output by the doneness recognition model in response.
Step 4: and determining whether the type of the maturity is required to be judged according to the category confidence, if not, ending, and not performing task processing.
Step 5: if yes, judging whether the category information of the target food in the image is the food which is already recorded during training, and if not, turning to the step 9.
Step 6: if yes, the target cooking degree information can be output through voice, and the current cooking progress of the user is prompted.
Step 7: the energy consumption output of the microwave oven can be automatically adjusted according to the current degree of cooking, such as weakening the microwave intensity of the microwave oven.
Step 8: when the food is detected to be cooked, the microwave oven can be controlled to stop running, and the user can be prompted according to the output information.
Step 9: the category information of the target food in the image is transmitted to a subsequent task for training analysis of the doneness recognition model.
The food doneness recognition device provided in an embodiment of the present invention is described below; the device described below and the food doneness recognition method described above may be cross-referenced with each other.
As shown in fig. 7, the food doneness recognition device provided by the present invention includes:
an acquisition module 710 for acquiring an infrared image and a visible light image of a target food;
the processing module 720 is configured to input the infrared image and the visible light image into the doneness recognition model, and obtain target doneness information of the target food output by the doneness recognition model;
the doneness recognition model is obtained by training with a sample infrared image and a sample visible light image of food as samples and with predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
According to the food doneness recognition device provided by the invention, the recognition result of the visible light image is corrected using information such as the food position represented by the infrared image, and the food doneness is accurately located using the temperature information reflected by the infrared image, effectively improving the accuracy of food doneness recognition.
In some embodiments, the processing module 720 is configured to input the infrared image and the visible light image to an input layer of the doneness recognition model to obtain a fusion feature map output by the input layer; input the fusion feature map to a feature fusion layer of the doneness recognition model to obtain a plurality of target feature vectors of different dimensions output by the feature fusion layer; and input the plurality of target feature vectors of different dimensions to an output layer of the doneness recognition model to obtain the target doneness information output by the output layer.
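A PyTorch-style sketch of this three-stage forward pass is shown below. The class and sub-module names are assumed placeholders rather than names from the patent; each sub-module stands in for the corresponding layer described above.

```python
# Minimal sketch: input layer -> feature fusion layer -> output layer.
import torch
import torch.nn as nn

class DonenessRecognitionModel(nn.Module):
    def __init__(self, input_layer: nn.Module, fusion_layer: nn.Module, output_layer: nn.Module):
        super().__init__()
        self.input_layer = input_layer      # fuses the infrared and visible-light images
        self.fusion_layer = fusion_layer    # produces multi-scale target feature vectors
        self.output_layer = output_layer    # predicts category, position, confidence, doneness

    def forward(self, infrared: torch.Tensor, visible: torch.Tensor):
        fusion_map = self.input_layer(infrared, visible)
        target_features = self.fusion_layer(fusion_map)   # list of feature maps of different sizes
        return self.output_layer(target_features)         # target doneness information
```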
In some embodiments, the processing module 720 is configured to input the plurality of target feature vectors of different dimensions to the output layer and perform convolution and classification to obtain the category information, position information, category confidence and doneness type of the target food, and to obtain the target doneness information output by the output layer based on the category information, the position information, the category confidence and the doneness type.
In some embodiments, the processing module 720 is configured to input the plurality of target feature vectors of different dimensions into a food category convolution network of the output layer to obtain the category information output by the food category convolution network; input the plurality of target feature vectors of different dimensions into a position convolution network of the output layer to obtain the position information and the category confidence output by the position convolution network; and input the plurality of target feature vectors of different dimensions into a doneness convolution network of the output layer to obtain the doneness type output by the doneness convolution network, as sketched below.
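The following is a hedged sketch of an output layer with the three convolution heads just described; the channel count, number of food classes and number of doneness levels are illustrative assumptions, not values stated in the patent.

```python
# Three parallel heads: food category, position + confidence, doneness type.
import torch.nn as nn

class OutputLayer(nn.Module):
    def __init__(self, in_channels=256, num_classes=20, num_doneness_levels=4):
        super().__init__()
        self.category_net = nn.Conv2d(in_channels, num_classes, kernel_size=1)          # food category convolution network
        self.position_net = nn.Conv2d(in_channels, 4 + 1, kernel_size=1)                # position info + category confidence
        self.doneness_net = nn.Conv2d(in_channels, num_doneness_levels, kernel_size=1)  # doneness convolution network

    def forward(self, target_feature_vectors):
        predictions = []
        for feat in target_feature_vectors:                 # one prediction per scale
            predictions.append((self.category_net(feat),    # category information
                                self.position_net(feat),    # position information + confidence
                                self.doneness_net(feat)))   # doneness type
        return predictions
```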
In some embodiments, the classification and identification performed by the output layer in the processing module 720 is anchor-free identification, that is, identification performed without predefined anchor boxes.
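As an illustration of anchor-free decoding, the sketch below assumes an FCOS-style layout in which every feature-map location directly regresses its distances to the four box edges plus a confidence score; this layout is an assumption, since the patent does not specify the exact decoding scheme.

```python
# Anchor-free decoding at one scale: boxes are regressed per location, no anchor boxes.
import torch

def decode_anchor_free(pos_conf: torch.Tensor, stride: int):
    """pos_conf: (B, 5, H, W) -> boxes (B, H*W, 4) and confidences (B, H*W)."""
    b, _, h, w = pos_conf.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cx = (xs.flatten() + 0.5) * stride            # centre x of every location
    cy = (ys.flatten() + 0.5) * stride            # centre y of every location
    left, top, right, bottom, conf = pos_conf.flatten(2).unbind(dim=1)
    boxes = torch.stack([cx - left, cy - top, cx + right, cy + bottom], dim=-1)
    return boxes, conf.sigmoid()
```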
In some embodiments, the processing module 720 is configured to input the infrared image and the visible light image to the input layer and splice the infrared image and the visible light image to obtain a target fusion image, and to perform feature extraction on the target fusion image to obtain the fusion feature map output by the input layer.
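A sketch of this image-level fusion variant follows: the infrared and visible-light images are concatenated along the channel axis before a single backbone extracts features. The 4-channel stem (3 RGB + 1 IR) and the backbone module are assumptions.

```python
# Image-level fusion: splice the images first, then extract features once.
import torch
import torch.nn as nn

class EarlyFusionInputLayer(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.stem = nn.Conv2d(4, 32, kernel_size=3, stride=2, padding=1)  # 3 RGB + 1 IR channels
        self.backbone = backbone

    def forward(self, infrared, visible):
        target_fusion_image = torch.cat([visible, infrared], dim=1)   # splice the two images
        return self.backbone(self.stem(target_fusion_image))          # fusion feature map
```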
In some embodiments, the processing module 720 is configured to input the infrared image and the visible light image to the input layer and perform feature extraction on the infrared image and the visible light image respectively to obtain an infrared feature map and a visible light feature map, and to splice the infrared feature map and the visible light feature map to obtain the fusion feature map output by the input layer.
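By contrast, the feature-level fusion variant extracts features from each image separately and then splices the two feature maps. The two branch modules in the sketch below are assumptions.

```python
# Feature-level fusion: extract features per modality, then splice the feature maps.
import torch
import torch.nn as nn

class FeatureLevelFusionInputLayer(nn.Module):
    def __init__(self, ir_branch: nn.Module, rgb_branch: nn.Module):
        super().__init__()
        self.ir_branch = ir_branch
        self.rgb_branch = rgb_branch

    def forward(self, infrared, visible):
        ir_feat = self.ir_branch(infrared)             # infrared feature map
        rgb_feat = self.rgb_branch(visible)            # visible-light feature map
        return torch.cat([rgb_feat, ir_feat], dim=1)   # fusion feature map
```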
In some embodiments, the processing module 720 is configured to input the fusion feature map to the feature fusion layer and perform multi-layer convolution processing on the fusion feature map to obtain N first feature layers of different dimensions; perform a 1x1 convolution on the i-th first feature layer to obtain the (i-1)-th third feature layer; downsample the 1st first feature layer to obtain the 1st second feature layer; starting from the 1st second feature layer, fuse the i-th second feature layer with the i-th third feature layer to obtain the (i+1)-th fourth feature layer, where the (i+1)-th second feature layer is the downsampling result of the (i+1)-th fourth feature layer; and perform convolution processing on the (N-M)-th to N-th fourth feature layers to obtain M target feature vectors of different dimensions, where i and M are integers greater than 1 and N is an integer greater than 2.
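Read as a PAN-style bottom-up path, this embodiment can be sketched as follows. The channel count, the shared 1x1 lateral convolution, max-pool downsampling and element-wise-sum fusion are all assumptions made for illustration.

```python
# Bottom-up fusion path over the first feature layers (PAN-style reading).
import torch.nn as nn
import torch.nn.functional as F

def bottom_up_path(first_layers, channels=256, m=3):
    """first_layers: N maps (largest to smallest scale), each with `channels` channels."""
    lateral = nn.Conv2d(channels, channels, kernel_size=1)            # 1x1 convolution
    out_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
    fourth = [first_layers[0]]                     # seed: the 1st first layer
    for i in range(1, len(first_layers)):
        second = F.max_pool2d(fourth[-1], kernel_size=2)   # downsampling (second feature layer)
        third = lateral(first_layers[i])                   # third feature layer
        fourth.append(second + third)                      # next fourth feature layer
    return [out_conv(f) for f in fourth[-m:]]      # M target feature vectors
```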
In some embodiments, the processing module 720 is configured to input the fusion feature map to the feature fusion layer and perform multi-layer convolution processing on the fusion feature map to obtain N fifth feature layers of different dimensions; perform a 1x1 convolution on the (N-1)-th fifth feature layer to obtain the (N-1)-th sixth feature layer; upsample the N-th fifth feature layer to obtain the (N-1)-th seventh feature layer; and, starting from the (N-1)-th seventh feature layer, fuse the (N-1)-th seventh feature layer with the (N-1)-th sixth feature layer to obtain the (N-1)-th first feature layer, the N-th fifth feature layer being taken as the N-th first feature layer.
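Read as an FPN-style top-down path, this embodiment builds the first feature layers from the fifth feature layers as sketched below. Nearest-neighbour upsampling and element-wise-sum fusion are assumptions; the patent only names the layers and the operations in general terms.

```python
# Top-down path: turn the N fifth feature layers into the N first feature layers.
import torch.nn as nn
import torch.nn.functional as F

def top_down_path(fifth_layers, channels=256):
    """fifth_layers: N maps (largest to smallest scale), each with `channels` channels."""
    lateral = nn.Conv2d(channels, channels, kernel_size=1)        # 1x1 convolution
    n = len(fifth_layers)
    first = [None] * n
    first[n - 1] = fifth_layers[n - 1]            # the N-th fifth layer is the N-th first layer
    for j in range(n - 2, -1, -1):
        sixth = lateral(fifth_layers[j])                          # sixth feature layer
        seventh = F.interpolate(first[j + 1], scale_factor=2.0)   # seventh feature layer (upsampled)
        first[j] = sixth + seventh                                # first feature layer at level j
    return first
```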
Fig. 8 illustrates the physical structure of an electronic device. As shown in fig. 8, the electronic device may include: a processor 810, a communication interface (Communications Interface) 820, a memory 830 and a communication bus 840, where the processor 810, the communication interface 820 and the memory 830 communicate with each other through the communication bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the food doneness recognition method, which includes: acquiring an infrared image and a visible light image of a target food; inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model; the doneness recognition model is obtained by training with a sample infrared image and a sample visible light image of food as samples and with predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
Further, the logic instructions in the memory 830 may be implemented in the form of software functional units and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Further, the present invention also provides a computer program product comprising a computer program, where the computer program may be stored on a non-transitory computer-readable storage medium; when executed by a processor, the computer program can perform the food doneness recognition method provided by the above method embodiments, the method comprising: acquiring an infrared image and a visible light image of a target food; inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model; the doneness recognition model is obtained by training with a sample infrared image and a sample visible light image of food as samples and with predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
In another aspect, embodiments of the present invention also provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the food doneness recognition method provided by the above embodiments, the method comprising: acquiring an infrared image and a visible light image of a target food; inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model; the doneness recognition model is obtained by training with a sample infrared image and a sample visible light image of food as samples and with predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the invention without undue effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by means of hardware. Based on this understanding, the foregoing technical solution, in essence or in the part contributing to the prior art, may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
The above embodiments are only for illustrating the present invention, and are not limiting of the present invention. While the invention has been described in detail with reference to the embodiments, those skilled in the art will appreciate that various combinations, modifications, or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and it is intended to be covered by the scope of the claims of the present invention.

Claims (15)

1. A method for identifying the doneness of a food, comprising:
acquiring an infrared image and a visible light image of a target food;
inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model;
the doneness recognition model is obtained by training by taking a sample infrared image and a sample visible light image of food as samples and taking predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
2. The method of claim 1, wherein the inputting the infrared image and the visible light image into a doneness recognition model to obtain the target doneness information of the target food output by the doneness recognition model comprises:
inputting the infrared image and the visible light image into an input layer of the doneness recognition model to obtain a fusion feature map output by the input layer;
inputting the fusion feature map to a feature fusion layer of the doneness recognition model to obtain a plurality of target feature vectors with different dimensions output by the feature fusion layer;
and inputting the target feature vectors with the different dimensions to an output layer of the doneness recognition model to obtain the target doneness information output by the output layer.
3. The method of claim 2, wherein inputting the plurality of target feature vectors of different dimension sizes to an output layer of the doneness recognition model, obtaining the target doneness information output by the output layer, comprises:
inputting the target feature vectors with the different dimensions to the output layer, and carrying out convolution and classification to obtain the category information, the position information, the category confidence and the doneness type of the target food;
and obtaining the target doneness information output by the output layer based on the category information, the position information, the category confidence and the doneness type.
4. The method of claim 3, wherein inputting the plurality of target feature vectors of different dimensions to the output layer for convolution and classification to obtain the category information, location information, category confidence and doneness type of the target food comprises:
inputting the target feature vectors with the different dimensions into a food category convolution network of the output layer to obtain the category information output by the food category convolution network;
inputting the target feature vectors with the different dimensions into a position convolution network of the output layer to obtain the position information and the category confidence degree output by the position convolution network;
and inputting the target feature vectors with the different dimensions into a doneness convolution network of the output layer to obtain the doneness type output by the doneness convolution network.
5. The method of claim 3, wherein the classification and identification of the output layer is anchor-free identification, i.e. identification performed without anchor boxes.
6. The method of claim 2, wherein the inputting the infrared image and the visible light image to the input layer of the doneness recognition model to obtain a fusion feature map output by the input layer comprises:
inputting the infrared image and the visible light image into the input layer, and splicing the infrared image and the visible light image to obtain a target fusion image;
and performing feature extraction on the target fusion image to obtain the fusion feature map output by the input layer.
7. The method of claim 2, wherein the inputting the infrared image and the visible light image to the input layer of the doneness recognition model to obtain a fusion feature map output by the input layer comprises:
inputting the infrared image and the visible light image into the input layer, and performing feature extraction on the infrared image and the visible light image respectively to obtain an infrared feature map and a visible light feature map;
and splicing the infrared feature map and the visible light feature map to obtain the fusion feature map output by the input layer.
8. The method for identifying food doneness according to any one of claims 2-7, wherein inputting the fusion feature map to a feature fusion layer of the doneness recognition model to obtain a plurality of target feature vectors of different dimensions output by the feature fusion layer comprises:
inputting the fusion feature map to the feature fusion layer, and performing multi-layer convolution processing on the fusion feature map to obtain N first feature layers of different dimensions;
carrying out a 1x1 convolution on the i-th first feature layer to obtain the (i-1)-th third feature layer;
downsampling the 1st first feature layer to obtain the 1st second feature layer;
starting from the 1st second feature layer, fusing the i-th second feature layer with the i-th third feature layer to obtain the (i+1)-th fourth feature layer, wherein the (i+1)-th second feature layer is a downsampling result of the (i+1)-th fourth feature layer;
carrying out convolution processing on the (N-M)-th to N-th fourth feature layers to obtain M target feature vectors of different dimensions;
wherein i and M are integers greater than 1 and N is an integer greater than 2.
9. The method for identifying food doneness according to claim 8, wherein inputting the fusion feature map to the feature fusion layer and performing multi-layer convolution processing on the fusion feature map to obtain the N first feature layers of different dimensions comprises:
inputting the fusion feature map to the feature fusion layer, and performing multi-layer convolution processing on the fusion feature map to obtain N fifth feature layers of different dimensions;
carrying out a 1x1 convolution on the (N-1)-th fifth feature layer to obtain the (N-1)-th sixth feature layer;
upsampling the N-th fifth feature layer to obtain the (N-1)-th seventh feature layer;
and, starting from the (N-1)-th seventh feature layer, fusing the (N-1)-th seventh feature layer with the (N-1)-th sixth feature layer to obtain the (N-1)-th first feature layer, and taking the N-th fifth feature layer as the N-th first feature layer.
10. A food doneness recognition device, comprising:
the acquisition module is used for acquiring an infrared image and a visible light image of the target food;
the processing module is used for inputting the infrared image and the visible light image into a doneness recognition model to obtain target doneness information of the target food output by the doneness recognition model;
the doneness recognition model is obtained by training by taking a sample infrared image and a sample visible light image of food as samples and taking predetermined doneness information corresponding to the sample infrared image and the sample visible light image as sample labels.
11. A cooking apparatus, comprising:
the image acquisition device is used for acquiring infrared images and visible light images of the target food;
and a controller electrically connected with the image acquisition device and configured to output target doneness information of the target food based on the infrared image and the visible light image.
12. The cooking apparatus of claim 11, wherein the controller is further configured to output a target control instruction for controlling cooking power of the cooking apparatus based on the target doneness information.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the food doneness recognition method according to any one of claims 1 to 9 when executing the program.
14. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the food doneness recognition method according to any one of claims 1 to 9.
15. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the food doneness recognition method according to any one of claims 1 to 9.
CN202111511741.2A 2021-12-06 2021-12-06 Food ripeness recognition method and device and cooking equipment Pending CN116259052A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111511741.2A CN116259052A (en) 2021-12-06 2021-12-06 Food ripeness recognition method and device and cooking equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111511741.2A CN116259052A (en) 2021-12-06 2021-12-06 Food ripeness recognition method and device and cooking equipment

Publications (1)

Publication Number Publication Date
CN116259052A true CN116259052A (en) 2023-06-13

Family

ID=86684930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111511741.2A Pending CN116259052A (en) 2021-12-06 2021-12-06 Food ripeness recognition method and device and cooking equipment

Country Status (1)

Country Link
CN (1) CN116259052A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116594367A (en) * 2023-07-19 2023-08-15 烟台金潮果蔬食品有限公司 Cooking degree control system of sweet potato juice spiral precooking machine
CN116594367B (en) * 2023-07-19 2023-09-19 烟台金潮果蔬食品有限公司 Cooking degree control system of sweet potato juice spiral precooking machine

Similar Documents

Publication Publication Date Title
US11229311B2 (en) Food preparation system
CN100538742C (en) Food material cooking operation recognition system and the recognition methods of food material cooking operation
CN111148944B (en) Automatic cooking apparatus and method
US11048976B2 (en) Method and system for controlling machines based on object recognition
CN108416902A (en) Real-time object identification method based on difference identification and device
CN108108767A (en) A kind of cereal recognition methods, device and computer storage media
CN110488696B (en) Intelligent dry burning prevention method and system
CN111161295B (en) Dish image background stripping method
CN116259052A (en) Food ripeness recognition method and device and cooking equipment
CN108090517A (en) A kind of cereal recognition methods, device and computer storage media
CN108764243A (en) A kind of image processing method and device
CN109165658A (en) A kind of strong negative sample underwater target detection method based on Faster-RCNN
CN111723760B (en) Digital menu generation method and device suitable for intelligent kitchen system
CN109086647B (en) Smoke detection method and device
CN111061891A (en) Image recognition menu development method, terminal and readable storage medium
Zhang Identifying the cuisine of a plate of food
CN116076918A (en) Cooking safety detection method and device and household appliance
CN111419096B (en) Food processing method, controller and food processing equipment
CN112906758A (en) Training method, recognition method and equipment of food material freshness recognition model
KR101718085B1 (en) Apparatus and Method of Enhancing Visual Flavor of Food Image
US20180082148A1 (en) Image display control system, image display control method, and image display control program
CN115862002A (en) Food material cooking degree auxiliary identification method and device and cooking equipment
CN108805190A (en) A kind of image processing method and device
EP4218424A1 (en) Cooking process implementation
CN115457289A (en) Cooking scene picture generation method and device, cooker and cooker control method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination