WO2021091481A1 - System for object identification and content quantity estimation through use of thermal and visible spectrum images - Google Patents
- Publication number: WO2021091481A1 (PCT/SG2020/050494)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- thermal
- images
- visible spectrum
- processor
- camera
- Prior art date: 2019-11-08
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/12—Details of acquisition arrangements; Constructional details thereof
- G06V10/14—Optical characteristics of the device performing the acquisition or on the illumination arrangements
- G06V10/143—Sensing or illuminating at different wavelengths
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/255—Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/087—Inventory or stock management, e.g. order filling, procurement or balancing against orders
Definitions
- the present invention relates to a system for object identification and content quantity estimation through use of thermal and visible spectrum images.
- An Internet-connected fridge that can automatically track the identity and quantity of items placed inside it can enable several useful applications, such as allowing a consumer to ascertain commonly used items that need to be replenished or purchased.
- an object profiling system comprising: a visible spectrum camera; a thermal camera, wherein the visible light spectrum camera and the thermal camera are disposed to capture a common zone; and a processor configured to: receive a stream of visible spectrum images and thermal images of the common zone, taken respectively by the visible spectrum camera and the thermal camera, when an object is detected to pass through the common zone; and isolate the object in the visible spectrum images through cross reference against co-ordinates of its thermal silhouette found in the thermal images.
- an object profiling system comprising a thermal camera disposed to capture a zone; and a processor configured to: receive a stream of thermal images of the zone taken by the thermal camera when an object is detected to pass through the zone; and determine a percentage of content remaining in the object based on a detected temperature difference between its occupied portion and its empty portion through analysis of a thermal silhouette of the object in the thermal images.
- Figure 1 shows visual representations of various operation stages of an object profiling system configured to perform object isolation in both thermal and visible spectrum images to facilitate object identification and content quantity estimation.
- Figure 2 shows one possible workflow of object isolation in both thermal and visible spectrum images to facilitate object identification and content quantity estimation.
- Figures 3(a) to 3(e) illustrate application of the principle of optical flow to isolate objects that have high motion components and are therefore closer to the image sensors.
- Figure 4 shows a thermal silhouette
- Figure 5A is a thermal image when containers are just taken out from a refrigerator.
- Figure 5B shows the thermal image of the same containers after they have been kept outside for an interval.
- Figure 6 shows a processing pipeline that is applied to thermal images after their capture.
- Figure 7 plots the estimation error for juice, milk and water as a function of the ambient exposure duration.
- Figure 8 shows potential camera deployment positions.
- Figure 9 plots the fraction of extracted images whose Intersection Over Union (IoU), a measure of the accuracy of our isolation technique, exceeds a specified threshold.
- Figure 10 plots precision/recall values for item identification for images whose IoU value exceeds the corresponding x-axis value, illustrating how state-of-the-art techniques for item identification work better when the isolated images have higher IoU.
- Figure 11 plots the distribution of Item Coverage (ICov) values for both combined and RGB motion-vector only methods.
- Figure 12 plots estimated quantity for juice, milk and water and 3 different fractional quantities.
- Figure 13A shows mean quantity estimation error and Figure 13B shows the result of applying clustering to separate empty and filled portions of the item.
- the present application in a broad overview, has machine-learning (ML) based computer vision applications and relates to specific object isolation from an image which also contains extraneous or irrelevant background or foreground objects.
- the isolated object can then be extracted to be identified more readily and accurately by a machine learning algorithm, as opposed to having the machine learning algorithm process the entire unfiltered image to determine which the object of interest is and then identify it.
- thermal images and visible spectrum images i.e. images having red green and blue components in the visible light spectrum
- a temperature regulated enclosure such as a cooled environment like a refrigerator; or a heated environment.
- the thermal images and the visible spectrum images are of a common zone, taken over a same time interval.
- the occurrence or appearance of a thermal silhouette, which contrasts with its surroundings, in thermal images of the common zone, signifies that one or more objects for identification are detected in the common zone.
- the thermal silhouette is distinctive of an object appearing in the common zone because its thermal profile (or temperature profile) is markedly different from the thermal profile of at least its adjacent surroundings, whereby the thermal profile of the surroundings is generally homogeneous, especially if the surroundings are objects at ambient temperature. Focusing on this thermal silhouette thus readily excludes irrelevant objects.
- the co-ordinates of the thermal silhouette are cross-referenced to, or mapped onto, the visible spectrum images, i.e. co-ordinates in the visible spectrum images that correspond to the co-ordinates of the thermal silhouette are found. These corresponding co-ordinates in the visible spectrum images then provide the location of the object in the visible spectrum images, which serves to isolate the object in the visible spectrum images.
- the visible spectrum images are also captured over the same time interval as the thermal images, whereby the cross reference against the thermal silhouette in the thermal images allows for exclusion of irrelevant objects in the visible spectrum images to be more readily achieved.
- this cross referencing is performed for each pair of visible spectrum and thermal images taken at the same time, and may optionally be performed for such pairs of visible spectrum and thermal images whose timestamps lie within a small interval Δ, the interval being effectively used to accommodate slight divergences in the timestamps associated with the thermal and RGB sensors.
- thermal silhouette in the thermal images that is used as the basis for isolating the object in the visible spectrum images is a measurement of heat exchange occurring with respect to the object.
- thermal images of the common zone will include objects that have ambient temperature.
- the object for identification has a temperature that is different from ambient temperature
- its thermal silhouette will have a contrast which allows it to be separable from thermal silhouettes of other objects in the thermal images.
- isolation also occurs when the thermal images are processed for presence of a thermal silhouette caused by introduction of an object.
- isolation thus refers to being able to separate the object to be identified from all other objects in thermal images or visible spectrum images, depending on context. Object isolation in both the thermal and visible spectrum images facilitates the following further applications: object identification and content quantity estimation.
- a boundary containing the isolated object is extracted and transmitted for the object to be identified.
- Such extraction causes only a segment of the visible spectrum images to be transmitted, this segment being a crop of the object that excludes other objects present in the original visible spectrum images.
- Identification of the object becomes more accurate and more efficient since there is less data and less likelihood of the presence of extraneous objects that, for example, a classifier has to process to identify the object.
- a percentage of content, by volume, remaining in the object is estimated. This is achieved by identifying which fraction of the thermal silhouette is attributable to an occupied portion and which is attributable to an empty portion.
- the occupied portion is distinguished from the empty portion by a temperature difference between the two, which can be detected in the thermal silhouette. This is because the occupied portion has a specific heat property that is different from that of the empty portion, whereby the occupied portion gains heat more slowly than the empty portion (when the environment is cooled) and loses heat more slowly than the empty portion (when the environment is heated).
- a percentage of content remaining in the object is then obtained by comparing the occupied portion against a sum of the occupied and empty portions. It will be appreciated that quantity estimation does not require data from the visible spectrum images. The visible spectrum images become required when the object, whose content is estimated using the thermal images, needs to be identified.
- Figure 1 shows visual representations of various operation stages of an object profiling system configured to perform object isolation in both thermal and visible spectrum images to facilitate object identification and content quantity estimation.
- Stage 101 occurs when an object is detected to pass through a common zone monitored by a thermal camera and a visible spectrum camera. For instance, once the door of a refrigerator is opened, the visible spectrum camera and the thermal camera capture images of user interaction with the object.
- Stage 103 uses a combination of thermal & optical based approaches to locate and isolate an image segment (415, 413; explained in greater detail in respect of the description for Figure 2) containing the object using the co-ordinates of its thermal silhouette as reference. The isolated image segment from each of the visible spectrum images is then extracted.
- Stage 105 sees the extracted image segment fed to a classification tool (such as Deep Neural Networks (DNNs)) to visually identify the object.
- Stage 107 uses another machine learning based pipeline over the thermal images to quantify an unoccupied portion of the object. Stage 107 occurs, for example, when the object is returned to the fridge.
- Figure 2 shows one possible workflow 200 of object isolation in both thermal and visible spectrum images to facilitate object identification and content quantity estimation.
- the workflow 200 is effected by an object profiling system that has a visible spectrum camera 204, a thermal camera 206 and a processor 208.
- the embodiment used in Figure 2 is a refrigerator 202.
- Figure 2 shows the object profiling system being used in a temperature regulated enclosure that provides a cooled environment, use in or use with other temperature regulated enclosures is also possible.
- the object profiling system may be part of the inventory management architecture of a warehouse, to monitor cargo being removed from or inserted into a temperature regulated enclosure of a refrigerator truck.
- the object profiling system is external to the refrigerator truck and is used with such a temperature regulated enclosure, where the visible spectrum camera 204 and the thermal camera 206 are located, for instance, in a location of the warehouse where cargo removed or inserted into the refrigerator truck passes.
- the object profiling system is used with a temperature regulated enclosure that provides a heated environment, such as a thermostat controlled heating system.
- the visible light spectrum camera 204 and the thermal camera 206 are disposed to capture a common zone, i.e. the two cameras 204 and 206 are located such that their fields of view each focus on the same space.
- the common zone is selected to provide an unobstructed view even when the temperature regulated enclosure is full, such as proximate to a space occupied by a door of the temperature regulated enclosure when closed.
- the common zone could be, for example, its doorway.
- the two cameras 204 and 206 may be located inside the temperature regulated enclosure; or located outside the enclosure through mounts that are coupled to an exterior surface of the temperature regulated enclosure.
- the processor 208 is configured to receive a stream of visible spectrum images 212 and thermal images 214 of the common zone, taken respectively by the visible spectrum camera 204 and the thermal camera 206, when an object is detected to pass through the common zone.
- the processor 208 is signalled to expect objects to pass through when the refrigerator 202 door is opened.
- a door sensor 210 (such as a magnetic reed switch attached to the door) is activated to trigger the visible spectrum camera 204 and the thermal camera 206 to take images of the common zone.
- the visible spectrum camera 204 and the thermal camera 206 may then capture a whole sequence or sample images of the user adding items into the refrigerator 202 or removing objects from the refrigerator 202.
- Based on the data collected (i.e. the stream of visible spectrum images 212 and thermal images 214), the object profiling system initiates one of several processing pipelines to identify the object and its residual content, as described below.
- Object identification has two parts: (a) segmentation of object from full frames of a video/image taken by the visible spectrum camera 204, whereby a boundary containing the object in the visible spectrum images 212 is extracted; and (b) classification of the segmented object image to obtain an object label, whereby the extracted boundary is sent to a classifier to identify the object.
- One or more of two pipelines, namely Visual-only Pipeline 216; and Thermal + Visual Pipeline 220 may be used for object isolation.
- the Visual-only Pipeline 216 recognises that a user’s interaction with an object such as a food item (either removal or insertion into the refrigerator 202) involves a directional motion either away from or towards the visible spectrum camera 204.
- the approach illustrated in Figures 3(a) to 3(e) first applies the principle of optical flow to identify image segments that are moving (across consecutive frames), thereby eliminating the parts of the image that form a static background.
- Such optical flow estimation identifies motion vectors (direction and displacement magnitude) for each pixel in an image.
- a feature vector is created where each pixel feature consists of its coordinates, as well as the magnitude and direction of its motion vector, i.e., {x, y, motion-mag, motion-dir}.
- the pixel features are then clustered, and the cluster with the highest average motion magnitude (AMM) is selected. This is based on the intuition that the food item of interest is usually the moving object closest to the visible spectrum camera 204, and thus likely to have the largest displacement magnitude from the visible spectrum camera 204's perspective (see Figure 3(c)).
- the resulting cluster (Figure 3(c)) contains both the food item, as well as possibly additional background pixels.
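- as an illustrative aid (not part of the patent disclosure), the following is a minimal Python sketch of such a motion-clustering step, assuming OpenCV's Farnebäck dense optical flow and k-means clustering; the function name, parameter values and choice of three clusters are our own assumptions:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def isolate_moving_object(prev_rgb, curr_rgb, n_clusters=3, sample_step=4):
    prev_gray = cv2.cvtColor(prev_rgb, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_rgb, cv2.COLOR_BGR2GRAY)
    # Dense optical flow: one (dx, dy) motion vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    h, w = mag.shape
    ys, xs = np.mgrid[0:h:sample_step, 0:w:sample_step]
    # Feature vector per (sampled) pixel: {x, y, motion-mag, motion-dir}.
    feats = np.column_stack([xs.ravel(), ys.ravel(),
                             mag[ys, xs].ravel(), ang[ys, xs].ravel()])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    # Keep the cluster with the highest average motion magnitude (AMM):
    # the handled item is usually the moving object closest to the camera.
    amm = [feats[labels == k, 2].mean() for k in range(n_clusters)]
    fg = feats[labels == int(np.argmax(amm))]
    x0, y0 = fg[:, 0].min(), fg[:, 1].min()
    x1, y1 = fg[:, 0].max(), fg[:, 1].max()
    return int(x0), int(y0), int(x1), int(y1)   # foreground bounding box
```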
- the Thermal + Visual Pipeline 220 is based on the insight that a refrigerated item will typically be colder than either a body part handling it or the ambient temperature. Generalising this concept to objects intended for storage in a heated environment, such objects are at temperatures higher than those of a body part handling them or the ambient temperature.
- the processor 208 is configured to locate co-ordinates of a region within the thermal images 214 having a thermal silhouette 411 (refer Figure 4) of different intensity from its surroundings.
- the thermal camera 206 easily isolates a cold object, as its pixels are darker than other ambient objects (such as a hand holding the object) and a background of the thermal images 214.
- a pixel intensity-based segmentation mechanism may be used, where one or more cold objects from the thermal image 214 (a frame with timestamp t) is located by selecting all pixels below a threshold value. The Cartesian coordinates of all the selected pixels are computed, thus segmenting the cold item from its surroundings in the thermal image 214.
- a bounding box 413 (i.e., the smallest rectangular region that contains an entire contour area, see Figure 4) is calculated to represent the segmented object; in a more generalized embodiment, the bounding box can have additional predefined shapes (e.g., circle or trapezoid) to reflect the different possible shapes of the object being identified.
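- a minimal sketch of this intensity-based segmentation is given below, assuming an 8-bit grayscale thermal frame in which colder pixels are darker; the threshold value and function name are illustrative assumptions, not values from the patent:

```python
import cv2
import numpy as np

def segment_cold_object(thermal_frame, cold_threshold=120):
    # Select all pixels below the threshold, i.e. the cold object.
    mask = (thermal_frame < cold_threshold).astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None                       # no cold object in this frame
    largest = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(largest)      # (x, y, w, h) of the silhouette
```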
- once the bounding box 413 in the thermal camera 206 coordinates is identified, those coordinates are translated into corresponding pixel coordinates for the visible spectrum camera 204.
- corresponding co-ordinates within the visible spectrum images 212 to those of the thermal silhouette 411 are identified by translating the co-ordinates of the thermal silhouette 411 in the thermal images 214.
- a region defined by the corresponding co-ordinates is demarcated as the object in the visible spectrum images 212. Isolation of the object (refer label 415 in Figure 1) in the visible spectrum images 212 is thus achieved through cross reference against co-ordinates of its thermal silhouette 411 found in the thermal images 214.
- the processor 208 is configured to correct for clock offset between the thermal camera 206 and the visible spectrum camera 204 when selecting the thermal images 214 and the visible spectrum images 212 of the common zone to perform the isolation of the object in the visible spectrum images 212. For instance, all the visible spectrum images 212 that have a timestamp in (t − Δ, t + Δ), where Δ represents the time offset and t is the timestamp of the thermal camera 206, are selected. For each of the visible spectrum images 212, a boundary containing the isolated object is extracted, this isolation being performed through identifying corresponding co-ordinates in the visible spectrum images 212 to the co-ordinates of the thermal silhouette 411 in the thermal images 214, as explained above.
- this extracted boundary 415 is a crop or segment of the visible spectrum image 212, whereby the extracted boundary 415 is a bounding box whose co-ordinates in the visible spectrum image 212 are set from a transformation of the bounding box 413 of the corresponding thermal image 214.
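- one simple way to realise the timestamp pairing and coordinate transformation is sketched below, assuming the two cameras are co-located with aligned fields of view so that a pure scaling transform suffices (a real deployment may need a calibrated homography); the RGB resolution and the Δ value are illustrative assumptions, while the 80 x 60 thermal resolution matches the prototype IR sensor described later:

```python
def thermal_bbox_to_rgb(bbox, thermal_res=(80, 60), rgb_res=(640, 480)):
    """Scale a thermal-frame bounding box (x, y, w, h) into RGB pixel coords."""
    x, y, w, h = bbox
    sx = rgb_res[0] / thermal_res[0]   # horizontal scale factor
    sy = rgb_res[1] / thermal_res[1]   # vertical scale factor
    return (int(x * sx), int(y * sy), int(w * sx), int(h * sy))

def paired_rgb_frames(rgb_frames, t_thermal, delta=0.2):
    """Select RGB frames whose timestamps lie within Δ of a thermal frame."""
    # rgb_frames: iterable of (timestamp_seconds, image) pairs.
    return [img for ts, img in rgb_frames if abs(ts - t_thermal) <= delta]
```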
- Each of these extracted boundaries 415 (one corresponding to each frame) is then sent downstream to the item recognition DNN classifier 218 to identify the object, where it is also determined which of them indeed contain the object.
- the output of a pipeline 222 that uses the working principles of the Visual-only Pipeline 216 described above may be combined with the output of the Thermal + Visual Pipeline 220.
- the pipeline 222 sees the visible spectrum camera 204, when triggered, send a stream of visible spectrum images 212 to the processor 208.
- the processor 208 then isolates an object in the visible spectrum images 212 through detection of its motion.
- the approach used in the pipeline 222 uses the visible spectrum camera 204 data to first compute object motion vectors. This is followed by clustering, which locates within the visible spectrum images 212 a boundary containing the object isolated through motion detection, and thresholding of such vectors to extract the portion of the visible spectrum images 212 where the object is isolated (i.e. the boundary containing the object isolated through motion detection is extracted).
- This pipeline 222 is applied to a selection of frames with timestamps in (t − Δ, t + Δ) from the visible spectrum images 212 so as to provide another set of candidate images to the DNN classifier 218, being the image content within the extracted boundary containing the object isolated through motion detection.
- the DNN classifier 218 then has two extractions, each from the same visible spectrum images 212, to perform object identification. These two extractions are the image content from the extracted boundary resulting from the object isolation through its thermal silhouette and the image content from the extracted boundary resulting from the object isolation through its motion detection.
- each individual extracted image is classified using machine learning (ML) techniques to obtain the probability of the object belonging to different ‘classes’ (including a null class — i.e., the possibility that the image does not correspond to any object).
- the DNN classifier 218 first discards images where the ‘null’ class has the highest probability, these images being content from the extracted boundary resulting from the object isolation through its thermal silhouette and content from the extracted boundary resulting from the object isolation through its motion detection. Finally, the different probabilities, across multiple classes, for the remaining set of images (i.e. the image contents of the remaining extracted boundaries) are combined statistically to infer a most probable label for the object in question.
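- a minimal sketch of this statistical fusion is given below, assuming simple averaging of the per-frame class probabilities (the patent leaves the exact combination method open); the function and parameter names are illustrative:

```python
import numpy as np

def fuse_frame_predictions(frame_probs, labels, null_index):
    """Combine per-frame class probabilities into a single object label."""
    # frame_probs: shape (num_frames, K + 1); one row per extracted image,
    # columns are class probabilities including the 'null' class.
    frame_probs = np.asarray(frame_probs)
    # Discard frames where the 'null' class has the highest probability.
    keep = frame_probs[frame_probs.argmax(axis=1) != null_index]
    if keep.size == 0:
        return None                  # no frame appears to contain the object
    mean_probs = keep.mean(axis=0)   # simple statistical combination
    mean_probs[null_index] = 0.0     # only the real classes compete
    return labels[int(np.argmax(mean_probs))]
```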
- the classifier is not limited to DNN configuration and may also use other machine learning techniques such as support vector machines (SVMs).
- the classifier may also be external to or integrated with the object profiling system. In the case where an external classifier is used, the extracted images are received through a transmitter.
- a suitable classifier for the DNN classifier 218 may be pre-trained by an external entity (e.g., an image analytics company) with a, preferably large, corpus of representative images of, for example, various food items if these are the objects that are to be identified.
- the DNN classifier need not be trained specifically with images corresponding to a specific fridge deployment, as the object isolation technique described above ensures that the isolated images have minimal extraneous background and thus appear similar to the representative images used for such training. For each image frame, the classifier then outputs the likely label (along with the confidence values).
- the extraction process retrieves a sequence of multiple (typically 30-40) images, of which 5-10 contain the food item.
- This series of classifier output labels is then further fed through a separate classifier that uses the frequency of occurrence and associated confidence levels to output the food item label with the highest likelihood, above a minimum threshold.
- a suitable classifier receives multiple possible food item images.
- the Thermal + Visual Pipeline 220 provides one coordinate-transformed image for each frame 226 of the visible spectrum images 212 with a timestamp within Δ of a frame of the thermal images 214, whereas the approach used in the pipeline 222 provides an image for every frame 228 of the visible spectrum images 212 with a foreground cluster exceeding the motion threshold.
- the object recognition process uses the following steps:
- each interaction involves a sequence of, for example, S image frames, provided by both the Thermal + Visual Pipeline 220 and the pipeline 222.
- Each frame (226, 228) is individually passed through the classifier, generating a probability/confidence value for each of the K + 1 labels 230, 232.
- the classifier is thus configured to compute label probabilities 230, 232 for each of the extracted boundaries 415 and identify the object from the label 234 with the highest probability from the computed label probabilities.
- the object profiling system implementing the workflow 200 in Figure 2 also has a display 224 with which the processor 208 is in communication.
- the processor 208 receives the identity 236 of the object from the DNN classifier 218 and shows the identity 236 of the object on the display 224.
- the thermal images 214 are also fed through a quantity estimation pipeline 238.
- the quantity estimation pipeline 238 uses a non-intrusive quantity estimation technique that is both robust to different ambient lighting conditions and opaqueness of the object to determine an amount (by volume) of content left in the object.
- This pipeline 238 works on the principle of differential heating of container versus dispensable content in the container, both the container and its dispensable content being associated with the object sought to be identified in the Thermal + Visual Pipeline 220 and the pipeline 222.
- Table 1 lists the specific heat capacity of common liquid/solid food items and typical container material.
- food items have significantly higher specific heat than typical container material.
- the part of the container in direct contact with the food item (liquid or solid) therefore changes temperature more slowly than the empty part of the container.
- the larger the specific heat of the food item, the greater the difference between it and the container material, and thus the larger the expected differential between the thermal intensity of the empty versus occupied parts of the container.
- Table 1: Specific Heat of Substances (kJ/kg/°C)
- the thermal camera 206 utilises this temperature difference to estimate a remaining quantity inside the container. Differentiation depends on the thermal resolution of the thermal camera 206; commodity cameras (e.g., the Raspberry Pi compatible Bricklet camera) typically have a resolution of 0.1 °C or lower.
- Figures 5A and 5B each show a thermal image, taken by such commodity cameras, of two cold containers (the left one 502 being full and the right one 504 partially filled).
- the thermal image of the partially filled container 504 shows two regions of different pixel intensities, with the empty region having higher temperature values (less dark pixels) and the ‘filled’ region having lower temperature values (darker pixels).
- the quantity estimation pipeline 238 is triggered when the object is returned into the refrigerator 202 because a temperature differential is absent when the refrigerated object is first taken out (assuming that the duration for which the object was inside the fridge was sufficiently long to ensure that the entire object had cooled uniformly).
- the thermal silhouette of the object in each of the thermal images 214 is fed through an unsupervised classifier that demarcates the object container pixels into two spatially contiguous clusters.
- the partial area of the colder cluster (corresponding to an occupied portion of the container), relative to the area of the overall container, is used to estimate the residual content (by volume percentage).
- the processor 208 distinguishes an occupied portion 153 of the thermal silhouette from an empty portion 155 of the thermal silhouette, based on a detected temperature difference between the occupied portion 153 and the empty portion 155. A percentage of content remaining in the object is then estimated by comparing the occupied portion 153 against a sum of the occupied and empty portions 153, 155. The processor 208 is also further configured to distinguish the occupied portion 153 from the empty portion 155 by identifying which has a temperature closer to that of ambient temperature.
- each image is passed through processing pipeline 600 shown in Figure 6, which performs the following functions: partial capture check; object segmentation and noise removal; and occlusion removal and clustering. Each of these functions is described below.
- Due to continuous capture during user-item interaction, the thermal camera 206 will generate multiple thermal images 214 of the object. Because of the underlying motion dynamics, some of the thermal images 214 will capture the object only partially, while others will have a more complete view of the object. To eliminate partially captured images (which should be ignored when estimating quantity), the partial capture check has the processor 208 check whether a contour of the object intersects with a boundary of the thermal image 214. If so, the processor 208 concludes that the thermal silhouette only partially captures the object and discards the thermal image 214 with such a thermal silhouette.
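- a minimal sketch of such a partial capture check follows; the border margin is an assumed tolerance:

```python
import cv2

def is_partial_capture(contour, frame_shape, margin=1):
    # If the object's contour touches the frame border, the silhouette is
    # only a partial view and the frame is discarded for quantity estimation.
    h, w = frame_shape[:2]
    x, y, bw, bh = cv2.boundingRect(contour)
    return (x <= margin or y <= margin or
            x + bw >= w - margin or y + bh >= h - margin)
```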
- for object segmentation and noise removal, the invention employs clustering and contour detection.
- this clustering and contour detection has the processor 208 configured to construct a contour of the object from the occupied and empty portions of the thermal silhouette in a thermal image 214.
- Occluded pixels: depending on the user interaction pattern, one or more parts of the object can be occluded by, for example, the user's hand. This occlusion is also evident (as high brightness pixels) in the thermal image 214, and can cause an under-estimation of the object volume.
- Occlusion is determined to be present by the processor 208 analysing a collection of the thermal images 214. If the processor 208 detects that the object contours from these thermal images 214 have different pixel sizes, it concludes that occlusion is present in at least one of them, where the thermal image 214 with the largest object contour is likely to be the one providing a contour closest to the actual shape of the object.
- upon detection of the presence of occlusion, the processor 208 is configured to recognise one or more portions of the thermal silhouette attributable to such occlusion, based on determining the difference between the size of the thermal silhouette in a given thermal image and that in the thermal image with the biggest thermal silhouette.
- the processor 208 populates the occluded portions using interpolation to fit the constructed contour of the object, such as extending the detected contour to a more regular (often rectangular) shape.
- the processor 208 then confers onto the populated portion a thermal profile that corresponds with its surroundings, for instance by giving the extended contour an estimated thermal value, computed as the median of neighboring non-occluded pixels.
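- a minimal sketch of this occlusion in-painting step is given below, assuming occluded (hand) pixels appear warmer than an assumed brightness threshold; the window size and threshold are our own choices:

```python
import numpy as np

def fill_occlusion(patch, occlusion_threshold=200, win=2):
    # patch: 8-bit thermal crop of the (rectangularised) container region.
    filled = patch.copy()
    occluded = patch >= occlusion_threshold      # warm pixels, e.g. a hand
    h, w = patch.shape
    for y, x in zip(*np.nonzero(occluded)):
        y0, y1 = max(0, y - win), min(h, y + win + 1)
        x0, x1 = max(0, x - win), min(w, x + win + 1)
        neigh = patch[y0:y1, x0:x1]
        valid = neigh[neigh < occlusion_threshold]
        if valid.size:                    # median of the non-occluded
            filled[y, x] = np.median(valid)  # neighbouring pixels
    return filled
```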
- Clustering is applied on the pixel values of the extended contour obtained from the previous step. If the object is full, there should only be a single cluster, whereas a partially occupied item should be separable into two clusters. As one embodiment of techniques for determining the optimal number of clusters, a Silhouette Coefficient method may be used to resolve between these two alternatives. If the preferred number of clusters is two, a fractional quantity of content remaining in the object is computed by dividing the pixel count of the cluster attributable to the occupied portion (which will have a lower temperature) by the total pixel count of both the cluster attributable to the occupied portion and the cluster attributable to the empty portion.
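- a minimal sketch of this final clustering step, using scikit-learn's KMeans and silhouette score; the silhouette threshold for deciding between one and two clusters is an assumption:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def estimate_fraction(patch, min_sil=0.5):
    # patch: occlusion-corrected thermal crop of the container region.
    vals = patch.reshape(-1, 1).astype(float)
    km = KMeans(n_clusters=2, n_init=10).fit(vals)
    if len(set(km.labels_)) < 2 or silhouette_score(vals, km.labels_) < min_sil:
        return 1.0                     # a single cluster: container is full
    cold = int(np.argmin(km.cluster_centers_))  # occupied (colder) portion
    return float(np.mean(km.labels_ == cold))   # occupied / (occupied + empty)
```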
- the processing pipeline 600 may also include additional functionality which is not shown in Figure 6, such as averaging. Given multiple valid thermal images 214 for an interaction episode, the final quantity estimate is obtained by averaging the fractional estimates of each image.
- the processor 208 can update a repository of refrigerated food contents with the object identity and its remaining content.
- Such changes may be pushed to a Web server, which can analyse whether they meet conditions that trigger the generation of relevant alerts (e.g. "send an SMS if a container with residual quantity ≤ 20% has been sitting in the fridge without any user interaction for more than a week").
- a user opens the refrigerator 202, takes a juice carton and consumes a portion of its content.
- the Thermal + Visual Pipeline 220 is triggered when the juice carton is removed and infers the retrieved item: Juice Carton Product A.
- the user reaches into the refrigerator 202 and grabs two pouches of yoghurt, which are emptied.
- the Thermal + Visual Pipeline 220 tracks the new food items that the user has retrieved — two pouches of Yogurt Product B.
- the quantity estimation pipeline 238 monitors this act of inserting a food item, identifies that the item is Juice Carton Product A and also estimates that the carton is now only 25% full. This quantity estimation can be transmitted to a back-end portal, which can asynchronously trigger relevant actions, e.g. generating a 'Low Juice' alert.
- the user may also insert a can of beverage in the refrigerator 202 before closing the door.
- the Thermal + Visual Pipeline 220 tracks this object insertion, identifying the object as "can of Beverage Product C" and thereby updates the repository of the refrigerator 202 content.
- the object profiling system allows combined use of a thermal camera (which detects salient temperature differences and can thus eliminate the ambient background) and a visible spectrum camera to precisely extract a segment/portion of an image that contains only the object that is to be identified, along with its remaining content. This extraction is done automatically, without additional user input, and occurs while the user performs natural/normal interactions with the fridge. Accurate extraction of the object image is important as Machine Learning (ML) based techniques for item recognition work well if they are provided with a well-segmented portion of the object.
- Residual quantity estimation that uses visual (RGB) sensing with application of an ML-based technique to infer the remaining quantity of food can operate only on transparent containers, or requires additional contact sensors (e.g. weight sensors).
- the quantity estimation pipeline 238 is based on appropriate analysis of the thermal profile of the object and minute temperature differences between empty and occupied portions of the object. Accordingly, there is no requirement for the object to be transparent, and the technique has been demonstrated to work on objects having opaque containers made of paper, as well as translucent containers (e.g., plastic).
- Privacy concerns can be addressed by appropriate adaptation of the camera hardware, such as narrowing their Field-of-View or altering their placement such that the thermal camera 206 and the visible spectrum camera 204 capture a common zone that is within a temperature regulated enclosure that both cameras are designated to monitor. Any increase in occlusion can be resolved by modifying the image extraction to accommodate multiple simultaneous images (of varying occlusion) from multiple cameras.
- when the processor 208 transmits extracted images to an external classifier for object identification, it ensures that only the extracted images cropped from the full frame visible spectrum images 212 are transmitted. The portion of the visible spectrum images 212 that remains after image extraction is deleted by the processor 208 so as to limit privacy exposure.
- IR+RGB: combined infra-red (IR, thermal) and visual sensing.
- Similar approaches can also find use in industrial/manufacturing scenarios — e.g., to improve the identification of defective parts of products by using thermal profiles to isolate the contours of such parts.
- IR-based Remote Quantity Estimation can be used for remote sensing of hot/cold objects inside containers, e.g.: (a) to perform remote inspection of liquid quantities in cargo containers by simply placing them in hot/cold environments and noting the resulting thermal variations; (b) to verify the purity of unconsumed refrigerated medicines, by combining thermal based quantity estimation of such medicines with weight sensors to verify the specific density of the liquid content.
- the quantity estimation pipeline 238 can function as a thermal only pipeline, whereby the only image stream that the object profiling system requires in this mode is from the thermal camera 206. Visible spectrum images 212 from the visible spectrum camera 204 are not required by the object profiling system to estimate a percentage of content remaining in objects that are to be monitored.
- An object profiling system which is mainly directed at estimating a percentage of content remaining in objects thus has the thermal camera 206 disposed to capture a zone where such objects will pass through.
- the processor 208 is configured to receive a stream of thermal images of the zone taken by the thermal camera 206 when an object is detected to pass through the zone.
- the processor 208 determines a percentage of content remaining in the object based on a detected temperature difference between an occupied portion of the object and an empty portion of the object through analysis of a thermal silhouette of the object in the thermal images 214.
- This object profiling system omits the visible spectrum camera 204.
- possible applications of such an object profiling system would be in circumstances where the identity of the objects to be monitored is already known, such as delivery of goods in accordance with an itinerary, or a consignment of expected objects having dispensable content, negating the requirement for a visible spectrum camera to facilitate object identification.
- One purpose of the object profiling system may therefore be to ensure that the goods have not been tampered with.
- the objects that are to be monitored need not necessarily have identical shapes, since the content residual assessment performed by the object profiling system would be to ascertain that the objects' contents are at fully occupied levels.
- the object profiling system is optionally integrated with a temperature regulated enclosure that allows the object to maintain its temperature difference.
- the object profiling system can thus operate independently of the temperature regulated enclosure with which it is associated.
- One possible use case is to monitor content of chilled bottles, all of the same brand, being taken out from a refrigerator truck, by an automated robot.
- Each of the chilled bottles may be heated by a thermal source for a relatively short duration that allows its occupied portion and its empty portion to have a temperature difference, which then allows the processor 208 working in conjunction with the thermal camera 206 to determine which of the bottles are not full.
- the robot can then automatically flag such bottles for subsequent inspection.
- each of the 3 liquids was placed outside the fridge for a duration T_a that varied over {0, 5, 15, 30, 60, 90, 150, 200, 450, 800, 1100, 1800} seconds.
- Figure 7 plots the estimation error for all 3 liquids as a function of the ambient exposure duration T_a.
- the quantity estimation error is typically less than 15-20% for all liquids, indicating that our thermal based approach provides good coarse-grained quantity discrimination capability.
- the IR sensor has a relatively low resolution (80 by 60 pixels).
- Door contact sensor: a normally open magnetic reed switch.
- the IR and visible light camera sensors were positioned to support multiple concurrent objectives: (a) maximize gesture coverage, i.e., support the video based capture of user-item interactions performed in a variety of ways, across different shelves of the fridge; (b) minimize occlusion, i.e., ensure that the food item is maximally visible within individual frames (to aid proper computation of the residual quantity); (c) maximize visible frames, i.e., have the item visible in the maximum number of possible frames (to maximize the chances of correct food item classification).
- Position 1: The camera (IR + visible light) sensors are installed on top of the refrigerator, thus providing a top view of the items while they are being added/removed from the fridge. Although this view is likely to capture most of the item interactions, it is often unable to capture the height of the containers properly (see Figure 8), especially when the containers are picked from the lower racks, leading to lower accuracy of quantity estimation.
- Position 2: The thermal and visible light camera sensors are deployed on the left side (closer to the door) of the refrigerator.
- the captured items include items kept in the trays mounted on the fridge door. While such images can possibly be eliminated by optical flow techniques, the presence of such cold items is likely to increase the error of the thermal segmentation process.
- Position 3: Both thermal and visible light camera sensors are deployed on the right side (away from the door hinge) of the refrigerator. From our sample observations, we found that the vast majority of interactions (across a variety of 'removing' or 'inserting' gestural patterns) were visible with this placement, with the cameras' field-of-view (FoV) primarily capturing user-item interactions. Furthermore, occlusion of the food items was rare. Accordingly, we used Position 3 as the preferred placement in our prototype.
- Item Coverage (ICov) computes the ratio of the intersection area of the ground-truth and computed bounding boxes to the area of the ground-truth bounding box.
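- for concreteness, both metrics can be computed from axis-aligned boxes (x0, y0, x1, y1) as in the sketch below (our illustration, not from the patent):

```python
def box_area(box):
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def intersection_area(a, b):
    # Overlap of two axis-aligned boxes; zero if they are disjoint.
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return box_area((x0, y0, x1, y1))

def iou(gt, pred):
    # Intersection Over Union: overlap / union of the two boxes.
    inter = intersection_area(gt, pred)
    return inter / (box_area(gt) + box_area(pred) - inter)

def icov(gt, pred):
    # Item Coverage: overlap relative to the ground-truth box only.
    return intersection_area(gt, pred) / box_area(gt)
```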
- Figure 9 plots the fraction of extracted images (across all episodes in the user study) whose IoU exceeds the specified threshold.
- the combined approach provides the best extraction performance: over 80% of images have IoU values greater than 0.6 (object detection frameworks typically require IoU values higher than 0.45-0.5).
- the pure RGB motion vector-based approach performs the poorest, achieving IoU values greater than 0.6 in less than 20% of the images.
- Figure 10 plots the precision/recall values for DNN-based item identification for those images whose IoU value exceeds the corresponding x-axis value. We observe that the item identification accuracy increases with IoU, reaching 95+% when the IoU value exceeds 0.7.
- Figure 11 plots the distribution of ICov values, for both the combined and the RGB motion-vector only methods.
- the combined technique achieves ICov values of 0.8 or higher in 80% of the interaction episodes.
- the higher ICov values observed for the "RGB motion-vector only" approach occur because this approach typically extracts a larger fraction of the image but also includes a disproportionately larger 'background' component (hence, the lower IoU score).
- the presence of a larger background leads to poorer performance of the DNN-based item identifier.
- Table 3 quantifies the number of episodes (out of a randomly selected 20% of the total episodes) that contain at least 1 extracted item image with ICov values higher than ⁇ 75%, 95% ⁇ .
- Table 4 plots the item classification results (for episodes involving the original 12 users who interacted with the original 15 food item classes), for both the 15-class classifier and the subsequent 19-class classifier.
- the combined pipeline results in the highest, and identical, precision/recall values (of ~0.84).
- the results are fairly stable over the 15-class and 19-class classifiers.
- the food item precision/recall is 74% and 72% respectively, for the episodes involving the 7 new users, who interacted solely with the 4 new fruits & vegetable items.
- the overall item recognition accuracy is high but not as high as the 97%+ accuracy reported on the externally curated training data. In large part, this is due to the lack of sufficient relevant training data for our classifiers.
- the training corpus consists entirely of images of items extracted from the Web or shot in close proximity by a video camera. These training images are quite distinct from the partial views of items captured by the RGB+IR sensors. We fully anticipate that the accuracy will improve as the corpus is continuously expanded in the real world (similar to approaches used by consumer ML-based devices such as Amazon's Alexa™) to include more such in-the-wild images.
- the accuracy is lower for the newer episodes that involved the 4 new food items. This was principally due to the lack of sufficient appropriate training images: unlike canned items, fruits and vegetables have greater diversity in shape and color, and thus require more diverse training data.
- Figure 12 plots the estimated quantity for 3 different liquids {juice, milk, water}, and 3 different fractional quantities {30%, 60%, 100%}. The plot shows that these 3 levels are distinguishable (distinct mean values, with low overlap between 5/95% confidence intervals). However, the estimates are significantly noisier for juice when the container is only 30% full. Studies with additional semi-solid items {yogurt, ketchup, peanut butter} show that the estimation error remains within 10-20%, indicating the robustness of our technique.
- Coarser Estimation/Classification: While fine-grained quantity estimation is challenging for certain (liquid, container) combinations, coarser-grained estimates are acceptable for many applications. For example, an application that generates alerts (when the food quantity becomes very low) may just need to know when the quantity drops below, say, 20%. Accordingly, we now study the accuracy of the coarser-grained classifier that assigns the captured IR image into one of 3 bins/classes.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
According to a first aspect, the present invention relates to an object profiling system comprising: a visible spectrum camera; a thermal camera, the visible light spectrum camera and the thermal camera being disposed to capture a common zone; and a processor configured: to receive a stream of visible spectrum images and thermal images of the common zone, taken respectively by the visible spectrum camera and the thermal camera, when an object is detected to pass through the common zone; and to isolate the object in the visible spectrum images through cross reference against co-ordinates of its thermal silhouette found in the thermal images. A second aspect comprises an object profiling system which has a thermal camera and a processor that determines a percentage of content remaining in an object on the basis of its thermal images taken by the thermal camera.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10201910482VA (en) | 2019-11-08 | 2019-11-08 | System for object identification and content quantity estimation through use of thermal and visible spectrum images |
SG10201910482V | 2019-11-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021091481A1 (fr) | 2021-05-14 |
Family
ID=75848605
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/SG2020/050494 WO2021091481A1 (fr) | 2020-08-24 | System for object identification and content quantity estimation through use of thermal and visible spectrum images |
Country Status (2)
Country | Link |
---|---|
SG (1) | SG10201910482VA (fr) |
WO (1) | WO2021091481A1 (fr) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102021204149A1 (de) | 2021-04-27 | 2022-10-27 | BSH Hausgeräte GmbH | Object recognition for a household appliance |
WO2023055668A1 (fr) * | 2021-09-28 | 2023-04-06 | Stirred Inc. | Machine learning-based craft cocktail ingredient and recipe recommendation engine |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160037088A1 (en) * | 2014-07-30 | 2016-02-04 | Toshiba Tec Kabushiki Kaisha | Object recognition apparatus that performs object recognition based on infrared image and visible image |
WO2016056009A1 (fr) * | 2014-10-07 | 2016-04-14 | The State Of Israel, Ministry Of Agriculture & Rural Development, Agricultural Research Organization (Aro) (Volcani Center). | Grading of peanuts by thermal imaging |
CN108288028A (zh) * | 2017-12-29 | 2018-07-17 | 佛山市幻云科技有限公司 | Campus fever monitoring method, apparatus and server |
CN110097030A (zh) * | 2019-05-14 | 2019-08-06 | 武汉高德红外股份有限公司 | Salient marking method and system based on infrared and visible light images |
- 2019-11-08: SG application SG10201910482VA filed (status unknown)
- 2020-08-24: PCT application PCT/SG2020/050494 filed as WO2021091481A1 (active Application Filing)
Also Published As
Publication number | Publication date |
---|---|
SG10201910482VA (en) | 2021-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111415461B (zh) | Item identification method and system, and electronic device | |
US11216868B2 (en) | Computer vision system and method for automatic checkout | |
US9784497B2 (en) | Smart refrigerator | |
EP2985553B1 (fr) | Method for managing storage products in a refrigerator using an image recognition system, and associated refrigerator | |
CN108596187B (zh) | Commodity purity detection method and display cabinet | |
WO2021091481A1 (fr) | System for object identification and content quantity estimation through use of thermal and visible spectrum images | |
Sharma et al. | SmrtFridge: IoT-based, user interaction-driven food item & quantity sensing | |
CN113468914B (zh) | Method, apparatus and device for determining commodity purity | |
KR102017980B1 (ko) | Refrigerator and method for identifying products and outputting segmented images using artificial intelligence | |
CN112184751B (zh) | Object recognition method and system, and electronic device | |
Falcão et al. | Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores | |
CN107527060B (zh) | Refrigeration device storage management system and refrigeration device | |
Lee et al. | Smart refrigerator inventory management using convolutional neural networks | |
Milella et al. | 3d vision-based shelf monitoring system for intelligent retail | |
CN116416556A (zh) | Item information identification method, refrigeration device, and computer storage medium | |
US20220309766A1 (en) | Device and method for forming at least one ground truth database for an object recognition system | |
KR20180020214A (ko) | Object recognition for a storage structure | |
US20220414391A1 (en) | Inventory management system in a refrigerator appliance | |
Falcão | Human Object Ownership Tracking in Autonomous Retail | |
CN213248024U (zh) | Goods container | |
US20230076984A1 (en) | Inventory management system in a refrigerator appliance | |
SHARMA | Vision-based analytics for improved AI-driven IoT applications | |
Färnström et al. | Computer vision for determination of fridge contents | |
CN116415892A (zh) | Method and apparatus for detecting inventory, electronic device, and storage medium | |
CN118135483A (zh) | Unmanned retail commodity identification system | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20884580; Country of ref document: EP; Kind code of ref document: A1 |
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (PCT application filed from 20040101) | |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: PCT application non-entry in European phase | Ref document number: 20884580; Country of ref document: EP; Kind code of ref document: A1 |