WO2021222911A1

WO2021222911A1 - System and method for identifying grab-and-go transactions in a cashierless store

Info

Publication number: WO2021222911A1
Application number: PCT/US2021/036300
Authority: WO
Inventors: Davi Geiger; Carlos Henrique Cavalcanti CORRÊA
Original assignee: Kooick Inc.
Priority date: 2020-05-01
Filing date: 2021-06-08
Publication date: 2021-11-04
Also published as: EP4143800A1; WO2021222911A4; EP4143800A4

Abstract

A method and system for detecting a commercial transaction through physical interactions with items, the method comprising receiving data from a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module, wherein the data includes physical activities corresponding to items on the smart shelves in a given session. The method further comprises resolving the data from the sensory modules using probabilistic reasoning and machine learning, determining a new container state after the given session based on the resolved data, and determining a final commercial transaction based on the new container state.

Description

SYSTEM AND METHOD FOR IDENTIFYING GRAB-AND-GO TRANSACTIONS IN A

CASHIERLESS STORE

COPYRIGHT NOTICE

[0001] A portion of the disclosure of this patent document contains material, which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.

CROSS REFERENCE TO RELATED APPLICATION [0002] This application claims the priority of:

• United States Patent Application No. 17/246,757, entitled “SYSTEM AND METHOD FOR IDENTIFYING GRAB-AND-GO TRANSACTIONS IN A CASHIERLESS STORE,” filed on May 3, 2021, claiming priority to United States Provisional Patent Application No. 63/018,948 filed May 1, 2020, and

• United States Provisional Patent Application No. 63/079,623, entitled “HARDWARE SYSTEM FOR IDENTIFYING GRAB-AND-GO TRANSACTIONS IN A CASHIERLESS STORE,” filed on September 17, 2020, the disclosures of which are hereby incorporated by reference in its entirety.

BACKGROUND OF THE INVENTION FIELD OF THE INVENTION

[0003] This application generally relates to cashierless transactions, and in particular, tracking physical activities and states of items or shelves in a commercial environment to determine purchases. DESCRIPTION OF THE RELATED ART

[0004] The existence of commercial refrigerators and cabinets is abundant; millions of them exist in different formats and at different types of commercial places. For example, it is common to find in pharmacies closed cabinets with razors, electric tooth brushes, and other products. Also in pharmacies, one may find refrigerators with beverages. Typically today, users may take items from such refrigerators and/or cabinets and pay for these items at a cashier. Moreover, these items are filled or replenished by someone representing the vendor and an accurate account of the replenishment/stock is a requirement by the vendor.

SUMMARY OF THE INVENTION

[0005] The present invention provides a system, method, and non-transitory computer- readable media for detecting a commercial transaction through physical interactions with items. According to one embodiment, the system comprises a plurality of sensory modules associated with one or more shelves within a container, wherein the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module. The system further comprises an integration module configured to receive data from the plurality of sensory modules, wherein the data includes physical activities corresponding to items on the smart shelves in a given session, resolve the data from the sensory modules using probabilistic reasoning and machine learning, determine a new container state after the given session based on the resolved data, and determine a final commercial transaction based on the new container state. [0006] The static cameras module may be configured to retrieve images of inside the container before the given session and images of inside the container after the given session, determine state configurations of the one or more smart shelves, and transmit the state configurations to the integration module. The video cameras module may be configured to receive video recordings that start when the container is opened and ends when the container is closed, determine items that have been placed in and out of the container and times of which the items have been placed in and out of the container, and transmit associated with the determined items and times to the integration module. The weight sensors module may be configured to detect weight changes on the one or more shelves during the given session. The integration module may be further configured to resolve the static camera module detecting an item removal by confirming with data from the video cameras module and the weight sensors module.

[0007] According to one embodiment, the method comprises receiving data from a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module, wherein the data includes physical activities corresponding to items on the smart shelves in a given session. The method further comprises resolving the data from the sensory modules using probabilistic reasoning and machine learning, determining a new container state after the given session based on the resolved data, and determining a final commercial transaction based on the new container state.

[0008] The method may further comprise detecting a change to a current container state has occurred. The current container state may comprise data identifying available inventory and placement of the inventory in the container prior to the detected change to the current container state. Determining the new container state may further comprise determining the new container state based a change to the available inventory or placement of the inventory. The final commercial transaction may comprise data including a description of which items have been taken from the container and an indication that the taken items are desired to be purchased. [0009] According to one embodiment, the computer-readable media comprises computer program code for receiving data from a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module, wherein the data includes physical activities corresponding to items on the smart shelves in a given session. The computer-readable media further comprises computer program code for resolving the data from the sensory modules using probabilistic reasoning and machine learning, computer program code for determining a new container state after the given session based on the resolved data, and computer program code for determining a final commercial transaction based on the new container state.

[0010] The non-transitory computer-readable media may further comprise computer program code for detecting a change to a current container state has occurred. The current container state may comprise data identifying available inventory and placement of the inventory in the container prior to the detected change to the current container state. The computer program code for determining the new container state may further comprise computer program code for determining the new container state based a change to the available inventory or placement of the inventory. The final commercial transaction may comprise data including a description of which items have been taken from the container and an indication that the taken items are desired to be purchased.

[0011] According to another embodiment, the system comprises a shelf hardware and firmware system equipped with cameras and weight sensors to replace conventional shelves used in pods, where a pod can be a refrigerator or a cabinet. The shelf may communicate with a computer placed nearby (or inside) the pod. Such shelves will provide data for artificial intelligence (“AI”) systems to detect a commercial transaction through physical interactions with product items available and displayed at these shelves. The shelves may also provide support for the AI system to automatically identify the replenishment of a pod (e.g., a cooler or a cabinet), by providing the data needed to identify which items were replenished and reporting to the vendor (e.g., managing the pod). Thus, the shelf hardware with the use of AI software effectively provides an automatic pod management system.

[0012] The system may also include a platform that provides analysis of human behavior while using the pod, by providing all the data for recognizing physical interactions of the consumer with the products inside the pod. This includes providing the data to identify that an item was taken and placed back (possibly elsewhere) in the pod. As such, the shelf hardware not only provide the data for the final transactions (for which a commercial receipt is prepared), but also for all transactions during a session. All of these functionalities can run locally on the pod computer as a consequence of the shelf hardware providing data communication from the shelves to the computer at the pod. For the local processing to be sufficient, the pod computer includes processing capability to run AI software. Alternatively, the computer at the pod may preprocess or filter the data reducing the amount of data, to transmit to cloud computing infrastructure where AI applications are hosted.

[0013] The local computer can infer which shelf was manipulated by the consumer and send the data associated to such shelf. The local computer can restrict the video camera frames to where items are present in the hand of the consumer. Other local computations can occur. These different possibilities correspond to different embodiments of the invention.

[0014] A shelf can also provide data communication between the pod and the human

(consumer) that maybe interested in the products provided by the pod. A shelf may include display screens (such as liquid crystal display (“LCD”)) to indicate the items and price, and the information can be dynamically updated. This mechanism of dynamic pricing and screen display may be used by a software system that learns consumer behavior and as a consequence, return marketing strategies of prices and items for promotion inside the pod. Another mechanism of communication for a shelf may be via voice with the user of the pod. Voice recognition capability is useful for the visually impaired customers and to many activities for everyone. [0015] The system may provide the hardware compactly placed in the shelves, with a printed circuit board supporting the electronics of cameras and load cells, boxes to place cameras in stable positions and adjustment for the angles of view of the cameras, and provides mechanisms for inclining the shelves so gravity can bring all items towards the front without affecting the sensors. Despite having weight sensors placed on them, shelves may also be configured for large height differences between two consecutive shelves. The disclosed shelf system provides the flexibility needed to build different size coolers and cabinets for different types of items, while simplifying the need for cables. A cable for each shelf may connect to a central hub or computing device that interfaces with a main computer placed nearby or inside the pod.

BRIEF DESCRIPTION OF THE DRAWINGS [0016] The invention is illustrated in the figures of the accompanying drawings which are meant to be exemplary and not limiting, in which like references are intended to refer to like or corresponding parts.

[0017] Fig. 1 illustrates an exemplary process of an automatic check out in a grab-and-go environment according to an embodiment of the present invention.

[0018] Fig. 2 illustrates a computing system according to an embodiment of the present invention. [0019] Figs. 3A, 3B, 3C, 4A, 4B, and 4C illustrate a smart shelf system according to an embodiment of the present invention.

[0020] Fig. 5 illustrates a data flow diagram of a module system according to an embodiment of the present invention.

[0021] Fig. 6 illustrates a flowchart of a method for detecting a commercial transaction according to an embodiment of the present invention.

[0022] Fig. 7 illustrates shelf and computing hardware according to an embodiment of the present invention.

[0023] Fig. 8 illustrates a side view of a shelf as viewed by a camera according to an embodiment of the present invention.

[0024] Fig. 9A illustrates a side view of a shelf including a box and mechanism that adjusts a camera according to an embodiment of the invention.

[0025] Fig. 9B illustrates a shelf including three frontal cameras covering six lanes according to an embodiment of the invention.

[0026] Fig. 10 illustrates load cells placed on a shelf frame

[0027] Figs. 11 A and 1 IB illustrate a shelf unit according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION [0028] Subject matter will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, exemplary embodiments in which the invention may be practiced. Subject matter may, however, be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any example embodiments set forth herein; example embodiments are provided merely to be illustrative. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. Likewise, a reasonably broad scope for claimed or covered subject matter is intended. Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase “in one embodiment” as used herein does not necessarily refer to the same embodiment and the phrase “in another embodiment” as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of exemplary embodiments in whole or in part. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware or any combination thereof (other than software per se). The following detailed description is, therefore, not intended to be taken in a limiting sense.

[0029] The present application discloses a system and method for processing grab-and-go activities. According to one embodiment, the disclosed system may identify merchandise a user has taken from a storage or display of objects (such as, refrigerators and/or cabinets within a commercial environment) and determine intent of the user corresponding to the merchandise, e.g., a commercial transaction or a purchase. Fig. 1 presents an exemplary process of an automatic checkout in a grab-and-go environment according to an embodiment of the present invention. Users may scan (e.g., via quick response (“QR”) code), swipe, or input account information and/or a method of payment at a store, step 102. The users may be monitored for interactions with various items or merchandise within the store, step 104. The items or merchandise may be stored or placed within, for example, refrigerators where the users may open a door and take an item.

[0030] According to embodiments of the present invention, a system may automatically determine that certain ones of the interactions are finalized transactions that allow the users to purchase the items or merchandise and skip checkout lines or cashier systems, step 106. The disclosed system may include smart shelves and through different sensors, use of machine learning, computer vision, probabilistic reasoning, and artificial intelligence, can generate a final commercial transaction as well as a container state based on information from the smart shelves. The final commercial transaction may include a description of which merchandise items have been taken from the container by a user from the container during a session and an indication that the merchandise items are in the process of being purchased. A session may begin when the user opens a door of a container that includes the smart shelves and end when the door is closed. A container state may include a description of all merchandise items inside the container at any given time. The system may also determine that during a session, a user can manipulate the merchandise items and return them to possibly different shelves. A first container state at the start of a session and a second container state at the end of a session may be used by the system to determine the final commercial transaction.

[0031] Fig. 2 presents a computing system according to an embodiment of the present invention. The system presented in Fig. 2 includes container unit(s) 202, local computing device 204, central server 206, and network 208. Container unit(s) 202 may comprise one or more of shelves, racks, cases, cabinets, bins, floor locations, or other suitable storage mechanisms for holding, supporting, or storing merchandise. In one embodiment, the container unit(s) 202 include smart shelves which are described in further detail with respect to the description of Figs. 3 A through 4C. The container unit(s) 202 include sensor(s) 210 and camera(s) 212. Sensor(s) 210 may include, but are not limited to, weight sensors, radio frequency (RF) receivers, temperature sensors, humidity sensors, vibration sensors, and so forth. The sensor(s) 210 may be configured to acquire information on the container unit(s) 202. Cameras 212 may comprise optical sensors, cameras, or three-dimensional (3D) sensors, configured to acquire images of picking or placement of merchandise items on the container unit(s) 202.

[0032] During operation of the container unit(s) 202, the sensor(s) 210 and camera(s) 212 may be configured to gather information suitable for tracking the location of merchandise items within the container unit(s) 202 and their movement. The gathered information may be transmitted to local computing device 204 which conducts machine learning, computer vision, probabilistic reasoning, and/or artificial intelligence processes on the gathered information to perform item recognition and transaction processing related to the merchandise items on the container unit(s) 202. For example, a series of images acquired by the camera(s) 212 may indicate removal of an item 104 from a particular container unit(s) 202 by a user. In another example, sensor data from the sensor(s) 210 may be used to determine a quantity on hand at a particular container unit(s) 202, change in quantity of merchandise items resulting from a removal or placement, and so forth. The item recognition and transaction processing related to the merchandise items on the container unit(s) 202 may be transmitted from local computing device 204 to central server 206 over network 208 for billing, administrative, and inventory management/ordering purposes.

[0033] Network 208 may be any suitable type of network allowing transport of data communications across thereof. The network 208 may couple devices so that communications may be exchanged, such as between servers and client devices or other types of devices, including between wireless devices coupled via a wireless network, for example. Network 208 may also include mass storage, such as network attached storage (NAS), a storage area network (SAN), cloud computing and storage, or other forms of computer or machine readable media, for example. In one embodiment, the network may be the Internet, following known Internet protocols for data communication, or any other communication network, e.g., any local area network (LAN) or wide area network (WAN) connection, cellular network, wire-line type connections, wireless type connections, or any combination thereof. Communications and content stored and/or transmitted to and from central server 206 may be encrypted using, for example, the Advanced Encryption Standard (AES) with a 128, 192, or 256-bit key size, or any other encryption standard known in the art.

[0034] Servers, as described herein, may vary widely in configuration or capabilities but are comprised of at least a special-purpose digital computing device including at least one or more central processing units and memory. A server may also include one or more of mass storage devices, power supplies, wired or wireless network interfaces, input/output interfaces, and operating systems, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like. In an example embodiment, a server may include or have access to memory for storing instructions or applications for the performance of various functions and a corresponding processor for executing stored instructions or applications. For example, the memory may store an instance of the server configured to operate in accordance with the disclosed embodiments. [0035] Figs. 3A through 3C present various views of a configuration of smart shelves according to an embodiment of the present invention. A smart shelf 302 may be configured with a container 300 as shown in the illustrated embodiment. Smart shelf 302 may include sensory hardware modules for capturing data. For example, smart shelf 302 may include static cameras and weight sensors. Given ones of the smart shelves may also include video cameras to monitor activity (e.g., physical interaction with items). A local computer or computing device may be coupled to the cameras and sensors for performing processing and analysis of data from the cameras and sensors. The local computer or computing device 306 may be located in a backroom, computing closet, or placed under, behind, or otherwise conspicuously placed at the container.

[0036] A container 300 (e.g., a refrigerator) may include a plurality of stacked smart shelves. The smart shelves may include weight sensors to determine a change in quantity of items that are stocked on the shelves. The weight sensors may comprise two strain gauge sensors placed on the smart shelf effectively transforming the shelf to a scale. A weight change from any item placed or removed from the smart shelf can be detected by the weight sensors.

The weight sensors may also be used to determine a position an item was taken from or put in a smart shelf. The smart shelves may include mechanisms that allow for the shelves to operate in a flat mode or a tilted mode (where items can slide to the front based on gravity).

[0037] Figs. 4A through 4C present detailed views of a smart shelf according to an embodiment of the present invention. One or more rear-facing static cameras 402 are configured on the smart shelf 302 to provide the ability to view and label (or identify) items placed on a shelf below. The smart shelf 302 may be situated above another smart shelf in the container 300. The rear-facing static cameras 402 may be placed under a frame or on the underbody of a shelf portion near or at the front of the container 300. The static cameras 402 may be pointed towards a rear of the container 300 and capture images of the immediate shelf below and its contents. A topmost shelf in the container 300 may include cameras placed above the shelf (or a dedicated shelf including cameras). The bottommost shelf of the container may not need cameras. [0038] According to one embodiment, a container may include shelves designed with lanes having physical separators where items are placed along the lanes and movement of the items is confined within the lanes. For such containers, static cameras in the smart shelves may be placed at a position between the lanes such that each static camera can capture two lanes. For very deep containers, front-facing static cameras that are pointed toward the front of the container may also be placed on the smart shelves in a position near or at the back of the container. As an example, a container may include six lanes for each smart shelf and three static cameras per smart shelf with each camera placed in between two lanes such that the camera may capture items along the two lanes. In an alternative embodiment, the smart shelves may include mechanisms such as, a step motor, which allow for replacing the static cameras with a single camera per smart shelf. The mechanisms may move the camera along the front of a smart shelf such that the camera can take pictures of the entirety of the smart shelf. As such, the number of static cameras may be reduced per smart shelf.

[0039] The container 300 may further include a video camera module including video cameras that can be strategically placed to monitor items, for example, coming in or out of the container. The video camera module can be positioned to capture items outside of the container 300 as well as items that enter the smart shelves. An exemplary location of the video camera module may be on the top corner of the container 300. The video camera module may further include a communication module that allows it to feed a central unit or server with real-time video streams. The central unit or server may comprise a computing device including hardware such as, a central processing unit, memory, and graphics processing units, software, and cloud computing functionality for conducting machine learning, computer vision, probabilistic reasoning, and artificial intelligence processes to conduct item recognition and transaction processing related to items on the smart shelves.

[0040] Fig. 5 presents a module system according to an embodiment of the present invention. The module system may include sensory modules including a static cameras module 502, a weight sensors module 504, and a video cameras module 506 that act independently and feed data input to an integration module 508. Modules, as described herewith, may include software logic, hardware, and/or a computing device configured to receive input data, process the input data, and generate output data from processing of the input data. The sensory modules may provide information regarding any change of items placed on a smart shelf. The data input received from the sensory modules may include information that can be used to track which items are in a container at all times. The combination of data from the static cameras module 502, weight sensors module 504, and video cameras module 506 may be resolved by the integration module 508 to corroborate events in the container. The resolved data can be used by the integration module 508 to secure and monitor what is taken from a container, what is put back in a container, and what is moved from one shelf to another within the container. The integration module 508 outputs final commercial transactions and container states to a server 510 based on the resolved data.

[0041] Static cameras module 502 may be configured to obtain images of inside a container to detect what is inside the container before and after a given session (e.g., detected interaction with the container by a user). If an action occurs and an item is moved out of a shelf position (or put back to a new shelf position) in the container, the static cameras module 502 may detect such an event. The static cameras module 502 may capture photos just before the start of a session (e.g., one photo per camera) and after the end of a session (e.g., one photo per camera). The static cameras module 502 may attempt to decide what has changed from the start of a session to the end of the session. Possible state configurations data of the smart shelves may be outputted from the static cameras module 502 to the integration module 508. The possible state configurations data based on the photos received by the static cameras module 502 may be corroborated with the other sensory modules by integration module 508. However, it is noted that the state configuration of an entire container (or all of the smart shelves) at the end of a session may be the same as the state configuration in the start of a next session. Thus, capturing only one photo per camera at the end of the session may reduce the data upload needed as well as data processing. The behavior of the static cameras module 502 is shelf invariant but may be trained or operated under different light conditions that could occur on any given shelf.

[0042] Video cameras module 506 may be configured to detect what leaves and enters a container. The video cameras module 506 may receive video recordings of the container by video cameras that start when a container is opened by the user and ends when the container is closed by the user. If an action occurs and an item is moved out of the container (even for a short period of time) the video cameras module 506 may capture and be used to detect such event. If the action occurs and the item is moved out of or put back to the container (even for a short period of time) the video cameras module 506 may also detect such events. The video cameras may be configured on a container to cover as many scenarios of, for example, hands bringing items in or out of the container. The video cameras module 506 may comprise a video processor or communicate data to a cloud platform that performs computer vision and machine learning methods to determine which items have been placed in and out of the containers and at what times. Data information of such are transmitted to integration module 508 which may also be corroborated with data from the other sensory modules by integration module 508. [0043] The weight sensors module 504 can be applied and operated with any of smart shelves in a container to detect weight changes on the shelves during a session. The weight sensors module 504 may comprise strain gauge sensors on each shelf that provide data over time that permits computation of weight changes on any position of the shelf. Weight sensor measurements may occur during a session. Additionally, with an implementation of multiple strain gauge sensors on a shelf, rough position information of a weight change may be determined. If an item is removed from a location or placed in a location of a smart shelf, the weight sensors module 504 may report or detect such actions.

[0044] There are certain situations where the sensory modules, individually, are neither enough to provide accurate final commercial transaction for all situations that occur within a container nor provide accurate container state of each smart shelf. Static cameras module 502 may not be able to detect items that are occluded by other items. The static cameras modules 502 may also make mistakes of either mislabeling a detected item or detect items where there is none. The video cameras module 506 may not detect objects if they move in or out quickly (faster than frame rates) or multiple objects are moved out together and one hides the other or if hands and body cover an item during this process. Also mistakes can be made of either mislabeling a detected item or detect items where there is none. The weight sensors module 504 may not be able distinguish two different items that weigh the same. For example, if one replaces a non-valuable item with a legitimate item of value, when both have the same weight, the weight sensors module 504 may not distinguish such a scenario. Additionally, conflicts may exist between the modules where a correct final result may not be always reached.

[0045] As such, an integration module 508 is disclosed for acting as a sensor fusion module. The integration module 508 may take as input the output of the static cameras module 502, a weight sensors module 504, and video cameras module 506 as well as a current container state and utilizes, for example, probabilistic reasoning methods and/or machine learning methods to output a new container state and a final commercial transaction after the given session to server 510. Exemplary scenarios are used to describe exemplary functionality of the integration module.

[0046] In a first scenario, a case of very similar items is considered. In some instances, similar items may have small differences hidden to static cameras and may cause mistaken labels. In a session that happens to be particularly quick, the static cameras module 502 may infer that an item was removed and a new item was put back, when in fact the item was mislabeled by the static cameras module 502. The weight sensors module 504 may detect no change of weight, as the session was quick, that could be explained by the removal and placement of another (very similar) item very quickly. Clearly, a more likely scenario is that no item was removed, but requires confirmation.

[0047] This may be resolved by analyzing all of the information from the sensory modules as a whole. For example, data from the video cameras module 506 may identify that no item left the container. The video cameras module 506 may also detect differences between similar looking items, as long as the different features can be seen while being removed (and not covered by the hands). Alternatively, the video cameras module 506 may detect if an stock- keeping unit (“SKU”) did come in or did come out which would make it possible to determine if a similar item to the one detected could have come or not. Additionally, the weight sensors module may also report similar such information and where in the shelf such a change occurred or not. A processor or cloud computing functionality may gather all the information from the sensory modules from the integration module 508 (including the resolving data from the video cameras module 506) and determine that it is likely that such item mislabeled by the static cameras module 502 has the correct label assigned to the new container state. Therefore, the first scenario may be resolved correctly and the final transaction and update of the container state can be correct.

[0048] In a second scenario, due to occlusions, true negatives (undetected items) can occur for the static cameras module 502. For example, large/tall items in front of small items may be a source of occlusions for the static cameras module 502. In such instances, the static cameras module 502 may infer a (valuable) item was removed when in fact the item in the back moved so slightly to become occluded. Usage of probabilistic reasoning by a processor may assign a probability that a smaller item can hide behind a (presumably detected) larger item. In this case, the weight sensors module may not detect any weight change but may fall short in detecting that a decoy weight was placed behind a large item just to pick up the smaller valuable item.

[0049] However, this situation may be resolved by data from the video cameras module

506 that identifies that no item has left the container or, in the instance of bad intention behavior, it did leave (and replaced with a decoy). Moreover a video of such behavior exists from the video cameras module 506. The weight sensors module may also detect the presence of the hidden beverage by the weight. Thus, a conclusion may be made by the processor or cloud computing functionality using the integration component to determine that such item not detected by the static cameras module 502 should be accounted by the new container state. Or for bad intention behavior, that it indeed was removed. A probability of the occurrence of occlusion may also be computed based on the data from the weight sensors module and the video cameras module 506 to determine a correct final transaction and update the container state correctly.

[0050] By using a combination of the static cameras module 502, weight sensors module

504, and video cameras module 506, items in a container can be secured and resolved for certain situations where systems with, for example, just weight sensors, just video cameras, or just static cameras alone (or any two of the aforementioned) would not be suffice. One such situation includes invasors which are items that may be put in a container but do not belong in the container. For example, if a first brand provides a cooler to a vendor for selling items belonging to the first brand, other second brands may be considered as invasor items. The first brand will want to know if second brand items are being placed in the cooler they provided (e.g., this could be regardless of someone picking them up, and would be detected by a static camera). In another situation, an item that has been consumed and placed back in a container may also be considered as an invasor. For example, if someone puts back a bottle of water that is empty (in this case a weight sensor can detect this).

[0051] Fig. 6 illustrates a flowchart of a method for detecting a commercial transaction according to an embodiment of the present invention. Data is received, by a data processing system comprising a processor and a memory, from sensory modules of smart shelves within a container, step 602. The sensory modules may comprise a static cameras module, a weight sensors module, and a video cameras module that generate data output representative of physical activities and events corresponding to items on the smart shelves in a given session. The data output generated from individual ones of the sensory modules may include data representative of physical activities and events corresponding to the items. [0052] The data from the sensory modules is resolved via an integration module, step

604. The data and any conflicting data associated with the physical activities and events corresponding to the items may be resolved by the integration module analyzing the data from the sensory modules as a whole and utilizing, for example, probabilistic reasoning methods and/or machine learning methods.

[0053] The data processing system detects whether a change to a current container state has occurred, step 606. If not, the data processing system returns to step 602 to receive data from the sensory modules. Otherwise, a change to the current container state causes the data processing system to determine a new container state after the given session based on the resolved data and the current container state, step 608. The current container state may comprise data identifying available inventory and placement of the inventory in the container prior to the detected change to the current container state. The resolved data may be used to determine a change to the available inventory and/or placement of the inventory. Based on the determined change to the available inventory and/or placement of the inventory, the new container state is determined including data identifying a new inventory and placement of the inventory in the container.

[0054] A final commercial transaction is determined based on the new container state, step 610. The final commercial transaction may comprise data including a description of which merchandise items have been taken from the container by a user and an indication that the taken merchandise items are desired to be purchased. The current container state is updated with the new container state, step 612 and then the data processing system returns to step 602.

[0055] [0056] A computer 706 may be placed outside the internal space of a pod 702 where consumers shop and yet close to shelves unit 704 within the pod so that the shelves unit 704 can communicate data securely to the computer 706 and receive power from the computer 706, as illustrated in Fig. 7. A pod 702 may comprise any product display unit or storage, such as a refrigerator, cooler, cabinet, etc. According to the illustrated embodiment, the computer 706 is placed inside the bottom of the pod. In this way it is well protected by the pod 702. While one could use wireless cameras in the shelves unit 704 powered by batteries, this can cause communications to be exposed to any nearby electronic device to interfere with it and batteries have finite time of usage. Thus, cables are utilized to secure communications and provide power, eliminating the need to replace batteries every so often.

[0057] Each shelf unit 704 may have a printed circuit where all sensors communicate data and, for example, one cable going through the frame of the shelf unit 704 providing power and transmitting/receiving data to/from the shelf unit 704. A cable for each shelf may connect the printed circuit device to a central hub or computing device (in one embodiment of the invention the server or computing device uses USB) that interfaces with computer 706 placed nearby or inside the pod 702. The hub may assign an identifier to each shelf in a pod, for identification during sessions of the use of the pod 702. Shelves unit 704 may also include screen display 708 for dynamic price updates in front of each lane.

[0058] Angle and Depth Covering Theory: The disclosed system may provide support for cameras to be placed such that they may cover the view of an entire shelf full, or partially full, of items below it. Fig. 8 presents a side view of a shelf and the variables h, d, D, A, F, 0, cp. The illustration may be used to help find the geometry and trigonometry to estimate an angle 0 that a camera with angle view F should be placed and a depth D it can cover for items of height up to A to be seen. Here p/2= F + q+ f.

[0059] A pod with a significant higher depth may require cameras to be placed on the back of a shelf to secure an entire shelf. A formula may be used to estimate the angle to be placed the front cameras, and Fig. 8 illustrates the scenario where the following variables are given (i) h - height between the camera and the shelf below (ii) d - the distance of the closest item to the shelf-front. This value can be slightly larger as it may not be required of the camera to see the bottom of the first item, and instead, just enough of the item to classify it correctly; (ii) D - depth or distance of the furthest item to be considered from the shelf-front cameras; (iii) A - height of the tallest item to be considered among a list of items to be placed in the pod; (iv) F - field of view of the camera along the pitch angle. The variable Q , the angle the camera should make with the shelf to cover for the first item at distance d is desired. Which depth D will camera angle Q cover given tallest item A can be present in the shelf is also desired. Referring to Fig. 2, a formula is derived as follows d = h tan f = h tan

^and thus, Q = j - j - tan^-1 ^ (1). Then, using that

and formula (1) results in D = h-A

- (2) , which is the maximum depth that such camera can cover and see every

item in it (up to one item occluding another one).

[0060] With these two formulas, one can define the best angle Q to adjust the front camera and cover a depth D in a cooler for items as tall as height A, with shelves of height h, and cameras covering pitch wide angle F . If there is a need to cover a depth further than D, back cameras can be placed on the shelf and using the same formula, starting from the back of the shelf looking towards the front of the shelf, an adjustment can be made to the camera angle to cover up to a depth Db, indicating depth from the back to front. Note the parameter d can be different for the front camera and back camera, resulting on a value Db different than D. Thus, with cameras in the front and in the back, a depth D+ Db is covered which is larger than Dpod (the depth of the pod) to visually secure a pod.

[0061] Camera Angles Mechanisms: The height h between two shelves may be set by the vendor and vary according to the items being sold and size of the pod. Taller objects may require larger height between shelves. The shelf heights h, depth of the pod Dpod, objects sizes A, and choice of wide angle lenses F may impact on the best angle a camera should be placed on the shelf, as derived above in formula (1) and (2). A vendor may prefer to have as many items as possible to be offered to consumers, because replenishing the items require logistics and human work (to bring new items to the pod) which then translates in costs that should be minimized by vendors. Accordingly, a shelf may include a mechanism to adjust the camera angle and direction (yaw and pitch) to best fit the setting of the pod (a cooler or a cabinet), items dimensions, and camera angle.

[0062] Fig. 9 A presents a side view of a shelf frame 902 including a box 906 and mechanism that adjusts a camera 904 according to an embodiment of the invention. In a very common scenario, there may be a need to guarantee that all cameras are turned by the same angle and for such calibration to be as simple as possible. In this case, the presently disclosed system may adjust the angles of the entire set of cameras together. For example, one knob may move boxes holding the camera at once. The boxes holding the cameras can be connected to a printing circuit 908 as shown in Fig. 9B. The board of the circuit design may hold all boxes with all cameras of the shelf 910 and so they can be rotated at once, all of the boxes-cameras. Accordingly, shelves may include boxes to place cameras looking at the shelf below at different yaw and pitch angles. These boxes may be made so that cameras are protected from user manipulation at the shelves.

[0063] Moreover, filters made with anti-fog films may be mounted on each camera and protect the cameras from fog that can occur. At the level of the software, fog can also be detected but may cause delays in the usability of the camera while filters with anti-fog films can eliminate or reduce the time where fog is present. Mechanisms to adjust the yaw and pitch angles of the cameras at each box can also be constructed on some scenarios. There are scenarios that such boxes can be displaced along the shelves so that the distance between cameras can be reduced or increased. The simpler is the hardware for an application the more cost effective is the hardware solution. So these scenarios above must be considered according to the ultimate needs of the vendor.

[0064] Width and Cameras Spatial Distribution: Coverage of a shelf along the depth of the pod is discussed above, but coverage of the shelf along the width of the pod may also be needed. If items are placed along lanes that go from front to the back (along depth), there are occlusion-width effects where a camera view of items along a further away (width wise) lane are obstructed by items along a nearby lane. There are also occlusion-depth effects where items in a lane obstruct items behind them, on the same lane. For an occlusion effect to take place, it is necessary and sufficient for the occluded item to be along the ray between the camera and the occluder item. Of course, the taller is the occluder item and the shorter is the occluded item, the more the occlusion effect and the less of the occluded item the camera will see.

[0065] In one embodiment, cameras are placed between two lanes to minimize the occlusion-depth effect along such two lanes. In this embodiment, the pod includes six (6) lanes and thus three cameras along the width of the shelf 910 as seen in Fig. 9B. The wider a shelf needs to be, the more lanes it has. A camera may be placed between two lanes and thus we can cover 2N lanes with N cameras. Moreover, shelves can be inclined (discussed in further detail below) and then, items on the back appear to be taller and thus the depth-occlusion effect is reduced. If the height between the shelves is allowed by vendor to be increased, cameras can be placed further inside the pod to cover for wider views with less occlusion effects. With a fish eye camera and proper height, an entire shelf can be covered by one camera placed in the center of the shelf. However, the items will be seen mostly from the top and thus, visual differentiation should exist. If all items are cans with the same metal cover, visual differentiation may not be possible without making items that are visually differentiated from the top.

[0066] In one embodiment where cameras are placed in front to visually distinguish the items, the challenge is that occlusions will demand more cameras. The number of cameras is then dependent on the AI system and items being sold. If one is selling just one item, a weight sensor system may be sufficient to output accurate reports and the shelf may not need a camera. If items vary in height, shape, and occlusions are significant, one may require one camera per two lanes. If items of similar heights differ on visual details that are easily occluded, one more reason to have cameras per two lanes. Various ones of these arrangements can be incorporated into the shelf system.

[0067] Motor Mechanism: A shelf can also incorporate a motor attached to a mechanism to move a camera 904 (box 906), such as a conveyer belt. In this way, a camera 904 can move along, for example, the width of the shelf 910, and then several photos can be taken off the shelf 910. Such a mechanism may help cover an entire shelf and better resolve occlusions scenarios. Even depth from motion can be obtained with such motion mechanism for each item at the shelf 910. The challenge of using a motion mechanism (motors) is that it takes longer time to cover an entire shelf, and one introduces another mechanical mechanism prone to failures, the one used to move the camera around the shelf. The main advantage of this motion mechanism, is needing just one camera to cover the same area while being able to take more photos within a given width compared to just one per two lanes as suggested for the static case. More photos may help better resolve occlusions scenarios.

[0068] Weight Sensors: Referring to Fig. 10, the shelf system may include weight sensors 1002 (load cells) distributed so that it converts the shelf 910 into a scale of when items are taken or put back to the shelf 910. In some scenarios shelves can also provide position of where the weight has changed, i.e., when an item is added or removed from the shelf 910, the weight sensors 1002 may not only provide the weight change but also information to locate where the change occurred at the time of the change. Further description and details of formulae of how to compute location associated with a weight change may be found in “Ubiquitous Interaction - Using Surfaces in Everyday Environment as Pointing Devices”, A. Schmidt et. al., Conference Paper in Lecture Notes in Computer Science, October 2002, DOI: 10.1007/3-540- 36572-9_21 which is hereby incorporated by reference in its entirety. Fewer load cells, at least two, can be used as well. The weight sensor 1002 will also receive power and transmit data through the same cable that a shelf has, as illustrated in Fig. 10. The printed circuit can also communicate data with the load cells.

[0069] Height between Shelves: Weight sensors 1002 (load cells) can add thickness to the shelves. According to one embodiment, the weight sensors 1002 can be placed around the frame 902 of the shelves, as shown by Fig. 10 which illustrates where they can be placed. While the frame will have a thickness imposed by the load cells, the entire surface area of the shelf 910 where the items are placed does not have to be thick, as illustrated in Fig. 10. This design allows for maximum height between the shelf surfaces, in the sense that the thickness due to the weight sensors does not impact on the height between the shelf surfaces. As formula (2) above indicates, the larger value of the height h yields further depth D that cameras can see.

[0070] Inclining Shelves for Gravity: The mechanical design of the disclosed shelf hardware allows for the shelf surface to be placed at different angles as shown in Figs. 10, 11A and 1 IB. The surface area can be inclined as shown in the figure, and the levels can be manually chosen (1004). The power for the load cells and data transmission may be through a cable per shelf. Inclining a shelf so that gravity can cause items to move towards the front of the shelf may be a mechanism used as consumers shop and withdraw items often from the front of the shelves, thus requiring a mechanism to bring items from the back of the shelf towards the front.

[0071] Previous mechanisms were either plastic materials placed on top of the shelves to create the height gradient or shelves were hard screwed into the cabinet at such inclinations, thus no flexibility of changing it. Different items of different weights and heights and widths are impacted by the gravity differently and avoiding them to fall is a concern with gravity made shelves. According to at least one embodiment, the disclosed shelves may allow vendors to mechanically and easily at any time, adjust the inclination that best adjust to the items they are selling at any time. There are levels of inclination (1004) that are stable and can be manually changed, as shown in Fig. 10. The shelves may be inclined such that the weight sensors remains calibrated. This is may be accomplished by placing the load cells on the frame of the shelf, which is not inclined with the shelf surface.

[0072] Other sensors can be attached to the disclosed shelves 910. A very common sensor that may be needed for the application of the shelves is a thermometer, which can be added. Another sensor may include video cameras. Video cameras may be used to monitor activities just outside the shelves, checking which products leave the shelves and return to the shelves. One may place one or two cameras per shelf, on the sides of the shelf pointing towards each other and “securing” what comes in and out of the shelves. These video shelves may also be communicated with the pod computer 706. The challenge is that processing video data can be very intensive in memory, storage, and processing. In order to minimize the use of video cameras one may consider placing only such video camera, with wide angle views, on the sides of the top shelves looking down to cover all shelves. In most scenarios, but not all, these two cameras can secure a pod. Other sensors not discussed herein may also be considered.

[0073] Dynamic Price Updates: Shelves may be configured with a display system 1102 that provides information corresponding to items on the shelf. Such display system 1102 may be made with multiple display screens that simulate price tags in a usual vending machine. For example, the display system 1102 may include LCD screens that indicate which items are available, characteristics of such items, and prices for them. These displays may show any information the vendor wish to display to the consumers, which may be programmed or communicated to the pod computer 706. Display information may also be transmitted from the cloud to the pod computer 706 and then to the shelves. This communication can allow shelves to dynamically update tag description of the items being sold as well as prices. If a vendor wishes to offer a change in price to an item for a one time transaction, or for a one time period, such changes can be made. Such changes can be part of a marketing campaign by the vendor, it can be geographically specific, it can be temporarily specific and any granularity of space and time can be applied for dynamic price to be reflected in the screen of shelves. Such functionality of the shelves may be implemented via communication between the pod computer 706 and the screen display. [0074] At the time an individual consumer requests permission to open the pod door, updates to the display screens at the pod can be made, as well as messages to the mobile phone of the consumer can be sent. Some marketing strategies can be applied from the local computer of the pod, without requiring information coming from the cloud. For example, the vendor can set a policy that the last item in the pod to be offered at a discount. Another example would be to promote a new location, every time someone opens the door at such location, a random promotion from a list of predefined promotions, appears to the consumer. Such decisions can be made locally. Other rules that are locally decided can be made. Rules may reside on the pod computer 706 and dynamic pricing to display on the shelves may be executed on display screens. [0075] Moreover, a full automatic system of price updates and promotions can reflect objectives set by the vendor that a machine learning system can leam to update the screens. This can be market strategy for a specific product or set of products for a geographic region where a pod is present or to a group of consumers based on their history, or both, based on geographical place and customer history. The vendor can also define when a rule will be applied and when it will end. The shelf includes hardware and communication support for implementing such marketing strategies. The power supplied to the display system as well as data transmission between the display and the pod computer 706 placed nearby is also provided by the same cable described above.

[0076] Voice Recognition: Additionally, the disclosed system may include a voice recognition system. A central computer at a local location to the pod may include a real time AI voice recognition system that it is able to interact with the customer. A customer may be allowed to ask questions, such as “Do you have product A?” “Where is product A?” “Do you have discounts?” Such as feature can allow visually impaired people to locate desired products easily. The system may also allow the central computer and a cloud service to have the precise inventory and distribution of products in each shelf for a voice recognition system to exchange communication with a consumer.

[0077] According to one embodiment, the system includes a universal serial bus (e.g.,

USB 3.1 including any backward compatibility version) connection 1104 to provide a communication channel and power supply required by electronic components connected to the hub of a shelf. A numbering mechanism of each shelf and camera within the system may also be configured. In one embodiment, a numbering mechanism uses a controlled power up sequence in two different levels. A level 1 may control the power up sequence of each shelf, and the shelves may be numbered from top to the bottom. A level 2 numbering sequence may control the power up of each camera on the shelf. The cameras may be numbered in sequence from the left to the right side of each shelf. Each shelf and camera may be associated to a USB device on the central hub and the USB device serial number may be used to map the USB device entry to the number of each shelf or camera. The level 1 powering sequence may be executed by the hub and subsequently, the hub may send a command to each shelf to start the level 2 powering sequence. These commands may be sent using the USB device on the hub for each shelf. In the case of a shelf is restarted after the level 1 was finished, the hub may restart the powering sequence of this specific shelf, to ensure the correct numbering on the system.

[0078] The hub placed nearby the pod (cooler or cabinet) can be in communication with the cloud, if communication is possible. Then, updates to the firmware of shelves can be made remotely. It can also report any failures, to start a maintenance process.

[0079] Figures 1 through 11B are conceptual illustrations allowing for an explanation of the present invention. Notably, the figures and examples above are not meant to limit the scope of the present invention to a single embodiment, as other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present invention can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present invention are described, and detailed descriptions of other portions of such known components are omitted so as not to obscure the invention. In the present specification, an embodiment showing a singular component should not necessarily be limited to other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present invention encompasses present and future known equivalents to the known components referred to herein by way of illustration.

[0080] It should be understood that various aspects of the embodiments of the present invention could be implemented in hardware, firmware, software, or combinations thereof. In such embodiments, the various components and/or steps would be implemented in hardware, firmware, and/or software to perform the functions of the present invention. That is, the same piece of hardware, firmware, or module of software could perform one or more of the illustrated blocks ( e.g ., components or steps). In software implementations, computer software (e.g., programs or other instructions) and/or data is stored on a machine-readable medium as part of a computer program product and is loaded into a computer system or other device or machine via a removable storage drive, hard drive, or communications interface. Computer programs (also called computer control logic or computer-readable program code) are stored in a main and/or secondary memory, and executed by one or more processors (controllers, or the like) to cause the one or more processors to perform the functions of the invention as described herein. In this document, the terms “machine readable medium,” “computer-readable medium,” “computer program medium,” and “computer usable medium” are used to generally refer to media such as a random access memory (RAM); a read only memory (ROM); a removable storage unit ( e.g ., a magnetic or optical disc, flash memory device, or the like); a hard disk; or the like.

[0081] The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying knowledge within the skill of the relevant art(s) (including the contents of the documents cited and incorporated by reference herein), readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present invention. Such adaptations and modifications are therefore intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one skilled in the relevant art(s).

Claims

CLAIMS What is claimed is:

1. A system for detecting a commercial transaction through physical interactions with items, the system comprising: a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module; an integration module configured to: receive data from the plurality of sensory modules, the data including physical activities corresponding to items on the smart shelves in a given session; resolve the data from the sensory modules using probabalistic reasoning and machine learning; determine a new container state after the given session based on the resolved data; and determine a final commercial transaction based on the new container state.

2. The system of claim 1 wherein the static cameras module is configured to: retrieve images of inside the container before the given session and images of inside the container after the given session; determine state configurations of the one or more smart shelves; and transmit the state configurations to the integration module.

3. The system of claim 1 wherein the video cameras module is configured to: receive video recordings that start when the container is opened and ends when the container is closed; determine items that have been placed in and out of the container and times of which the items have been placed in and out of the container; and transmit associated with the determined items and times to the integration module.

4. The system of claim 1 wherein the weight sensors module is configured to detect weight changes on the one or more shelves during the given session.

5. The system of claim 1 wherein the integration module is further configured to resolve the static camera module detecting an item removal by confirming with data from the video cameras module and the weight sensors module.

6. A method, in a data processing system comprising a processor and a memory, for detecting a commercial transaction through physical interactions with items, the method comprising: receiving, by a computing device, data from a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module, wherein the data includes physical activities corresponding to items on the smart shelves in a given session; resolving, by the computing device, the data from the sensory modules using probabilistic reasoning and machine learning; determining, by the computing device, a new container state after the given session based on the resolved data; and determining, by the computing device, a final commercial transaction based on the new container state.

7. The method of claim 6 further comprising detecting a change to a current container state has occurred.

8. The method of claim 7 wherein the current container state comprises data identifying available inventory and placement of the inventory in the container prior to the detected change to the current container state.

9. The method of claim 8 wherein determining the new container state further comprises determining the new container state based a change to the available inventory or placement of the inventory.

10. The method of claim 6 wherein the final commercial transaction comprises data including a description of which items have been taken from the container and an indication that the taken items are desired to be purchased.

11. Non-transitory computer-readable media comprising program code that when executed by a programmable processor causes execution of a method for detecting a commercial transaction through physical interactions with items, the computer-readable media comprising: computer program code for receiving data from a plurality of sensory modules associated with one or more shelves within a container, the plurality of sensory modules including a static cameras module, a weight sensors module, and a video cameras module, wherein the data includes physical activities corresponding to items on the smart shelves in a given session; computer program code for resolving the data from the sensory modules using probabilistic reasoning and machine learning; computer program code for determining a new container state after the given session based on the resolved data; and computer program code for determining a final commercial transaction based on the new container state.

12. The non-transitory computer-readable media of claim 11 further comprising computer program code for detecting a change to a current container state has occurred.

13. The non-transitory computer-readable media of claim 12 wherein the current container state comprises data identifying available inventory and placement of the inventory in the container prior to the detected change to the current container state.

14. The non-transitory computer-readable media of claim 13 wherein the computer program code for determining the new container state further comprises computer program code for determining the new container state based a change to the available inventory or placement of the inventory.

15. The non-transitory computer-readable media of claim 12 wherein the final commercial transaction comprises data including a description of which items have been taken from the container and an indication that the taken items are desired to be purchased.