WO2021232333A1

WO2021232333A1 - System and methods for express checkout

Info

Publication number: WO2021232333A1
Application number: PCT/CN2020/091504
Authority: WO
Inventors: Xianbin Zhang; Matthew Robert SCOTT; Yujie ZHONG
Original assignee: Shenzhen Malong Technologies Co., Ltd.
Priority date: 2020-05-21
Filing date: 2020-05-21
Publication date: 2021-11-25

Abstract

A system for express checkout uses machine learning models to detect one or more products that require a reposition to be recognized, provide reposition cues via a graphical or voice user interface, and facilitate a user to check out all products in a session without scanning any barcodes.

Description

SYSTEM AND METHODS FOR EXPRESS CHECKOUT

BACKGROUND

As an alternative to the traditional cashier-staffed checkout, self-checkout solutions become popular for retail success, particularly for grocery stores and supermarkets. Most self-checkout machines have the following components, including a lane light, a touchscreen monitor, a basket stand, a barcode scanner, a weighing scale, and a payment module. Using a self-checkout machine, a customer can check out products by scanning individual product barcode without any interactions with a cashier or a clerk, although a clerk may be assigned to supervise a group of self-checkout machines or lanes, so the clerk can assist customers when required, such as authorizing the sale of restricted products (e.g., alcohol, tobacco, etc. ) .

Self-checkout machines may utilize the universal product code (UPC) system to check out products one by one. In some recent experiments, self-checkout machines try to use radio-frequency identification (RFID) tags to check out a group of products together. However, as a precondition, these checkout solutions require a barcode or an RFID tag to be attached to each product. Such preparations can be expensive and error-prone. A wrong barcode or tag usually leads to a wrong transaction. A technical solution is needed for checking out products without such onerous preconditions.

SUMMARY

This Summary is provided to introduce selected concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In general, aspects of this disclosure include a system for express checkout. The system may include one or more imaging devices and a display. Each imaging device may be adapted to capture images of a designated checkout area. When to recognize all products in an image of a designated checkout area, the system may detect an exception that requires a reposition of one or more products in the image. In response to the exception, the system may cause the image or a part of the image to be displayed with one or more reposition cues via a graphical or voice user interface. Accordingly, users may reposition the respective products according to the reposition cues, and check out all products simultaneously. Advantageously, the disclosed system for express checkout may drastically improve conventional checkout machines as the disclosed system enables multiple products to be checked out rapidly without requiring the aforementioned preconditions for affixing barcodes or RFID tags to each product. For instance, by simply parking a shopping cart under a camera, a customer may be able to check out all products in the shopping cart at once.

Further, technologies, as embodied in various systems, methods, and computer-readable storage devices, are disclosed to improve a computing system’s ability for computer-vision-based batch checkout. One aspect of the technologies described herein is to improve the computing system’s ability to recognize multiple products simultaneously. Another aspect of the technologies described herein is to improve the human-machine interface so that the checkout process may be enabled with precise user interactions. Yet another aspect of the technologies described herein is to improve a computing system’s ability to resolving various exceptions that might hinder the checkout process. Various aspects are further discussed in the DETAILED DESCRIPTION.

BRIEF DESCRIPTION OF THE DRAWINGS

The technologies described herein are illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:

FIG. 1 is a schematic representation illustrating a first group of exemplary graphical user interfaces or parts thereof for express checkout, in accordance with at least one aspect of the technologies described herein;

FIG. 2 is a schematic representation illustrating a second group of exemplary graphical user interfaces or parts thereof for express checkout, in accordance with at least one aspect of the technologies described herein;

FIG. 3 are various schematic representations illustrating an exemplary operating environment and an exemplary checkout system, in accordance with at least one aspect of the technologies described herein;

FIG. 4 is a flow diagram illustrating a first exemplary process of express checkout, in accordance with at least one aspect of the technologies described herein;

FIG. 5 is a flow diagram illustrating a second exemplary process of express checkout, in accordance with at least one aspect of the technologies described herein; and

FIG. 6 is a block diagram of an exemplary computing environment suitable for use in implementing various aspects of the technologies described herein.

DETAILED DESCRIPTION

The various technologies described herein are set forth with sufficient specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described. Further, the term “based on” generally denotes that the succedent condition is used in performing the precedent action. Further, the term “class” or “product class” is interchangeable with the term “type” or “product type” as used herein.

In the modern economy, many products are affixed with machine-readable labels ( “MRLs” ) , such as UPC barcodes, QR codes, RFID tags, etc. MRLs may be provisioned by a manufacturer, e.g., a UPC label on a TV, or by a retailer, e.g., a UPC label for an apple in a supermarket. MRLs may be read by scanning devices for automatic identification and data capture, e.g., supporting transactions at various point of sale ( “POS” ) locations, tracking inventory at warehouses, facilitating transportation of goods in commerce, etc.

In combination with an MRL system, products may be checked out with a checkout machine with reduced waiting time, reduced labor costs, and increased accuracy for sales and inventory tracking. Self-checkout, also known as self-service checkout, is an alternative to the traditional cashier-staffed checkout, where self-checkout machines are provided for customers to process their purchases from a retailer. Checkout machines designed for cashier-staffed checkout or self-checkout solutions can provide great benefits for the retail industry, e.g., by improving the productivity and accuracy of the checkout process.

However, tagging or labeling each product could be expensive, impractical, or error-prone on many occasions, such as for products sold in greengrocers, farmers’ markets, or supermarkets. Sometimes, MRLs may become missing, e.g., during the transportation process or due to mishandling. Sometimes, MRLs may become illegible (i.e., cannot be scanned by a scanner) , e.g., due to damage or smear. Sometimes, MRLs may be intentionally misplaced and affixed to unintended products. Sometimes, users (e.g., a cashier or a customer) may simply fail to scan every product in the shopping cart.

New technologies are needed to check out products from a store more conveniently and accurately. A technical solution is provided in this disclosure for express checkout. As used herein, express checkout refers to the process to check out products without scanning any MRLs. At a high level, one aspect of this disclosure includes a system with one or more imaging devices (e.g., cameras) and a display. Each camera may be adapted to capture images of a designated checkout area. By way of example, one camera may be positioned to face an area to park a shopping cart, another camera may be positioned to face another area to place a shopping basket, and yet another camera may be positioned to face a scale. One exemplary checkout system is further discussed in connection with FIG. 3.

When to recognize and check out all products in an image of a designated checkout area, the system may detect an exception that requires a reposition of one or more products. Computer-vision-based technologies typically require an unobstructed line-of-sight between a camera and an object. Accordingly, the system may generate overlap-based exceptions if significant overlaps among a stack of products are detected. Some products are sold based on weight. Accordingly, the system may generate weight-based exceptions if a product needs to be weighted. Some products have restrictions (e.g., age, quantity, etc. ) for sale. Accordingly, the system may generate restriction-based exceptions if the recognized products have any restrictions.

Different exceptions may require different reposition resolutions. Overlap-based exceptions may require repositioning the affected products separately to eliminate the overlap. Weight-based exceptions may require the affected products to be weighted. Restriction-based exceptions may require specific actions to resolve the restriction, e.g., age verification or quantity reduction, or even an intervention from a store clerk.

In response to an exception, the system may generate the exception-specific resolution cues, which may include the image or a part of the image associated with the exception. The system may cause the exception-specific resolution cues, including the image or a part of the image, to be displayed via a special graphical user interface, such as shown in FIG. 1 or FIG. 2. Alternatively, the system may provide such reposition cues via voice.

Accordingly, users may reposition the respective products according to the reposition cues, and check out all products simultaneously. Advantageously, the disclosed system for express checkout may drastically improve conventional checkout machines as the disclosed system enables multiple products to be checked out rapidly without the need for scanning any MRLs. A typical case for express checkout without exceptions may only require the customer to park a shopping cart under a designated checkout area, then all products in the shopping cart can be checked out at once.

In various embodiments, the disclosed system uses a machine learning model (MLM) to detect objects in an image and uses another MLM to recognize the product class or type of a detected object. To greatly improve the efficiency of product recognition, the disclosed system caches the features of recognized products in a session to form a novel cached feature space. After an iteration of repositioning, the disclosed system may try to recognize a product by utilizing the cached feature space first instead of the global feature space. As the cached feature space could be many orders of magnitude smaller than the global feature space, product recognition with the cached feature space could be many orders of magnitude faster than without the cached feature space, which is further discussed in connection with FIG. 5.

Having briefly described an overview of aspects of the technologies described herein, referring now to FIG. 1, which is a schematic representation illustrating a first group of exemplary graphical user interfaces or parts thereof for express checkout.

In one embodiment of a computing system 150 for express checkout, hardware 152 represents the hardware components, such as, for example, a central processing unit ( “CPU” ) , a graphics processing unit ( “GPU” ) , a memory, an imaging device, a wireless communication module, a display, a speaker, a microphone, etc. Above hardware 152, there is operating system ( “OS” ) 154, which includes various drivers to interface with various hardware components. Lying above OS 154 are applications 156 and graphical user interface ( “GUI” ) 158. Applications 156 include a checkout application, which is designed to enable products to be checked out based on images of the products. GUI 158 may be provided via OS 154 or applications 156, alternatively function as a standalone software application. In various embodiments, GUI 158 includes specially designed GUI for computing system 150 to display visual data (e.g., display reposition cues, related messages, etc. ) and accept user inputs (e.g., age verification information, etc. ) .

GUI 158 may control the operation of computing system 150 (e.g., the flow of an express checkout process) and determines how a user interacts with computing system 150 (e.g., whether and how to reposition products) . Regarding human-computer-interaction, computing system 150 may present graphical objects to a user for selection in GUI 158. The user interacts with the computer by selecting one or more objects, e.g., or choosing a command from a menu. In response to the selected object (s) , the computer may execute corresponding business logic and present one or more new or updated objects in the GUI or a new GUI.

GUI 158 typically includes windows and control boxes for manipulating the windows, such as buttons to minimize or close the window along with checkboxes and scroll bars related to the underlying program. Moreover, each window typically includes one or more “containers” within the window, where each container may be viewed as a sub-window within the window. Each container displays graphical elements or “widgets” that define how a user interacts with the GUI for an underlying program via that container. The terms “widgets” refer to the graphical elements displayed within windows or containers of a GUI that define how a user interacts with the GUI and thereby the underlying functions.

GUI 110 and GUI 120 are exemplary GUIs in GUI 158. For the sake of brevity, only important graphical components are illustrated herein. A “layout builder, ” which is a program that allows a programmer to define the GUI, may be used to define GUI 110 or GUI 120, such as defining containers to be placed within given windows and arranging respective widgets within each container.

In one embodiment, GUI 110 comprises at least four containers. Container 112 is adapted to display reposition messages, such as an instruction to resolve an exception. Container 114 is adapted to display reposition cues, such as an image of the product caused an exception. Container 116 is adapted to display product information, such as product names, unit price, total price, etc. Container 118 is adapted to display exception information, such as an exception type and its related information.

In this instance, a weight exception is generated as the watermelon is a sale-by-weight item. Accordingly, information on this weight exception is displayed in container 118. An instruction to resolve the weight exception is displayed in container 112. Image 132 of the watermelon is displayed in container 114, which may use a layered design as discussed in connection to FIG. 2. In some embodiments, image 132 is extracted from the image capturing the actual product being checked out in the current session. In other embodiments, image 132 may be the standard stock image of the product type, such as a typical image of the product type. Further, several already recognized products and their respective information are displayed in container 116.

In some embodiments, container 124 is adapted to display widgets (e.g., forward and backward arrows/buttons) for browsing, so the user may review individual exceptions if there are multiple exceptions generated in a checkout session. In this instance, widget 138 indicates there are more exceptions. In response to a user interaction (e.g., tap via a touch screen) with widget 138, GUI 110 may transit into GUI 120.

GUI 120 is similar to GUI 110 but displays information of an overlap exception. As discussed in connection with FIG. 5, not every overlap would trigger an overlap exception as the disclosed system can recognize products even with overlaps in some circumstances. However, in this instance, an overlap exception is generated due to the overlap between product 134 and product 136. Accordingly, information of this overlap exception is displayed in container 118. An instruction to resolve the overlap exception is displayed in container 122. An image illustrating the overlap is displayed in container 124. In various embodiments, the overlap image may be extracted from the image capturing the actual overlapped products in the current session. Further, several already recognized products and their respective information are displayed in container 126.

Under these embodiments, GUI 110 or GUI 120 is advantageously designed to present exception information, reposition cues, reposition messages, etc. to a user, and specifically designed to enable the user to review multiple exceptions conveniently.

Referring now to FIG. 2, which illustrates a second group of exemplary GUIs or parts thereof for express checkout. This group of GUIs uses layered design, such as illustrated in layers 230. A similar layered design may be used for container 114 or container 124 in FIG. 1.

In this embodiment, the exemplary GUIs use three layers, i.e., product layer 232, location layer 234, and message layer 236. Product layer 232 is to display product images. In some embodiments, such images may be captured by an overhead camera of the express checkout machine or retrieved from a product image database. Location layer 234 is to display the respective locations of the products. The location of a product may be visually presented by a bounding box of the product. Coordinates of the bounding boxes may be determined based on the underlying machine learning model, e.g., an object detection model, as discussed in connection with FIG. 3. In some embodiments, bounding boxes of all products may be shown. In other embodiments, only the bounding box (es) of the product (s) associated with an exception may be shown. Message layer 236 is to display messages related to the exception, such as an explanation of the exception or instruction to resolve the exception. The layers are stacked in a particular order, which leads to a visual effect of the content of a higher layer being superimposed on the content of a lower layer.

By way of example, GUI 210 illustrates several products in the image layer, including product 212, product 214, and product 216. In this instance, product 212 is highlighted as product 212 caused an exception. To highlight product 212, a highlighter (e.g., a flash background or a bounding box) may be shown at a matching location in the location layer. Further, message 218 is shown to provide instruction for resolving the exception. Similarly, in GUI 220, product 222, product 224, and product 226 are shown in the product layer. The bounding boxes of product 222 and product 224 are shown in the location layer as these products caused an overlap exception. Further, message 228 is shown in the message layer to provide instruction for resolving the exception.

In various embodiments, the image of a designated checkout area with the product being checked out in the current session is displayed at the product layer, the bounding box (es) of the exact product (s) that caused the exception is displayed at the location layer, and an exception resolving message is displayed at the message layer. Advantageously, a user may be guided to resolve the exception based on the cues displayed via this layered design, e.g., to reposition the exact product (s) that caused the exception (s) .

Referring now to FIG. 3, which illustrates an exemplary express checkout system and its exemplary operating environment. In this operating environment, checkout system 300 includes, among many components not shown, checkout station 310, which may be used for self-checkout, e.g., to check out all products in shopping cart 322 without scanning any MRLs. Checkout station 310 includes, among other components not shown, camera 312A, camera 312B, and display 316 mounted to an arm, as well as scale 314B mounted to desktop 314A. Camera 312A is configured to cover designated area 324, which a user may park shopping cart 322. Camera 312B is configured to cover desktop 314A, including scale 314B. In some embodiments, checkout station 310 is also equipped with a speaker for voice output and a microphone for voice input.

In some embodiments, checkout station 310 is also equipped with communication module 318 to communicate with positor 330 via network 360, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) , e.g., a 4G or 5G cellular network. In various embodiments, communication module 318 includes a radio-frequency module for transmitting or receiving radio signals between two devices.

In some embodiments, positor 330 is configured as a local module in checkout station 310. In other embodiments, positor 330 is configured as a remote module, e.g., in server 380 in a computing cloud. In this embodiment, positor 330 includes object detector 332 to detect objects in an image, retriever 334 to recognize products, exception detector 336 to detect exceptions for express checkout, and signaler 338 to generate exception-related information, such as exception information, reposition cues, correction instructions, etc. Various functions performed by positor 330 may rely on one or more learning models in MLM 340. Moreover, information on express checkout, e.g., exception information or requests for assistance for resolving exceptions, may be distributed to device 370, e.g., a smartphone, a mobile device, or a computer, etc., that is accessible to a store clerk. Additionally, positor 330 may retrieve and store product data, payment data, customer data, MLM data, etc. from data store 350.

It should be noted that checkout station 310, positor 330, and other components illustrated in this exemplary operating environment, merely form an exemplary system following at least one aspect of the technologies described herein. These examples are not intended to suggest any limitation as to the scope of use or functionality of all aspects of the technologies disclosed herein. Neither should this exemplary operating environment be interpreted as having any dependency or requirement relating to any one component nor any combination of components illustrated.

After the initiation of an express checkout session, e.g., after a user touches a start button via the touchscreen of display 316, images will be collected by camera 312A and camera 312B. The image from camera 312A has a view of products in shopping cart 322, and the image from camera 312B has a view of any products on desktop 314A. These images are processed in positor 330.

At a high level, positor 330 is configured to recognize products in an image via computer vision technologies for express checkout, including detecting any exceptions for express checkout, such as exceptions requiring a reposition of products. If no exception is detected, all products can then be checked out simultaneously. If an exception is detected, positor 330 may provide exception-related information, including cues to resolve the exception, to users via a GUI on display 316 or device 370. Optionally, positor 330 may provide exception-resolving cues via a voice user interface ( “VUI” ) , e.g., via a speaker and a microphone for voice input and output. The VUI is helpful for visually-challenged users. After the user carries out an attempt (e.g., reposition) to resolve the exception, positor 330 will execute its functions based on newly captured images. This process may be repeated until all exceptions are resolved and the express checkout session completed. Advantageously, positor 330 enables a customer to resolve exceptions with intuitive cues via the GUI or VUI. Optionally, positor 330 may summon a clerk to resolve some exceptions (e.g., restriction exceptions) for the customer.

In addition to other components not shown, object detector 332, retriever 334, exception detector 336, signaler 338, and MLM 340, operatively coupled with each other to achieve various functions of positor 330.

In various embodiments, object detector 332 can use an object detection model to detect objects in an image. Object detector 332 may use various object detection models, such as two-stage detectors (e.g., Faster-RCNN, R-FCN, Lighthead-RCNN, Cascade R-CNN, etc. ) or one-stage detectors (e.g., SSD, Yolov3, RetinaNet, FCOS, EfficientDet, etc. ) .

Retriever 334 can use a product retrieval model to recognize the product type of the detected object. By way of example, retriever 334 may use various retrieval models, such as a combination of a type of network (e.g., VGG, ResNet, Inception, EfficientNet) with a type of loss (e.g., triplet loss, contrastive loss, lifted loss, multi-similarity loss) . Moreover, retriever 334 may use various computer vision technologies to recognize the product type of a product. The applications (PCT/CN2019/111643, PCT/CN2019/086367, and PCT/CN2019/073390, etc. ) have disclosed some effective technical solutions for product recognition, which may be used by retriever 334 herein. Further details of these machine learning models will be discussed in connection with MLM 340 herein.

In general, retriever 334 is configured to recognize product types and retrieve corresponding product information (e.g., product identifier, name, unit price, representative images, etc. ) . In various embodiments, retriever 334 is to compare the image features of a query product with image features of known products for similarity, e.g., via one or more machine MLMs, so that the known product types may be ranked based on their respective similarity measures against the query product. In one embodiment, the top-ranked product type is used to represent the product.

In some embodiments, scale 314B can communicate with positor 330 directly, e.g., via wired or wireless communication channels. In this case, positor 330 or a checkout module in checkout system 300 may directly retrieve the weight of a product on scale 314B. In other embodiments, scale 314B can be a standalone device. In this case, retriever 334 is also configured to recognize product information, such as weight information on scale 314B. The patent applications (e.g., U.S. Application No. 16/672,883, entitled Character-based Text Detection and Recognition) have disclosed some effective technical solutions for text detection and recognition, which may be used by retriever 334 to recognize the weight information on scale 314B, e.g., numerical or character weight information. By way of example, in the text detection stage, retriever 334 may use a convolutional network to identify a position of text from an image from camera 312B. As the image passes through the neural network, various feature maps may be generated to indicate a confidence measure for whether a text is presented and its position in the image. In the text recognition stage, retriever 334 can extract the text from the respective positions identified in the text detection stage, e.g., based on a recursive-network-based approach or OCR-related technologies. Advantageously, this technology enables existing scales in a store to be reused in this express checkout system, thus reduce the economical or technical barriers to deploy the disclosed express checkout systems in smaller stores or developing areas.

Exception detector 336 is configured to detect various exceptions that may prevent a successful express checkout. Many exceptions may be resolved by a reposition of one or more products, such as by changing a product’s placement in shopping cart 322, or moving the product from shopping cart 322 to desktop 314A in general or to scale 314B.

Computer-vision-based technologies typically require an unobstructed line-of-sight between a camera and an object. When two products are stacked together, exception detector 336 may generate an overlap-based exception, which would require one or more stacked products to be repositioned. Accordingly, signaler 338 may provide users with information on the overlap-based exception and reposition cues, such as discussed in connection with FIGS. 1-2, so that a user may reposition the affected products. This process is further discussed in connection with FIGS. 4-5.

Some products are sold by weight. If the product information retrieved via retriever 334 indicates that a recognized product needs to be weighted, exception detector 336 will generate a weight-based exception. Accordingly, signaler 338 may provide users with information on the weight-based exception and reposition cues, such as discussed in connection with FIGS. 1-2, so that a user may place the affected product to scale 314B. This process is also further discussed in connection with FIGS. 4-5.

Some products have restrictions (e.g., age, quantity, etc. ) for sale. By way of example, alcohol, tobacco, medicines, etc. may be restricted to customers of certain age groups. If the product information retrieved via retriever 334 indicates that a recognized product has a restriction, exception detector 336 will generate a restriction-based exception. Accordingly, signaler 338 may provide users information of the restriction-based exception and reposition cues, such as requesting the customer to provide a government-issued identifier for age verification, requesting the customer to buy the restricted product in a staffed POS, or requesting a store clerk to visit the express checkout machine to resolve the exception, etc.

If there is no exception or the customer easily resolves the exception according to the reposition cues, then all products in the shopping session may be checked out simultaneously. The payment subsystem or other subsystems for checkout may be needed to support the express checkout, and detailed discussions of those subsystems are omitted herein for keeping this disclosure brief. Resultantly, the disclosed system enables multiple products to be checked out rapidly without the need for scanning any MRLs. Advantageously, the disclosed system for express checkout may drastically improve conventional checkout machines with advanced technologies disclosed herein.

Returning to the machine learning models, the aforementioned many computer vision technologies may be implemented in MLM 340, which may include one or more neural networks in some embodiments. Different components in positor 330 may use one or more different neural networks to achieve their respective functions, which will be further discussed in connection with the remaining figures. For example, retriever 334 may use a trained neural network to learn the neural features of an unknown product, which may be represented by a feature vector in a high-dimensional feature space, and compute the similarity between the unknown product and a known product based on the cosine distance between their respective feature vectors in the high-dimensional feature space. In various embodiments, various MLMs and image data (e.g., image data retrieved by retriever 334, data associated with the high- dimensional feature space, etc. ) may be stored in data store 350 and accessible in real-time via network 360.

As used herein, a neural network comprises at least three operational layers. The three layers can include an input layer, a hidden layer, and an output layer. Each layer comprises neurons. The input layer neurons pass data to neurons in the hidden layer. Neurons in the hidden layer pass data to neurons in the output layer. The output layer then produces a classification. Different types of layers and networks connect neurons in different ways.

Every neuron has weights, an activation function that defines the output of the neuron given an input (including the weights) , and an output. The weights are the adjustable parameters that cause a network to produce the correct output. The weights are adjusted during training. Once trained, the weight associated with a given neuron can remain fixed. The other data passing between neurons can change in response to a given input (e.g., image) .

The neural network may include many more than three layers. Neural networks with more than one hidden layer may be called deep neural networks. Example neural networks that may be used with aspects of the technology described herein include, but are not limited to, multilayer perceptron (MLP) networks, convolutional neural networks (CNN) , recursive neural networks, recurrent neural networks, and long short-term memory (LSTM) (which is a type of recursive neural network) . Some embodiments described herein use a convolutional neural network, but aspects of the technologies apply to other types of multi-layer machine classification technologies.

CNN may include any number of layers. The objective of one type of layers (e.g., Convolutional, Relu, and Pool) is to extract features of the input volume, while the objective of another type of layers (e.g., fully connected (FC) and Softmax) is to classify based on the extracted features. An input layer may hold values associated with an instance. For example, when the instance is an image (s) , the input layer may hold values representative of the raw pixel values of the image (s) as a volume (e.g., a width, W, a height, H, and color channels, C (e.g., RGB) , such as W x H x C) , or a batch size, B.

One or more layers in the CNN may include convolutional layers. The convolutional layers may compute the output of neurons that are connected to local regions in an input layer (e.g., the input layer) , each neuron computing a dot product between their weights and a small region they are connected to in the input volume. In a convolutional process, a filter, a kernel, or a feature detector includes a small matrix used for feature detection. Convolved features, activation maps, or feature maps are the output volume formed by sliding the filter over the image and computing the dot product. An exemplary result of a convolutional layer may include another volume, with one of the dimensions based on the number of filters applied (e.g., the width, the height, and the number of filters, F, such as W x H x F, if F were the number of filters) .

One or more of the layers may include a rectified linear unit (ReLU) layer. The ReLU layer (s) may apply an elementwise activation function, such as the max (0, x) , thresholding at zero, for example, which turns negative values to zeros (thresholding at zero) . The resulting volume of a ReLU layer may be the same as the volume of the input of the ReLU layer. This layer does not change the size of the volume, and there are no hyperparameters.

One or more of the layers may include a pooling layer. A pooling layer performs a function to reduce the spatial dimensions of the input and control overfitting. This layer may use various functions, such as Max pooling, average pooling, or L2-norm pooling. In some embodiments, max pooling is used, which only takes the most important part (e.g., the value of the brightest pixel) of the input volume. By way of example, a pooling layer may perform a down-sampling operation along the spatial dimensions (e.g., the height and the width) , which may result in a smaller volume than the input of the pooling layer (e.g., 16 x 16 x 12 from the 32 x 32 x 12 input volume) . In some embodiments, the convolutional network may not include any pooling layers. Instead, strided convolutional layers may be used in place of pooling layers.

One or more of the layers may include a fully connected (FC) layer. An FC layer connects every neuron in one layer to every neuron in another layer. The last FC layer normally uses an activation function (e.g., Softmax) for classifying the generated features of the input volume into various classes based on the training dataset. The resulting volume may take the form of 1 x 1 x number of classes.

Further, calculating the length or magnitude of vectors is often required either directly as a regularization method in machine learning, or as part of broader vector or matrix operations. The length of the vector is referred to as the vector norm or the vector’s magnitude. The L1 norm is calculated as the sum of the absolute values of the vector. The L2 norm is calculated as the square root of the sum of the squared vector values. The max norm is calculated as the maximum vector values.

As discussed previously, some of the layers may include parameters (e.g., weights or biases) , such as a convolutional layer, while others may not, such as the ReLU layers and pooling layers, for example. In various embodiments, the parameters may be learned or updated during training. Further, some of the layers may include additional hyper-parameters (e.g., learning rate, stride, epochs, kernel size, number of filters, type of pooling for pooling layers, etc. ) , such as a convolutional layer or a pooling layer, while other layers may not, such as a ReLU layer. Various activation functions may be used, including but not limited to, ReLU, leaky ReLU, sigmoid, hyperbolic tangent (tanh) , exponential linear unit (ELU) , etc. The parameters, hyper-parameters, or activation functions are not to be limited and may differ depending on the embodiment.

Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein, this is not intended to be limiting. For example, additional or alternative layers, such as normalization layers, Softmax layers, or other layer types, may be used in a CNN. Different orders and layers in a CNN may be used depending on the embodiment.

Although many examples are described herein concerning using neural networks, and specifically convolutional neural networks, this is not intended to be limiting. For example, and without limitation, MLM 340 may include any type of machine learning models, such as a machine learning model (s) using linear regression, logistic regression, decision trees, support vector machines (SVM) ,

Bayes, k-nearest neighbor (KNN) , K means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long or short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc. ) , or other types of machine learning models.

Checkout system 300 is merely one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of aspects of the technologies described herein. Neither should this system be interpreted as having any dependency or requirement relating to any one component nor any combination of components illustrated.

It should be understood that this arrangement of various components in positor 330 is set forth only as an example. Other arrangements and elements (e.g., machines, networks, interfaces, functions, orders, and grouping of functions, etc. ) can be used in addition to or instead of those shown, and some elements may be omitted altogether for the sake of clarity. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and any suitable combination and location. Various functions described herein as being performed by an entity may be carried out by hardware, firmware, or software. For instance, some functions may be carried out by a processor executing instructions stored in memory.

It should be understood that each of the components shown in positor 330 may be implemented on any type of computing device, such as computing device 600 described in FIG. 6. Further, each of the components may communicate with various external devices via a network, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) .

FIG. 4 is a flow diagram illustrating an exemplary process of express checkout. Each block of process 400, and other processes described herein, comprises a computing process that may be performed using any combination of hardware, firmware, or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The process may also be embodied as computer-usable instructions stored on computer storage media or devices. The process may be provided by an application, a service, or in combination thereof.

One main process for express checkout includes receiving an image with products inside, detecting an exception that requires a reposition of a product to check out all products in the image; and generating and displaying a message for the exception, e.g., to indicate the product to be repositioned.

Specifically, at block 410, the process is to recognize products in the image. This block may include two stages. At the first stage, the disclosed system may detect, via a first machine learning model (e.g., an object detection model) , one or more objects in the image. At the second stage, the disclosed system may retrieve, via a second machine learning model (e.g., a neural-feature-based retrieval model) , respective product classes of the detected objects, and respective confidence scores of the respective product classes. In some embodiments, the product class with the highest confidence score is used to represent the detected object. In other embodiments, multiple product classes are used to generate product options for users to determine the true product class, which will be further discussed in connection with block 420 below.

At block 420, the process is to detect exceptions. Once a product class is assigned to an object, related product information may be retrieved from the product database, including identifier, name, price (e.g., unit price, sale-by-weight, etc. ) , restrictions (e.g., age, quantity, etc. ) , and other product information. If a product class has a weight-based checkout condition (e.g., sale-by-weight) , the process can generate a weight-based exception. If a product class has a restriction-based checkout condition (e.g., restriction-by-age) , the process can generate a restriction-based exception.

However, sometimes the process may not be able to confidently assign a product class to an object, e.g., due to overlap. Overlap is a common hindrance for computer-vision-based recognition as overlap would block the line-of-sight between a camera and an object. To detect an overlap exception, the process is first to determine an overlap ratio between any two objects in the image. An object detection model can output the bounding boxes for the detected objects. In some embodiments, the process is to determine an overlap ratio of two objects based on respective bounding boxes of the two objects. This part of the process is discussed in more detail in connection with FIG. 5. The disclosed system often can still accurately recognize products even with some overlaps. Accordingly, the process may only generate an overlap exception if the overlap ratio being greater than an overlap threshold and the confidence score of at least one of the two objects being less than a confidence threshold.

As discussed previously, retriever 334 may determine potential product classes of an object and respective confidence scores of the potential product classes. Sometimes, all the confidence scores may be too low to be acceptable. Further, reasons other than overlap may also prevent accurate product recognition or lead to a low confidence score. By way of example, two different beer bottles may share an identical bottle design, and their bottom views may reveal no visual difference. If viewed from the side, the beer labels would clearly show that they have different brands or product types. If only the bottom view is captured in the image, the retrieval model may come up with low confidence scores for potential product classes. In this case, the process may not be able to confidently assign a product class to the object. Accordingly, the process may generate a low-confidence exception in response to the highest confidence score being below a threshold. However, in other embodiments, the process can still output several potential product classes for user selection.

At block 430, the process is to resolve exceptions. Different exceptions require different resolutions. Many resolutions include a step of reposition. For example, to resolve a low-confidence exception, the disclosed system may require the customer to reposition the affected product. Accordingly, the disclosed system may recognize the repositioned product. In some embodiments, both images before and after the repositioning operation are used for product recognition as more image features may be extracted from these images showing the product from different perspectives. Optionally, as discussed in block 420, multiple potential product classes may be provided to the customer as selectable options, and the customer may determine the true product class from these options. In various embodiments, the affected product (s) may be announced to the customer via a GUI or VUI, e.g., by highlighting the affected product (s) as a resolution cue in a GUI, as illustrated in FIGS. 1-2.

To resolve an overlap-based exception, the disclosed system may similarly communicate the overlap-based exception information and resolution cues via a GUI or VUI, so that the customer may reposition the affected products. In connection with FIG. 3, to reposition a product, the customer may simply put the product to desktop 314A or a different location in shopping cart 322. To resolve a weight-based exception, the disclosed system may communicate the weight-based exception information and resolution cues via a GUI or VUI, so that the customer may weigh the affected product. To resolve a restriction-based exception, the disclosed system may provide restriction-specific cues to the customer. Some restriction-based exceptions may be resolved by requesting the customer to verify a personal identification. For example, an age-based exception may be resolved if the customer can scan a government-issued or store-issued identification card, and the express checkout system can verify the age information on the identification card. As another example, for quantify-restriction, the disclosed system may remind the customer to remove the excessive products. However, for some restriction-based exceptions, the disclosed system may have to resort to store staff. In various embodiments, the product causing the exception may be highlighted in a synthetic image with overlay messages (e.g., cues for resolving the exception) , and the synthetic image may be communicated to the customer or a store clerk so that the exception can be resolved quickly.

At block 440, the process is to perform the express checkout, which includes finalizing the checkout list, tallying the sum, clearing the payment, etc. Noticeably, many shopping sessions wouldn’t encounter any exceptions as the disclosed system can accurately recognize products in general. This means that by simply parking a shopping cart under a camera, a customer can check out all products in the shopping cart quickly without scanning any MRLs or unloading any products from the shopping cart, which is a significant improvement over existing checkout machines.

FIG. 5 is a flow diagram illustrating an exemplary process of express checkout. Each block of process 500, and other processes described herein, comprises a computing process that may be performed using any combination of hardware, firmware, or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The process may also be embodied as computer-usable instructions stored on computer storage media or devices. The process may be provided by an application, a service, or in combination thereof.

Process 500 is a specific type of process 400. At block 510, the process is to collect images, such as an image of a designated checkout area with a shopping cart. At block 520, the process is to detect objects in the image. As discussed previously, various object detection models may be used for object detection.

At block 520, the process is to retrieve potential product classes and their associated scores via various product recognition models as discussed previously. In some embodiments, the retrieval process is executed differently in different iterations, especially if an exception is detected. At a high level, the disclosed system caches the features of recognized products in a session to form a novel cached feature space. After an iteration of repositioning, the disclosed system may try to recognize a product by utilizing the cached feature space first instead of the global feature space. As the cached feature space could be many orders of magnitude smaller than the global feature space, product recognition with the cached feature space could be many orders of magnitude faster than without the cached feature space.

Specifically, in the first iteration, the query product (e.g., neural features of the product image derived from the neural network) has to compared with all known products to determine a ranked list of product classes that potentially match the query product. In one embodiment, the features of the recognized product are cached for future inquiries. In the subsequent iterations, for example, for a product in another image taken after a repositioning operation, the product may be compared with the cached features first. In one embodiment, the recognized product class is cached for future inquiries. In the subsequent iterations, for example, for a product in another image taken after a repositioning operation, the product may be compared with the cached product classes first. With this improved cache mechanism, the retrieval process in the subsequent iterations usually runs much faster than in the first iteration. Resultantly, the disclosed system can quickly recognize all products after a repositioning operation.

At block 510, the process is to detect overlaps among the products in the image. In various embodiments, the process is to determine an overlap ratio between two objects. An object detection model can output the bounding boxes for the detected objects. The overlap ratio between two objects may then be calculated based on the two bounding boxes of the two objects. A bounding box is often represented by the coordinates of two diagonal corners of the bounding box. Accordingly, respective areas of the two bounding boxes (referred to as S _a and S _b) and the overlapped area (referred to as S _o) can be derived from the two sets of coordinates.

In one embodiment, a single overlap ratio (referred to as R ₁) is obtained from Eq. 1. The process may generate an overlap exception if R ₁ is greater than a threshold. In another embodiment, two overlap ratios (referred to as R ₂ &R ₃) are obtained from Eq. 2. The process may generate an overlap exception if R ₂ or R ₃ is greater than a threshold.

At block 540, the process is to determine whether there is an overlap-based exception. Not every overlap would trigger an overlap-based exception because the disclosed system is capable to accurately recognize products with some overlaps. In some embodiments, the process generates an overlap exception in response to the overlap ratio being greater than an overlap threshold and the highest confidence score of the potential product classes being less than a confidence threshold. In other words, the system would generate an overlap exception only if the system cannot recognize the overlapped products in these embodiments. In other embodiments, to further improve the accuracy for product recognition, the system may automatically generate an overlap-based exception for any detected overlaps.

If an overlap-based exception is generated, the process is to request a repositioning operation at block 592. Otherwise, the process will proceed to block 560. At block 592, the system may provide reposition cues or other instructions to help the customer resolve the overlap-based exception. After receiving an input (e.g., via the “Completed” button in container 122, as shown in FIG. 1) that the repositioning operation has completed, the process may then proceed to block 510 to collect another image of the products.

At block 560, the process is to determine whether there is a weight-based exception. The product information retrieved at block 530 may indicate whether the product class has a weight-based checkout condition. Accordingly, the process can generate a weight-based exception in response to the product class having a weight-based checkout condition.

If a weight-based exception is generated, the process is to request a weighing operation at block 594. Otherwise, the process will proceed to block 570. At block 594, the system may provide cues or other instructions to help the customer resolve the weight-based exception, e.g., requesting the customer to place the product on a scale. After receiving an input (e.g., via the “Completed” button in container 112, as shown in FIG. 1) that the weighing operation has completed, the process may then retrieve the weight information from the scale. The weight information may be provided by the scale directly. Alternatively, the system may recognize, via a machine learning model, the weight information from an image capturing the scale’s display. Accordingly, the system may determine a price for the product based on the weight information. After receiving the weight information, the process will proceed to block 580. Sometimes, a customer may understand that a product must be weighted for checkout, and may voluntarily place the product on the scale. In this case, the system will recognize the displayed information of the weight from the image showing the scale, recognize a product class of the product placed on the scale, and to calculate a price for checking out the product based on the product class and the weight of the product.

At block 570, the process is to determine whether there is another type of exception, e.g., restriction-based exception. The product information retrieved at block 530 may indicate whether the product class has any restriction-based -based checkout conditions. Accordingly, the process can generate a restriction-based exception in response to the product class having a restriction-based checkout condition. If another exception is generated at block 570, the process will proceed to resolve the exception at block 596. Otherwise, the process will proceed to block 580. After resolving the remaining exception at block 596, the process will also proceed to block 580. At block 580, the process will perform the checkout operation, similarly as in block 440 of FIG. 4.

Accordingly, we have described various aspects of the disclosed technologies for video recognition. Each block in process 400 or process 500 and other processes described herein comprises a computing process that may be performed using any combination of hardware, firmware, or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The processes may also be embodied as computer-usable instructions stored on computer storage media or devices. The process may be provided by an application, a service, or a combination thereof.

It is understood that various features, sub-combinations, and modifications of the embodiments described herein are of utility and may be employed in other embodiments without reference to other features or sub-combinations. Moreover, the order and sequences of steps or blocks shown in the above example processes are not meant to limit the scope of the present disclosure in any way and the steps or blocks may occur in a variety of different sequences within embodiments hereof. Such variations and combinations thereof are also contemplated to be within the scope of embodiments of this disclosure.

Referring to FIG. 6, an exemplary operating environment for implementing various aspects of the technologies described herein is shown and designated generally as computing device 600. Computing device 600 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use of the technologies described herein. Neither should the computing device 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.

The technologies described herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine. Generally, program components, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. The technologies described herein may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, and specialty computing devices, etc. Aspects of the technologies described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communications network.

With continued reference to FIG. 6, computing device 600 includes a bus 610 that directly or indirectly couples the following devices: memory 620, processors 630, presentation components 640, input/output (I/O) ports 650, I/O components 660, and an illustrative power supply 670. Bus 610 may include an address bus, data bus, or a combination thereof. Although the various blocks of FIG. 6 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear and, metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. The inventors hereof recognize that such is the nature of the art and reiterate the diagram of FIG. 6 is merely illustrative of an exemplary computing device that can be used in connection with different aspects of the technologies described herein. Distinction is not made between such categories as “workstation, ” “server, ” “laptop, ” “handheld device, ” etc., as all are contemplated within the scope of FIG. 6 and refers to “computer” or “computing device. ”

Computing device 600 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 600 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technologies for storage of information, such as computer-readable instructions, data structures, program modules, or other data.

Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) , or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Computer storage media does not comprise a propagated data signal.

Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.

Memory 620 includes computer storage media in the form of volatile or nonvolatile memory. The memory 620 may be removable, non-removable, or a combination thereof. Exemplary memory includes solid-state memory, hard drives, optical-disc drives, etc. Computing device 600 includes processors 630 that read data from various entities, such as bus 610, memory 620, or I/O components 660. Presentation component (s) 640 present data indications to a user or other device. Exemplary presentation components 640 include a display device, speaker, printing component, vibrating component, etc. I/O ports 650 allow computing device 600 to be logically coupled to other devices, including I/O components 660, some of which may be built-in.

In various embodiments, memory 620 includes, in particular, temporal and persistent copies of express checkout (XC) logic 622. XC logic 622 includes instructions that, when executed by processors 630, result in computing device 600 performing functions, such as but not limited to, process 400, process 500, or other processes discussed in connection with FIGS. 1-3. In various embodiments, XC logic 622 includes instructions that, when executed by processors 630, result in computing device 600 performing various functions associated with, but not limited to, various components in checkout system 300 in FIG. 3.

In some embodiments, processors 630 may be packed together with XC logic 622. In some embodiments, processors 630 may be packaged together with XC logic 622 to form a System in Package (SiP) . In some embodiments, processors 630 can be integrated on the same die with XC logic 622. In some embodiments, processors 630 can be integrated on the same die with XC logic 622 to form a System on Chip (SoC) .

Illustrative I/O components include a microphone, joystick, gamepad, satellite dish, scanner, printer, display device, wireless device, a controller (such as a stylus, a keyboard, and a mouse) , a natural user interface (NUI) , and the like. In aspects, a pen digitizer (not shown) and accompanying input instrument (also not shown but which may include, by way of example only, a pen or a stylus) are provided to digitally capture freehand user input. The connection between the pen digitizer and processor (s) 630 may be direct or via a coupling utilizing a serial port, parallel port, system bus, or other interface known in the art. Furthermore, the digitizer input component may be a component separate from an output component, such as a display device. In some aspects, the usable input area of a digitizer may coexist with the display area of a display device, be integrated with the display device, or may exist as a separate device overlaying or otherwise appended to a display device. Any and all such variations, and any combination thereof, are contemplated to be within the scope of aspects of the technologies described herein.

I/O components 660 include various GUI, which allow users to interact with computing device 600 through graphical elements or visual indicators. Interactions with a GUI usually are performed through direct manipulation of graphical elements in the GUI. Generally, such user interactions may invoke the business logic associated with respective graphical elements in the GUI. Two similar graphical elements may be associated with different functions, while two different graphical elements may be associated with similar functions. Further, the same GUI may have different presentations on different computing devices, such as based on the different graphical processing units (GPUs) or the various characteristics of the display.

Computing device 600 may include networking interface 680. The networking interface 680 includes a network interface controller (NIC) that transmits and receives data. The networking interface 680 may use wired technologies (e.g., coaxial cable, twisted pair, optical fiber, etc. ) or wireless technologies (e.g., terrestrial microwave, communications satellites, cellular, radio and spread spectrum technologies, etc. ) . Particularly, the networking interface 680 may include a wireless terminal adapted to receive communications and media over various wireless networks. Computing device 600 may communicate with other devices via the networking interface 680 using radio communication technologies. The radio communications may be a short-range connection, a long-range connection, or a combination of both a short-range and a long-range wireless telecommunications connection. A short-range connection may include a

connection to a device (e.g., mobile hotspot) that provides access to a wireless communications network, such as a wireless local area network (WLAN) connection using the 802.11 protocol. A Bluetooth connection to another computing device is a second example of a short-range connection. A long-range connection may include a connection using various wireless networks, including 1G, 2G, 3G, 4G, 5G, etc., or based on various standards or protocols, including General Packet Radio Service (GPRS) , Enhanced Data rates for GSM Evolution (EDGE) , Global System for Mobiles (GSM) , Code Division Multiple Access (CDMA) , Time Division Multiple Access (TDMA) , Long-Term Evolution (LTE) , 802.16 standards, etc.

The technologies described herein have been described with particular aspects, which are intended in all respects to be illustrative rather than restrictive. While the technologies described herein are susceptible to various modifications and alternative constructions, certain illustrated aspects thereof are shown in the drawings and have been described above in detail. It should be understood, however, there is no intention to limit the technologies described herein to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the technologies described herein.

Lastly, by way of example, and not limitation, the following examples are provided to illustrate various embodiments, following at least one aspect of the disclosed technologies.

Examples in the first group comprise a system for express checkout with one or more of the following features. The order of the following features is not to limit the scope of any examples in this group. A camera, positioned to cover an area, adapted to capture an image with products in the area. A processor, operationally coupled to the camera, configured to detect an exception that requires a reposition of at least one of the plurality of products to check out all of the plurality of products in the image. A display, operationally coupled to the camera and the processor, adapted to present a message associated with the exception. The message may comprise a text. A graphical user interface on the display, adapted to present an extracted portion of the image that provides a first cue to resolve the exception, and to present a text message as a second cue to resolve the exception. A graphical user interface element on the display, adapted to receive an input for indicating the completion of a reposition operation or a weighing operation, and the processor is further configured to execute the next step for express checkout in response to the input. A radio-frequency module, operationally coupled to the camera and the display, adapted to transmit the image to the processor, and to receive the message. A scale, operationally coupled to the processor, configured to measure a weight of a product placed on the scale, and to display information of the weight. Another camera, positioned to cover the scale, configured to capture, in another image, the displayed information of the weight and the product placed on the scale. The processor may be further configured to recognize the displayed information of the weight from the another image, to recognize a product class of the product placed on the scale, and to calculate a price for checking out the product based on the product class and the weight of the product. The exception may comprise an overlap exception, and the processor is further configured to determine an overlap ratio of the at least one of the plurality of products. The processor may be further configured to determine a confidence score for recognizing a product class of the at least one of the plurality of products. The processor may be further configured to generate the overlap exception in response to the overlap ratio being greater than a first threshold and the confidence score being less than a second threshold. The exception may comprise a weight exception, and the processor is further configured to recognize a product class of the at least one of the plurality of products, and generate the weight exception in response to the product class having a weight-based checkout condition. The processor may be further configured to highlight the at least one of the plurality of products in a generated image, and to add the generated image to the message.

Examples in the second group comprise a method, a computer system adapted to perform the method, or a computer storage device storing computer-usable instructions that cause a computer system to perform the method. The method has one or more of the following features. The order of the following features is not to limit the scope of any examples in this group.

A feature of receiving an image with a plurality of products. A feature of detecting an exception that requires a reposition of a product of the plurality of products to check out all of the plurality of products. A feature of generating a message to indicate the product to be repositioned. A feature of detecting, via a first machine learning model, a plurality of objects in the image. A feature of recognizing, via a second machine learning model, respective product classes of the plurality of objects, and respective confidence scores of the respective product classes. A feature of wherein the exception comprises an overlap exception. A feature of determining an overlap ratio of two objects of the plurality of objects based on respective bounding boxes of the two objects. A feature of generating the overlap exception in response to the overlap ratio being greater than a first threshold and a confidence score of a product class of one of the two objects being less than a second threshold. A feature of receiving another image with the product being repositioned. A feature of recognizing, via the second machine learning model, a product class of another product based on features of a recognized object at a previous recognition iteration. A feature of wherein the exception comprises a weight exception. A feature of generating the weight exception in response to a product class of the product having a weight-based checkout condition. A feature of receiving another image with the product and weight information. A feature of recognizing, via a third machine learning model, the weight information. A feature of determining a price for the product based on the weight information.

Examples in the third group comprise a method, a computer system adapted to perform the method, or a computer storage device storing computer-usable instructions that cause a computer system to perform the method. The method has one or more of the following features. The order of the following features is not to limit the scope of any examples in this group.

A feature of receiving an image with a plurality of objects. A feature of detecting an exception that requires a reposition of an object of the plurality of objects for express checkout. A feature of generating a message to indicate the object to be repositioned. A feature of recognizing respective classes of the plurality of objects, and respective confidence scores of the respective classes. A feature of generating the exception in response to one of the respective confidence scores being below a threshold. A feature of receiving a second image with the object having been repositioned. A feature of recognizing a class of the object based on both the first image and the second image. A feature of wherein the object is a first object, and a feature of recognizing a class of a second object in the second image based on a feature match between the second object in the second image and a third object in the first image.

All patent applications, patents, and printed publications cited herein are incorporated herein by reference in the entireties, except for any definitions, subject matter disclaimers or disavowals, and except to the extent that the incorporated material is inconsistent with the express disclosure herein, in which case the language in this disclosure controls.

Claims

A system for express checkout, comprising: a camera, positioned to cover a designated checkout area, adapted to capture an image with a plurality of products in the designated checkout area; a processor, operationally coupled to the camera, configured to detect an exception that requires a reposition of at least one of the plurality of products to check out all of the plurality of products in the image; and a display, operationally coupled to the camera and the processor, adapted to present a message associated with the exception.
The system of claim 1, wherein the message comprises a text message, the system further comprising: a graphical user interface on the display, adapted to present an extracted portion of the image that provides a first cue to resolve the exception, and to present the text message as a second cue to resolve the exception.
The system of claim 1, further comprising: a radio-frequency module, operationally coupled to the camera and the display, adapted to transmit the image to the processor, and to receive the message.
The system of claim 1, further comprising: a scale, operationally coupled to the processor, configured to measure a weight of a product placed on the scale, and to display information of the weight; and another camera, positioned to cover the scale, configured to capture, in another image, the displayed information of the weight and the product placed on the scale.
The system of claim 4, wherein the processor is further configured to recognize the displayed information of the weight from the another image, to recognize a product class of the product placed on the scale, and to calculate a price for checking out the product based on the product class and the weight of the product.
The system of claim 1, wherein the exception comprises an overlap exception, and the processor is further configured to determine an overlap ratio of the at least one of the plurality of products.
The system of claim 6, wherein the processor is further configured to determine a confidence score for recognizing a product class of the at least one of the plurality of products.
The system of claim 7, wherein the processor is further configured to generate the overlap exception in response to the overlap ratio being greater than a first threshold and the confidence score being less than a second threshold.
The system of claim 1, wherein the exception comprises a weight exception, and the processor is further configured to recognize a product class of the at least one of the plurality of products, and generate the weight exception in response to the product class having a weight-based checkout condition.
The system of any one of claims 1-9, wherein the processor is further configured to highlight the at least one of the plurality of products in a generated image, and to add the generated image to the message.
A computer-implemented method for express checkout, comprising: receiving an image with a plurality of products in a designated checkout area; detecting an exception that requires a reposition of a product of the plurality of products to check out all of the plurality of products; and generating a message to indicate the product to be repositioned.
The computer-implemented method of claim 11, further comprising: detecting, via a first machine learning model, a plurality of objects in the image; and recognizing, via a second machine learning model, respective product classes of the plurality of objects, and respective confidence scores of the respective product classes.
The computer-implemented method of claim 12, wherein the exception comprises an overlap exception, the computer-implemented method further comprising: determining an overlap ratio of two objects of the plurality of objects based on respective bounding boxes of the two objects; and generating the overlap exception in response to the overlap ratio being greater than a first threshold and a confidence score of a product class of one of the two objects being less than a second threshold.
The computer-implemented method of claim 12, further comprising: receiving another image with the product being repositioned; and recognizing, via the second machine learning model, a product class of another product based on features of a recognized object at a previous recognition iteration.
The computer-implemented method of claim 12, wherein the exception comprises a weight exception, the computer-implemented method further comprising: generating the weight exception in response to a product class of the product having a weight-based checkout condition.
The computer-implemented method of any one of claims 12-15, further comprising: receiving another image with the product and weight information; recognizing, via a third machine learning model, the weight information; and determining a price for the product based on the weight information.
A non-transitory computer-readable storage device encoded with instructions that, when executed, cause one or more processors of a system to perform operations of express checkout, comprising: receiving an image with a plurality of objects in a designated area; detecting an exception that requires a reposition of an object of the plurality of objects for express checkout; and generating a message to indicate the object to be repositioned.
The non-transitory computer-readable storage device of claim 17, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising: recognizing respective classes of the plurality of objects, and respective confidence scores of the respective classes; and generating the exception in response to one of the respective confidence scores being below a threshold.
The non-transitory computer-readable storage device of claim 18, wherein the image is a first image, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising: receiving a second image with the object having been repositioned; and recognizing a class of the object based on both the first image and the second image.
The non-transitory computer-readable storage device of any one of claims 17-19, wherein the object is a first object, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising: recognizing a class of a second object in the second image based on a feature match between the second object in the second image and a third object in the first image.