WO2020228437A1 - Apparatus and methods for multi-sourced checkout verification - Google Patents


Info

Publication number
WO2020228437A1
Authority
WO
WIPO (PCT)
Prior art keywords
product
product information
camera
image
checkout machine
Prior art date
Application number
PCT/CN2020/082735
Other languages
French (fr)
Inventor
Matthew Robert SCOTT
Xiaoji Li
Xianbin Zhang
Qiong Wu
Xiaolv SONG
Haihan Wang
Sheng Guo
Weilin Huang
Original Assignee
Shenzhen Malong Technologies Co., Ltd.
Priority date
Filing date
Publication date
Priority claimed from PCT/CN2019/086367 external-priority patent/WO2020227845A1/en
Priority claimed from PCT/CN2019/111643 external-priority patent/WO2021072699A1/en
Priority claimed from US16/672,883 external-priority patent/US20210110189A1/en
Application filed by Shenzhen Malong Technologies Co., Ltd. filed Critical Shenzhen Malong Technologies Co., Ltd.
Publication of WO2020228437A1 publication Critical patent/WO2020228437A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/20 Point-of-sale [POS] network systems
    • G06Q20/208 Input by product or record sensing, e.g. weighing or scanner processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/08 Payment architectures
    • G06Q20/20 Point-of-sale [POS] network systems
    • G06Q20/203 Inventory monitoring
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07G REGISTERING THE RECEIPT OF CASH, VALUABLES, OR TOKENS
    • G07G1/00 Cash registers
    • G07G1/0036 Checkout procedures
    • G07G1/0045 Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader
    • G07G1/0054 Checkout procedures with a code reader for reading of an identifying code of the article to be registered, e.g. barcode reader or radio-frequency identity [RFID] reader, with control of supplementary check-parameters, e.g. weight or number of articles

Definitions

  • Self-checkout solutions are popular in retail, particularly for grocery stores and supermarkets.
  • Most self-checkout machines include the following components: a lane light, a touchscreen monitor, a basket stand, a barcode scanner, a weighing scale, and a payment module.
  • A customer can scan product barcodes, weigh products (such as fresh produce without barcodes) and select the product type on a display, pay for the products, bag the purchased products, and exit the store without any interaction with a cashier or clerk.
  • A clerk is typically assigned to supervise a group of self-checkout machines or lanes, so the clerk can assist customers when required, such as authorizing the sale of restricted products (e.g., alcohol, tobacco, etc.).
  • UPC: universal product code.
  • Self-checkout machines and the UPC system are designed for regular checkout transactions. Irregular checkout activities, such as non-scans or irregular scans, would disrupt the normal checkout process and lead to retail shrinkage. Retail shrinkage, or shrinkage, means there are fewer items in stock than the inventory list indicates. Shrinkage reduces profits for retailers, which may lead to increased prices for consumers. Many shrinkage problems happen at the POS due to irregular checkout activities. A technical solution is needed for automated checkout verification at the POS for loss prevention.
  • aspects of this disclosure include an apparatus that may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine.
  • the apparatus includes one or more imaging devices and a display.
  • The apparatus is configured to crosscheck product information derived from multi-sourced image data generated from the one or more imaging devices and, further, to present corresponding information on the display based on the verification result.
  • the apparatus may serve as a general-purpose loss prevention measure for many types of checkout machines.
  • systems, methods, and computer-readable storage devices are provided to improve a computing system’s ability for image-based checkout verification in general.
  • one aspect of the technologies described herein is to improve the efficiency of a computing system’s functions to perform product recognition and verification tasks by crosschecking product information derived from multiple sources.
  • Another aspect of the technologies described herein is to improve a computing system’s ability to synchronize image data from multiple sources for verification.
  • Yet another aspect of the technologies described herein is to improve a computing system’s ability to perform various functions or other practical applications in response to the verification outcomes, which are further discussed in the DETAILED DESCRIPTION.
  • FIG. 1 is a schematic representation illustrating an exemplary system in an exemplary operating environment, in accordance with at least one aspect of the technologies described herein;
  • FIG. 2 shows various schematic representations illustrating respective embodiments of an exemplary apparatus, in accordance with at least one aspect of the technologies described herein;
  • FIG. 3 is a schematic representation illustrating an exemplary process to detect product information from multiple sources, in accordance with at least one aspect of the technologies described herein;
  • FIG. 4 is a flow diagram illustrating an exemplary process of verification, in accordance with at least one aspect of the technologies described herein;
  • FIG. 5 is a flow diagram illustrating another exemplary process of verification, in accordance with at least one aspect of the technologies described herein;
  • FIG. 6 is a block diagram of an exemplary computing environment suitable for use in implementing various aspects of the technologies described herein.
  • Machine-readable labels (MRLs) may be assigned by a manufacturer (e.g., a UPC label on a TV) or by a retailer (e.g., a UPC label for an apple in a supermarket).
  • MRLs may be read by scanning devices for automatic identification and data capture, e.g., supporting transactions at various POS locations, tracking inventory at warehouses, facilitating transportation of goods in commerce, etc.
  • Self-checkout, also known as self-service checkout, is an alternative to the traditional cashier-staffed checkout, where self-checkout machines are provided for customers to process their purchases from a retailer. Checkout machines designed for cashier-staffed or self-checkout solutions can provide great benefits for the retail industry, e.g., by improving the productivity and accuracy of the checkout process.
  • MRLs may go missing, e.g., during transportation or due to mishandling. Sometimes MRLs become illegible, e.g., due to damage or smearing.
  • MRLs may be intentionally misplaced and affixed to unintended products, e.g., by a fraud known as ticket switch, in which scammers intentionally switch MRLs to pay less to merchants.
  • a lower-priced MRL is fraudulently affixed to a higher-priced product, so that the higher-priced item could be purchased for less.
  • Checkout verification (or simply verification) refers to the process or outcome of crosschecking product information obtained by a checkout machine against product information obtained from another source independent of the checkout machine.
  • one aspect of this disclosure includes an apparatus with one or more imaging devices.
  • The apparatus is configured to crosscheck product information derived from multi-sourced image data generated from the one or more imaging devices and, further, to present corresponding audio or visual information (e.g., via a speaker or a display) based on the verification result.
  • the disclosed apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine.
  • the apparatus may serve as a general-purpose loss prevention measure for many types of checkout machines, including cashier-staffed checkout machines or self-checkout machines.
  • the disclosed apparatus has two cameras.
  • the first camera is adapted to capture one or more images of the display of the checkout machine.
  • the display of the checkout machine contains product information recognized by the checkout machine, e.g., based on a machine reading of an MRL.
  • the second camera is adapted to capture one or more images of the product being checked out directly.
  • the product in the image may be recognized, and corresponding product information may be retrieved from a product database based on such product recognition technologies.
  • The computing process associated with the apparatus can determine the respective product information associated with the product based on the images from the first camera and the second camera respectively. Further, the computing process can crosscheck the product information from the two independent sources. A positive verification code may be generated if the product information derived from the two independent sources is consistent. Conversely, a negative verification code may be generated if the product information derived from the two independent sources is inconsistent.
  • the disclosed apparatus may take necessary measures, e.g., by displaying a message, playing an audible message, activating a warning light, etc., to remind the customer to rescan the product if the checkout machine did not receive the correct product information, e.g., due to a failure of the MRL reading, a mismatch between the physical product and the MRL, etc.
  • the disclosed apparatus may send an electronic message (e.g., including the nature of the event, the relevant images, the product information, etc. ) to a remote device, such as a server or a wireless device, which could be monitored by a loss prevention staff member.
  • different messages or loss prevention actions may be configured based on the specific implementation of the disclosed technologies.
  • One aspect of the disclosure is to perform a verification function for conventional checkout machines, which lack such a verification function.
  • Another aspect of the disclosure is to perform such verification function based on product information derived from multiple sources, including at least one source independent from the checkout machine, e.g., an image of the actual product captured by the apparatus.
  • An independent source can increase the authenticity of such verification and enable a secure solution that is more resilient to tampering.
  • Yet another aspect of the disclosure is for the apparatus to be compatible with heterogeneous checkout machines.
  • Because the disclosed apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine, the disclosed apparatus may serve as a general-purpose loss prevention solution for many types of checkout machines, including both cashier-staffed checkout machines and self-checkout machines.
  • the disclosed apparatus is equipped with two independent imaging devices, such as two cameras. Corresponding images from the two independent sources may be synchronized for verification.
  • The synchronization process is guided by the moving path of the product, e.g., over the scanning area of the checkout machine. One or more images of the display of the checkout machine may be synchronized with one or more product images based on the moving path of the product. As a result, the product information derived from both sources may be synchronized and crosschecked.
  • Yet another aspect of the technologies described herein is to improve a computing system’s ability to perform various functions or other practical applications in response to the verification outcomes.
  • the disclosed system may send a warning message, including one or more images or video segments relevant to the product being verified, to a designated device to warn an operator, so the operator can take appropriate loss-prevention actions to correct the error, assist the customer, etc.
  • the disclosed technologies may be used as a loss prevention solution, e.g., for checkout verification for many types of checkout machines.
  • the disclosed technologies may also be used in other practical systems, such as in a quality control system in a manufacturer (e.g., to crosscheck whether a part in an assembly line is the right part needed for assembling a product) , or other kinds of verification tasks in other practical systems or industries.
  • FIG. 1 is a schematic representation illustrating an exemplary system in an exemplary operating environment.
  • checkout machine 110 includes, among many components not shown, scanner 112 and display 114.
  • User 130 may use this checkout machine for self-checkout or to assist others in checking out goods.
  • Apparatus 120 includes, among many components not shown, camera 124a, camera 124b, and display 126, which are mounted to arm 122.
  • Apparatus 120 may further include one or more microphones, distributed in various locations at arm 122.
  • microphone 128a is mounted in the middle of the vertical section of arm 122
  • microphone 128b is mounted at the top of the vertical section of arm 122.
  • arm 122 is adjustable (e.g., different lengths) as well as rotatable (e.g., different directions) .
  • arm 122 and its mounted components may be adjusted so that the view of camera 124a covers scanner 112 of checkout machine 110, and the view of camera 124b covers display 114 of checkout machine 110.
  • the respective distances from checkout machine 110 to microphone 128a and microphone 128b may be measured and recorded during the installation.
  • the microphones and the distance information may be used to determine whether a beep sound (e.g., to confirm a successful scan or reading of an MRL) is generated from checkout machine 110 or another checkout machine that is not monitored by apparatus 120.
  • checkout verification (CV) system 170 is installed in apparatus 120.
  • CV system 170 is operatively coupled to apparatus 120, e.g., via network 160, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) , e.g., a 4G or 5G cellular network.
  • checkout machine 110 and apparatus 120 here merely form one exemplary operating environment for CV system 170, which is merely an exemplary system following at least one aspect of the technologies described herein.
  • Checkout machine 110, apparatus 120, or CV system 170 is not intended to suggest any limitation as to the scope of use or functionality of all aspects of the technology described herein. Neither should this operating environment be interpreted as having any dependency or requirement relating to any one component nor any combination of components illustrated.
  • apparatus 120 may be conveniently installed next to checkout machine 110 without modifying the existing components or interrupting the regular functions of checkout machine 110.
  • CV system 170 is adapted to crosscheck product information derived from multi-sourced image data generated from camera 124a and camera 124b respectively.
  • A positive verification code may be generated if the product information derived from the two sources is consistent.
  • A negative verification code may be generated if the product information derived from the two sources is inconsistent.
  • corresponding messages may be presented via display 126 based on the verification result.
  • CV system 170 is configured to crosscheck product information as obtained by checkout machine 110 against product information obtained from another source independent from the checkout machine.
  • images from camera 124b cover display 114, and such images contain product information as obtained by checkout machine 110.
  • images from camera 124a cover product 140 and scanner 112, and CV system 170 may directly derive product information of product 140 from these images, e.g., using computer vision technologies.
  • CV system 170 may output positive or negative verification code based on the verification result.
  • A positive verification code indicates that the product information derived from different sources is consistent.
  • A negative verification code indicates that the product information derived from different sources is inconsistent.
  • Here, being consistent means that the product identifier (as obtained by the checkout machine) achieves a rank (as derived from another source) that meets the requisite ranking criteria of the specific practical application.
  • product 140 has an MRL encoding a stock-keeping unit (SKU) number (e.g., 538902) .
  • Scanner 112 reads the SKU number into checkout machine 110, and checkout machine 110 displays product information 116 (e.g., the SKU number, the product name, the weight, the price, etc. ) on display 114.
  • CV system 170 uses camera 124b to capture an image of display 114 and decode the product information to obtain the product identifier (e.g., based on the SKU number) .
  • CV system 170 uses camera 124a to capture one or more images of product 140 passing over scanner 112 and ranks a set of known products based on their respective similarities to product 140. Further, the requisite ranking criterion in this instance may be set as the top 5. In this case, if the product identifier as obtained by checkout machine 110 is found within the top 5 ranked products, CV system 170 determines the product information from the two sources to be consistent; otherwise, inconsistent.
  • The top-N criterion in this example is not limiting.
  • The requisite ranking criteria may be adapted to fit the requirements of a practical application implementing the disclosed technologies, e.g., based on the specific product recognition technology, the specific similarity measurement technology, the MRL system, the preference of the retailer, etc. A minimal code sketch of the top-N check appears below.
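  • As a concrete illustration of the top-N criterion above, the following minimal Python sketch crosschecks the scanned identifier against a ranked candidate list. The function and variable names (crosscheck, ranked, etc.) are illustrative, not part of the disclosure.

```python
# Minimal sketch of the top-N consistency check described above.
def crosscheck(scanned_sku: str, ranked_skus: list[str], top_n: int = 5) -> bool:
    """Positive verification if the SKU read by the checkout machine
    appears among the top-N visually ranked candidate products."""
    return scanned_sku in ranked_skus[:top_n]

# Example: the scanner reads SKU 538902; the vision model ranks candidates.
ranked = ["538910", "538902", "120034", "538777", "991203"]
print("POSITIVE" if crosscheck("538902", ranked) else "NEGATIVE")  # POSITIVE
```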
  • CV system 170 may cause the verification code, or a message reflecting the verification code, to be displayed, e.g., via a graphical user interface (GUI) on display 126, or alternatively on computing device 190 (e.g., a smartphone, a mobile device, or a computer), which may be accessed by a loss prevention person.
  • user 130 may be reminded to remedy any ongoing issues, e.g., by canceling a transaction in response to a negative verification code.
  • a clerk may be summoned to resolve issues for the customer.
  • CV system 170 may activate various components of checkout machine 110 in response to a verification code.
  • CV system 170 may activate a warning light or prompt a voice message to convey a message indicating the verification code.
  • the message may include instructions for how user 130 may continue the transaction, such as how to cancel a problematic transaction or how to process the product again.
  • CV system 170 may display the product information of top-ranked products on display 126. This is especially useful when product 140 does not have an MRL. In this way, user 130 may provide or confirm correct product information to checkout machine 110.
  • retriever 172 is configured to retrieve data from respective sources
  • synchronizer 174 is configured to synchronize respective data sources.
  • retriever 172 may continuously retrieve images from camera 124a and camera 124b, and synchronizer 174 may synchronize images from different sources so that synced images can be analyzed together to derive product information for the same product.
  • retriever 172 may retrieve images from camera 124a or camera 124b in response to a command from synchronizer 174, so that the cameras only need to capture images when needed.
  • Recognizer 176 is configured to recognize products from images and retrieve corresponding product information (e.g., product identifier, name, unit price, representative images, etc. ) .
  • Recognizer 176 compares the image features of a query product with image features of known products for similarity, e.g., via one or more machine learning models (MLMs) in MLM 180, so that the known products may be ranked based on their respective similarity measures against the query product.
  • rank information may be used by verifier 178 to determine whether the product information obtained by the checkout machine (e.g., via the product’s MRL) is consistent with the product information obtained by apparatus 120 from another independent source (e.g., product images captured by camera 124a) .
  • Recognizer 176 may use various computer vision technologies to recognize the product type or the quantity of product 140.
  • the applications (PCT/CN2019/111643, PCT/CN2019/086367, and PCT/CN2019/073390) have disclosed some effective technical solutions for product recognition, which may be used by recognizer 176.
  • recognizer 176 uses a detector to detect product 140 or the quantity of product 140 based on images from camera 124a, and uses a retrieval model to recognize the product type of the detected object.
  • Various detectors may be used by recognizer 176, such as two-stage detectors (e.g., Faster-RCNN, R-FCN, Lighthead-RCNN, Cascade R-CNN, etc.).
  • The retrieval model may be built from a type of network (e.g., VGG, ResNet, Inception, EfficientNet) in MLM 180 trained with a loss (e.g., triplet loss, contrastive loss, lifted loss, multi-similarity loss).
  • Recognizer 176 may use technologies of object detection, image segmentation, or instance segmentation to determine the quantity of product 140. Unlike semantic segmentation, instance segmentation identifies each instance of each product in an image.
  • Other technologies may be used to determine the quantity of product 140, e.g., thresholding (using one or more specified threshold values to separate pixels into different levels to isolate objects), K-means clustering, histogram-based image segmentation, edge detection, etc.; a simple variant is sketched below.
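  • For illustration only, a rough non-neural counting sketch based on the thresholding approach above, assuming OpenCV is available; a real deployment would use instance segmentation and would filter out small noise regions.

```python
# Count product instances via Otsu thresholding plus connected components,
# one of the non-neural options listed above.
import cv2

def count_instances(image_path: str) -> int:
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Separate foreground pixels from the (assumed darker) background.
    _, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Label connected foreground regions; label 0 is the background.
    num_labels, _ = cv2.connectedComponents(binary)
    return num_labels - 1  # in practice, tiny noise components should be filtered
```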
  • Recognizer 176 is also configured to recognize product information based on images from camera 124b.
  • Product information may include product identifier, product name, product price (unit price, total price, etc. ) , product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc. ) , etc., which may be displayed by checkout machine 110.
  • Pending applications, e.g., U.S. Application No. 16/672,883, entitled Character-based Text Detection and Recognition, have disclosed text recognition solutions that may be used by recognizer 176.
  • In the text detection stage, recognizer 176 first uses a convolutional network to identify the position of text in an image from camera 124b. For example, when an image passes through the neural network, various feature maps may be generated to indicate a confidence measure for whether text is present and its position in the image. In the text recognition stage, recognizer 176 can extract the text from the respective positions identified in the text detection stage, e.g., based on a recursive-network-based approach or OCR-related technologies.
  • the weight information of the product may serve as another verification source.
  • the weight of product 140 is used to determine the quantity of product 140.
  • a typical scanner is equipped with a weight scale.
  • the weight information may be derived from the images taken by camera 124b in some embodiments.
  • The quantity of product 140 may be obtained as a ratio: the total weight divided by the average weight of one instance of product 140. Assume, as an example, that product 140 is a bunch of bananas overlapping each other, user 130 inputs a count of 3 bananas to checkout machine 110, and the count of bananas computed with computer vision technologies is also 3. However, verifier 178 would additionally take the weight information into account in some embodiments; if the weight-based count disagrees with these counts, verifier 178 will generate a negative verification code instead. A minimal sketch of this crosscheck follows.
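  • A minimal sketch of the weight-based crosscheck described above, assuming an average unit weight is available from the product database; all weights and counts are made-up values.

```python
# Estimate a count from the scale reading and compare it with the
# user-entered and vision-derived counts.
def weight_based_count(total_weight_g: float, avg_unit_weight_g: float) -> int:
    return round(total_weight_g / avg_unit_weight_g)

user_count, vision_count = 3, 3
count_from_weight = weight_based_count(590.0, 118.0)  # -> 5 bananas by weight
code = "POSITIVE" if user_count == vision_count == count_from_weight else "NEGATIVE"
print(count_from_weight, code)  # 5 NEGATIVE
```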
  • MLM 180 may include one or more neural networks in some embodiments. Different components in CV system 170 may use one or more different neural networks to achieve their respective functions, which will be further discussed in connection with the remaining figures.
  • Recognizer 176 may use a trained neural network to extract the neural features of an unknown product, which may be represented by a feature vector in a high-dimensional feature space, and compute the similarity between the unknown product and a known product based on the cosine distance between their respective feature vectors in that feature space, as sketched below.
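  • A numpy sketch of this similarity computation; real feature vectors would come from the trained network in MLM 180, while random vectors stand in here.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.random.rand(512)  # features of the unknown product
catalog = {sku: np.random.rand(512) for sku in ("538902", "120034", "991203")}
ranked = sorted(catalog, key=lambda s: cosine_similarity(query, catalog[s]), reverse=True)
print(ranked)  # known products ordered by similarity to the query
```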
  • various MLMs and image data e.g., image data retrieved by retriever 172, data associated with the high-dimensional feature space, etc.
  • image data may be stored in data store 150 and accessible in real-time via network 160.
  • a neural network comprises at least three operational layers.
  • the three layers can include an input layer, a hidden layer, and an output layer.
  • Each layer comprises neurons.
  • the input layer neurons pass data to neurons in the hidden layer.
  • Neurons in the hidden layer pass data to neurons in the output layer.
  • the output layer then produces a classification.
  • Different types of layers and networks connect neurons in different ways.
  • Every neuron has weights, an activation function that defines the output of the neuron given an input (including the weights) , and an output.
  • the weights are the adjustable parameters that cause a network to produce a correct output.
  • the weights are adjusted during training. Once trained, the weight associated with a given neuron can remain fixed. The other data passing between neurons can change in response to a given input (e.g., image) .
  • the neural network may include many more than three layers. Neural networks with more than one hidden layer may be called deep neural networks.
  • Example neural networks that may be used with aspects of the technology described herein include, but are not limited to, multilayer perceptron (MLP) networks, convolutional neural networks (CNN), recursive neural networks, recurrent neural networks, and long short-term memory (LSTM) networks (a type of recurrent neural network).
  • Some embodiments described herein use a convolutional neural network, but aspects of the technology are applicable to other types of multi-layer machine classification technologies.
  • a CNN may include any number of layers.
  • Generally, the objective of one type of layer (e.g., convolutional, ReLU, and pooling layers) is to extract features from the input, while the objective of another type (e.g., fully connected (FC) and Softmax layers) is to classify based on the extracted features.
  • An input layer may hold values associated with an instance. For example, when the instance is an image(s), the input layer may hold values representative of the raw pixel values of the image(s) as a volume (e.g., a width W, a height H, and color channels C (e.g., RGB), such as W x H x C), along with a batch size B.
  • One or more layers in the CNN may include convolutional layers.
  • The convolutional layers may compute the output of neurons that are connected to local regions in the previous layer (e.g., the input layer), each neuron computing a dot product between its weights and the small region it is connected to in the input volume.
  • a filter, a kernel, or a feature detector includes a small matrix used for feature detection.
  • Convolved features, activation maps, or feature maps are the output volume formed by sliding the filter over the image and computing the dot product.
  • An exemplary result of a convolutional layer is another volume, with one of its dimensions based on the number of filters applied (e.g., W x H x F, where F is the number of filters).
  • One or more of the layers may include a rectified linear unit (ReLU) layer.
  • The ReLU layer(s) may apply an elementwise activation function, such as max(0, x), which thresholds at zero and turns negative values into zeros.
  • the resulting volume of a ReLU layer may be the same as the volume of the input of the ReLU layer. This layer does not change the size of the volume, and there are no hyperparameters.
  • One or more of the layers may include a pooling layer.
  • a pooling layer performs a function to reduce the spatial dimensions of the input and control overfitting. This layer may use various functions, such as Max pooling, average pooling, or L2-norm pooling. In some embodiments, max pooling is used, which only takes the most important part (e.g., the value of the brightest pixel) of the input volume.
  • a pooling layer may perform a down-sampling operation along the spatial dimensions (e.g., the height and the width) , which may result in a smaller volume than the input of the pooling layer (e.g., 16 x 16 x 12 from the 32 x 32 x 12 input volume) .
  • the convolutional network may not include any pooling layers. Instead, strided convolutional layers may be used in place of pooling layers.
  • One or more of the layers may include a fully connected (FC) layer.
  • FC fully connected
  • An FC layer connects every neuron in one layer to every neuron in another layer.
  • the last FC layer normally uses an activation function (e.g., Softmax) for classifying the generated features of the input volume into various classes based on the training dataset.
  • the resulting volume may take the form of 1 x 1 x number of classes.
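  • The layer stack described above can be sketched in a few lines of PyTorch. Shapes mirror the text's example (a 32 x 32 x 12 volume pooled down to 16 x 16 x 12); the class count of 10 and all other values are assumptions for illustration.

```python
# Compact sketch of the conv -> ReLU -> pool -> FC -> Softmax stack.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1),  # 3-channel input -> 12 feature maps (W x H x F)
    nn.ReLU(),                                   # elementwise max(0, x); volume size unchanged
    nn.MaxPool2d(2),                             # downsample 32 x 32 x 12 -> 16 x 16 x 12
    nn.Flatten(),
    nn.Linear(16 * 16 * 12, 10),                 # FC layer; 10 is an assumed class count
    nn.Softmax(dim=1),                           # 1 x 1 x number-of-classes output
)
scores = model(torch.randn(1, 3, 32, 32))        # batch of one RGB image
print(scores.shape)                              # torch.Size([1, 10])
```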
  • the length of the vector is referred to as the vector norm or the vector’s magnitude.
  • the L1 norm is calculated as the sum of the absolute values of the vector.
  • the L2 norm is calculated as the square root of the sum of the squared vector values.
  • The max norm is calculated as the maximum absolute value of the vector's entries; short numeric examples of all three norms follow.
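  • A quick numpy illustration of the three norms just defined, using an arbitrary example vector.

```python
import numpy as np

v = np.array([3.0, -4.0])
print(np.sum(np.abs(v)))        # L1 norm: 7.0
print(np.sqrt(np.sum(v ** 2)))  # L2 norm: 5.0
print(np.max(np.abs(v)))        # max norm: 4.0
```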
  • some of the layers may include parameters (e.g., weights or biases) , such as a convolutional layer, while others may not, such as the ReLU layers and pooling layers, for example.
  • the parameters may be learned or updated during training.
  • some of the layers may include additional hyper-parameters (e.g., learning rate, stride, epochs, kernel size, number of filters, type of pooling for pooling layers, etc. ) , such as a convolutional layer or a pooling layer, while other layers may not, such as a ReLU layer.
  • activation functions may be used, including but not limited to, ReLU, leaky ReLU, sigmoid, hyperbolic tangent (tanh) , exponential linear unit (ELU) , etc.
  • the parameters, hyper-parameters, or activation functions are not to be limited and may differ depending on the embodiment.
  • Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein, this is not intended to be limiting.
  • additional or alternative layers such as normalization layers, Softmax layers, or other layer types, may be used in a CNN.
  • Different orders and layers in a CNN may be used depending on the embodiment.
  • CV system 170 when CV system 170 is used in practical applications for loss prevention (e.g., with emphasis on product-oriented action recognition) , there may be one order and one combination of layers; whereas when CV system 170 is used in practical applications for crime prevention in public areas (e.g., with emphasis on person-oriented action recognition) , there may be another order and another combination of layers.
  • the layers and their order in a CNN may vary without departing from the scope of this disclosure.
  • MLM 180 may include any type of machine learning model, such as models using linear regression, logistic regression, decision trees, support vector machines (SVM), Bayes, k-nearest neighbor (KNN), K-means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long short-term memory (LSTM), Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), or other types of machine learning models.
  • CV system 170 is merely one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of aspects of the technologies described herein. Neither should this system be interpreted as having any dependency or requirement relating to any one component nor any combination of components illustrated.
  • This arrangement of various components in CV system 170 is set forth only as an example. Other arrangements and elements (e.g., machines, networks, interfaces, functions, orders, and groupings of functions, etc.) can be used in addition to or instead of those shown, and some elements may be omitted altogether for the sake of clarity. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by an entity may be carried out by hardware, firmware, or software. For instance, some functions may be carried out by a processor executing instructions stored in memory.
  • each of the components shown in CV system 170 may be implemented on any type of computing device, such as computing device 600 described in FIG. 6. Further, each of the components may communicate with various external devices via a network, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) .
  • Apparatus 200A may be installed next to checkout machine 210A.
  • Apparatus 200B may be installed next to checkout machine 210B.
  • Apparatus 200C may be installed next to checkout machine 210C.
  • Apparatus 200D may be installed next to checkout machine 210D. All illustrated embodiments of the exemplary apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine at least because these embodiments of the exemplary apparatus are electronically decoupled from the checkout machine.
  • Apparatus 200A has a single-camera configuration.
  • Field of view 217 of camera 218A covers both display 216A and scanner 214A of checkout machine 210A.
  • An image from camera 218A may be segmented.
  • the part of the image covering display 216A forms one source, which has the product information as obtained by checkout machine 210A.
  • the part of the image covering scanner 214A and the product thereof forms another source, which can be used to derive the product information as determined via product recognition.
  • Base 250A enables apparatus 200A to stand next to checkout machine 210A, and the initial installation of apparatus 200A is as easy as setting up a floor lamp. During the verification process, various messages may be displayed on display 226A.
  • Light 232A may also be turned on, turned off, or flash in a certain pattern (e.g., a unique combination of duration and frequency) in response to a particular verification result. For example, light 232A may flash in a pattern to indicate a negative verification result, so a staff member may be summoned to assist the customer.
  • apparatus 200A is equipped with a wireless communication module (WCM) 230A.
  • WCM 230A enables apparatus 200A to communicate with a remote server via a wireless network (e.g., via WiFi or 5G) .
  • the verification process is at least partially executed in a remote server instead of locally in apparatus 200A, e.g., by transmitting the images taken by camera 218A to the remote server and receiving the corresponding verification code and other messages.
  • WCM 230A also enables apparatus 200A to receive updates (e.g., MLM updates, firmware updates, software updates, configuration updates, etc. ) .
  • Apparatus 200B has a dual-camera configuration.
  • Camera 218B is configured to capture images of products on scanner 214B, while camera 219B is configured to capture images of display 216B of checkout machine 210B.
  • the dual-camera configuration requires an additional camera but no longer requires image post-processing to split one image into multiple sources.
  • camera 218B and camera 219B may take images in different resolutions. For example, the requisite resolution of camera 218B may be lower than the requisite resolution of camera 219B.
  • Lower-resolution images from camera 218B may suffice for product recognition, but higher-resolution images from camera 219B are required, e.g., for text recognition.
  • Apparatus 200C can be mounted to the ceiling via base 250C to save floor space or bypass obstacles.
  • apparatus 200D may be directly placed on the top of checkout machine 210D via base 250D.
  • weight 242 is attached to the other end of the arm to balance the weight of camera 218D and camera 219D.
  • apparatus 310 has a vertical arm and a horizontal arm.
  • arm section 312b is extensible from arm section 312a, so that the horizontal arm may be adjusted to a suitable height, e.g. based on the height of checkout machine 320.
  • arm section 314b is extensible from arm section 314a.
  • arm section 314c may be further extended from arm section 314b.
  • camera 316a may be adjusted to cover the area of scanner 322, and camera 316b may be adjusted to cover the area of display 324.
  • microphone 318a and microphone 318b may be adjusted to have differential distances (e.g., D1 and D2 respectively, wherein D2 >> D1) to speaker 326 of checkout machine 320.
  • synchronization refers to the process of selecting corresponding data from multiple sources so that the respective product information derived from the multiple sources may be crosschecked for verification.
  • The checkout machine typically updates the information on its display after detecting a new event (e.g., a reading of an MRL). Therefore, it is important to synchronize images taken before and after each scan.
  • this synchronization is achieved based on a moving path of a product over a predetermined area of the checkout machine.
  • the product will enter and exit the area of scanner 322, thus forming a moving path over the area of scanner 322, as illustrated in image 332, image 334, and image 336, captured by camera 316a.
  • Each image has its timestamp.
  • The verification system, as implemented in apparatus 310, can synchronize image 340 with image 332 based on their timestamps. As a result, image 340 and image 332 form a before-scan pair.
  • Image 350 and image 336 may form an after-scan pair after a successful scan. For a failed scan, the information on the display would not change; in this case, image 360 and image 336 would form an after-scan pair instead. A pairing sketch follows.
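  • A minimal sketch of this moving-path-based pairing, assuming each camera yields (timestamp, frame) records and the product's entry/exit times over the scanner are known; all names are illustrative.

```python
def nearest_frame(frames, t):
    """Return the frame whose timestamp is closest to t."""
    return min(frames, key=lambda rec: abs(rec[0] - t))[1]

def sync_pairs(display_frames, product_frames, t_enter, t_exit):
    """Form before-scan and after-scan pairs around the product's moving
    path over the scanner (entry and exit timestamps)."""
    before = (nearest_frame(display_frames, t_enter), nearest_frame(product_frames, t_enter))
    after = (nearest_frame(display_frames, t_exit), nearest_frame(product_frames, t_exit))
    return before, after
```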
  • a before-scan image and an after-scan image from camera 316b may be used to derive product information, as obtained by checkout machine 320.
  • computer vision or OCR technologies may be used to recognize the SKU number, the product name, the sales price, etc. in area 342, the count of products in area 344, the total amount in area 346 of image 340.
  • the verification system can detect that the text has been modified from area 342 to area 352, specifically, an additional product, illustrated by line 358, has been added to area 352 in image 350.
  • the product count has been increased by one from area 344 to area 354.
  • the total amount also increased from area 346 to area 356.
  • the product identifier is retrieved from image 350, e.g., the SKU number of the newly added product at line 358.
  • one or more images from camera 316a may be used to derive product information, e.g., based on one or more MLMs, as previously discussed with MLM 180 in FIG. 1.
  • The product identifiers of those products meeting the ranking criteria are retrieved to form a ranked list. If the product identifier as derived from image 350 can be found in the ranked list, the verification system will determine the product information from both sources to be consistent and accordingly generate a positive verification code. Otherwise, a negative verification code will be generated.
  • a negative verification code may indicate the MRL on the product does not match the product, e.g., caused by a misplaced MRL.
  • Some goods have a unit price and are commonly sold by count instead of by weight.
  • The produce department may set a unit price for avocados, apples, bananas, etc., and a user is expected to input the count to the checkout machine.
  • the count of the product may be obtained from images taken by camera 316a, e.g., based on instance segmentation technologies.
  • the count of the product as input by the user may be retrieved from the images taken by camera 316b. These two counts may be crosschecked in some embodiments.
  • the verification system may be configured to generate a different negative verification code if these two counts disagree. Different negative verification codes may have different meanings, and the verification system can generate different actions based on a specific code.
  • the verification system may display a message on display 370 to indicate the specific meaning of such verification code, and the user may be reminded to input the correct count.
  • the verification system will detect that the text remains unchanged from area 342 to area 362. Similarly, the product count remains the same from area 344 to area 364. The total amount remains the same from area 346 to area 366.
  • In this case, checkout machine 320 failed to obtain any information about the product. Accordingly, the verification system will determine the product information from both sources to be inconsistent and generate a negative verification code.
  • The verification system may display the information (e.g., product identifier, product name, representative product images, etc.) of the products in the ranked list, which may be derived from the images from camera 316a. As a result, the user may manually input the correct product information to checkout machine 320 based on the product information on display 370. Further, as discussed previously, the count of the product may be determined, e.g., based on instance segmentation technologies. In some embodiments, the count of the product may also be displayed.
  • Apparatus 310 can assist customers to correctly identify and input product information (e.g., including the product identifier or the count of products) to checkout machines, thus significantly improving the operation of conventional checkout machines; in other words, it adds a brand-new function to conventional checkout machines.
  • the verification system may activate other loss prevention actions in response to a negative verification code, as previously discussed.
  • this synchronization is achieved based on a specific sound emitted from the checkout machine.
  • Most checkout machines are configured to emit a beep to indicate a successful scan.
  • the verification system of apparatus 310 may select respective before-beep and after-beep images in the multiple sources. As the beep is equivalent to a scan in some embodiments, a similar verification logic, as described previously, may be carried out based on the selected before-beep and after-beep images.
  • Apparatus 310 first detects a sound generated from checkout machine 320 based on known acoustic characteristics of the sound, which may be configured during the installation of apparatus 310, e.g., by recording the sound emitted from speaker 326 and learning its acoustic characteristics (e.g., properties of the sound wave, such as duration, frequency, etc.).
  • Checkout machine 320 may have a delay between emitting the beep and displaying product information on display 324, and such delay may be observed during the installation of apparatus 310.
  • A time threshold may be configured for apparatus 310 to select an image taken by camera 316b as the after-scan image. For instance, assuming the sound is detected at a first timestamp, the after-scan image may be selected at a second timestamp, which may be determined based on the first timestamp and the time threshold; a selection sketch follows.
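  • A sketch of this sound-based selection; the 0.5-second delay stands in for the installation-time threshold discussed above and is an assumed value.

```python
def select_after_scan(display_frames, t_beep, delay_s=0.5):
    """Pick the first display frame captured at or after t_beep + delay_s.
    display_frames is a list of (timestamp_seconds, frame) records."""
    target = t_beep + delay_s
    candidates = [rec for rec in display_frames if rec[0] >= target]
    return min(candidates, key=lambda rec: rec[0])[1] if candidates else None
```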
  • camera 316b is configured to only capture images after an indicative sound is detected, e.g., a beep to indicate a successful scan.
  • Apparatus 310 may compare the newly captured image with its immediate predecessor to identify the product information of the newly scanned product.
  • Fewer computing resources may be required for storing and analyzing the images taken by camera 316b, and the life of camera 316b may be prolonged.
  • The sound detection technique may also serve as another source for verification. For example, if no beep is detected and camera 316a still captures a product, the verification system may generate a negative verification code without analyzing the images from camera 316b, thus expediting the verification process.
  • Apparatus 310 may use various sound localization technologies to identify a sound from speaker 326, e.g., steered beamformer approaches, collocated microphone array approaches, learning methods for binaural hearing, head-related transfer function (HRTF) approaches, cross-power spectrum phase (CSP) analysis approaches, 2D sensor line array approaches, hierarchical fuzzy artificial neural network approaches, etc.
  • sound localization is performed by using two or more microphones.
  • microphone 318a and microphone 318b may be adjusted to have differential distances (e.g., D1 and D2 respectively, wherein D2 >> D1, which could be measured during the initial installation) to speaker 326 of checkout machine 320.
  • Apparatus 310 can mathematically estimate the direction and the distance of speaker 326 in relation to microphone 318a and microphone 318b, e.g., based on the geometrical configuration (typically a triangle) among microphone 318a, microphone 318b, and speaker 326, as well as the difference of arrival times of the sound at the two microphones. In this way, apparatus 310 can also differentiate a beep emitted by speaker 326 from a beep emitted by another checkout machine; a rough sketch follows.
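  • For illustration, a rough numpy sketch of the two-microphone time-difference-of-arrival (TDOA) estimate; plain cross-correlation stands in for the more robust localization approaches listed above, and the distances D1 = 0.2 m and D2 = 2.0 m are assumed values.

```python
import numpy as np

def tdoa_seconds(sig_a, sig_b, sample_rate):
    """Estimate the delay (positive if sig_a arrives later than sig_b)
    from the peak of the cross-correlation of the two microphone signals."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / sample_rate

SPEED_OF_SOUND = 343.0  # m/s at room temperature
# With D2 >> D1, a beep from speaker 326 should reach microphone 318b about
# (D2 - D1) / 343 seconds after microphone 318a; a beep whose measured TDOA
# deviates far from this expectation likely came from another checkout machine.
expected_tdoa = (2.0 - 0.2) / SPEED_OF_SOUND  # about 5.2 ms
```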
  • FIG. 4 is a flow diagram illustrating an exemplary process of verification.
  • images from the multiple cameras may form multiple-source data.
  • image pre-processing is required to derive multiple data streams from images taken by the single camera.
  • the verification system will synchronize source data 412, source data 418, and other source data at block 420.
  • some embodiments may use product-moving-path-based approaches for synchronization. Some embodiments may use sound-based approaches for synchronization. Some embodiments may use a hybrid approach for synchronization, e.g., primarily based on the product moving path, supplemented by the sound detection. Other embodiments may use additional approaches. In general, all approaches are designed to synchronize source data based on the same product or the same transaction, so that product information of one product or one transaction would not be crosschecked against product information of another product or another transaction.
  • the verification system determines product information (e.g., product data 442 or product data 448) from at least two sources based on the synchronized source data.
  • product data 442 is based on the information received by a checkout machine
  • product data 448 is based on the information derived from a source independent from the checkout machine.
  • the verification system may detect product information from the images taken by camera 316b based on text recognition technologies.
  • the verification system may detect product information from the images taken by camera 316a based on product recognition technologies, e.g., by comparing neural features of the product images with neural features of known products.
  • The verification system determines whether the product information from the multiple sources is consistent.
  • the respective product identifiers from the multiple sources are required to match to be consistent.
  • the product identifier derived from one source is required to fall into a ranked list derived from another source. Other criteria for measuring consistency may be devised for other implementations of the disclosed technologies.
  • The verification system generates a negative verification code if the product information from the multiple sources is inconsistent. It may be noted that, in some embodiments, the verification system is designed to generate a negative verification code if one source provided product information and another source did not, such as in a non-scan or mis-scan case.
  • the verification system generates a positive verification code if the product information from the multiple sources is consistent.
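  • Putting the blocks of this process together, a compact sketch of the decision logic; the reader and ranker callables are placeholders for the recognition components discussed above, not disclosed interfaces.

```python
def verify(display_image, product_images, read_display, rank_products, top_n=5):
    """Emit a verification code from two synchronized sources."""
    scanned_sku = read_display(display_image)    # source 1: checkout machine's display
    ranked_skus = rank_products(product_images)  # source 2: direct product recognition
    if scanned_sku is None:                      # non-scan / mis-scan case
        return "NEGATIVE"
    return "POSITIVE" if scanned_sku in ranked_skus[:top_n] else "NEGATIVE"
```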
  • FIG. 5 is a flow diagram illustrating another exemplary process of verification.
  • Each block of process 500, and other processes described herein, comprises a computing process that may be performed using any combination of hardware, firmware, or software. For instance, various functions may be carried out by a processor executing instructions stored in memory.
  • the process may also be embodied as computer-usable instructions stored on computer storage media or devices.
  • The process may be provided by an application, a service, or a combination thereof.
  • The process is to determine first product information associated with a product from a first source.
  • the product information may include a product identifier, e.g., an SKU number, a product name, a product symbol, etc.
  • the first source may include one or more images of the display of a checkout machine.
  • the one or more images may include a before-scan image and an after-scan image.
  • The differential between the before-scan and after-scan images usually encodes the newly scanned product information. Such information may be retrieved from the images based on OCR technologies; a toy diff sketch follows.
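  • A toy sketch of extracting the newly added item by diffing the OCR text of the two display snapshots; the OCR step itself is omitted and the receipt lines are made up.

```python
def new_display_lines(before_text: str, after_text: str) -> list[str]:
    """Lines present only after the scan describe the new product
    (the running total typically changes as well)."""
    before = set(before_text.splitlines())
    return [ln for ln in after_text.splitlines() if ln not in before]

print(new_display_lines("BANANA 3 x 0.40\nTOTAL 1.20",
                        "BANANA 3 x 0.40\nAVOCADO 2 x 1.10\nTOTAL 3.40"))
# ['AVOCADO 2 x 1.10', 'TOTAL 3.40']
```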
  • the process is to determine second product information associated with the product from a second source.
  • the second source may include one or more images of the product moving through the scanner of the checkout machine.
  • the process may recognize the product based on computer vision technologies, such as by comparing the neural features of one or more product images in question with neural features of images of known products.
  • the product information of the known product with the most similar neural features is used as the second product information associated with the product from the second source.
  • The product information of a ranked list of known products may be used instead, for example, if the similarity measure between the neural features of one or more query images and the neural features of each of the known products in the ranked list meets a requisite criterion, which may be decided based on the actual implementation of the disclosed technologies.
  • the process is to generate a verification code based on whether the first product information is consistent with the second product information.
  • For example, if the product information (e.g., the product identifier) from the first source matches the product information from the second source, the process will determine that the first product information is consistent with the second product information and generate a positive verification code.
  • Similarly, if the first product information (e.g., the product identifier) falls within the ranked list derived from the second source, the process will likewise determine the first product information to be consistent with the second product information and generate a positive verification code.
  • Referring to FIG. 6, an exemplary operating environment for implementing various aspects of the technologies described herein is shown and designated generally as computing device 600.
  • Computing device 600 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use of the technologies described herein. Neither should the computing device 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.
  • the technologies described herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine.
  • program components including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types.
  • the technologies described herein may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, and specialty computing devices, etc. Aspects of the technologies described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communications network.
  • computing device 600 includes a bus 610 that directly or indirectly couples the following devices: memory 620, processors 630, presentation components 640, input/output (I/O) ports 650, I/O components 660, and an illustrative power supply 670.
  • Bus 610 may include an address bus, data bus, or a combination thereof.
  • FIG. 6 is merely illustrative of an exemplary computing device that can be used in connection with different aspects of the technologies described herein. Distinction is not made between such categories as "workstation," "server," "laptop," "handheld device," etc., as all are contemplated within the scope of FIG. 6 and referred to as "computer" or "computing device."
  • Computer-readable media can be any available media that can be accessed by computing device 600 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer-readable media may comprise computer storage media and communication media.
• Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
  • Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD) , or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Computer storage media does not comprise a propagated data signal.
  • Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • modulated data signal means a signal that has its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
  • Memory 620 includes computer storage media in the form of volatile or nonvolatile memory.
  • the memory 620 may be removable, non-removable, or a combination thereof.
  • Exemplary memory includes solid-state memory, hard drives, optical-disc drives, etc.
  • Computing device 600 includes processors 630 that read data from various entities, such as bus 610, memory 620, or I/O components 660.
  • Presentation component (s) 640 present data indications to a user or other device.
  • Exemplary presentation components 640 include a display device, speaker, printing component, vibrating component, etc.
  • I/O ports 650 allow computing device 600 to be logically coupled to other devices, including I/O components 660, some of which may be built-in.
• memory 620 includes, in particular, temporary and persistent copies of CV logic 622.
  • CV logic 622 includes instructions that, when executed by processor 630, result in computing device 600 performing functions, such as but not limited to, process 400, process 500, or other processes discussed in connection with FIGS. 1-3.
  • CV logic 622 includes instructions that, when executed by processors 630, result in computing device 600 performing various functions associated with, but not limited to, various components in CV system 170 in FIG. 1 and various apparatuses in FIGS. 1-3.
• processors 630 may be packaged together with CV logic 622. In some embodiments, processors 630 may be packaged together with CV logic 622 to form a System in Package (SiP). In some embodiments, processors 630 can be integrated on the same die with CV logic 622. In some embodiments, processors 630 can be integrated on the same die with CV logic 622 to form a System on Chip (SoC).
  • Illustrative I/O components include a microphone, joystick, gamepad, satellite dish, scanner, printer, display device, wireless device, a controller (such as a stylus, a keyboard, and a mouse) , a natural user interface (NUI) , and the like.
  • a pen digitizer (not shown) and accompanying input instrument (also not shown but which may include, by way of example only, a pen or a stylus) are provided to digitally capture freehand user input.
  • the connection between the pen digitizer and processor (s) 630 may be direct or via a coupling utilizing a serial port, parallel port, system bus, or other interface known in the art.
  • the digitizer input component may be a component separate from an output component, such as a display device.
  • the usable input area of a digitizer may coexist with the display area of a display device, be integrated with the display device, or may exist as a separate device overlaying or otherwise appended to a display device. Any and all such variations, and any combination thereof, are contemplated to be within the scope of aspects of the technologies described herein.
• I/O components 660 include various GUIs, which allow users to interact with computing device 600 through graphical elements or visual indicators. Interactions with a GUI usually are performed through direct manipulation of graphical elements in the GUI. Generally, such user interactions may invoke the business logic associated with respective graphical elements in the GUI. Two similar graphical elements may be associated with different functions, while two different graphical elements may be associated with similar functions. Further, the same GUI may have different presentations on different computing devices, such as based on the different graphical processing units (GPUs) or the various characteristics of the display.
  • Computing device 600 may include networking interface 680.
  • the networking interface 680 includes a network interface controller (NIC) that transmits and receives data.
  • the networking interface 680 may use wired technologies (e.g., coaxial cable, twisted pair, optical fiber, etc. ) or wireless technologies (e.g., terrestrial microwave, communications satellites, cellular, radio and spread spectrum technologies, etc. ) .
  • the networking interface 680 may include a wireless terminal adapted to receive communications and media over various wireless networks.
  • Computing device 600 may communicate with other devices via the networking interface 680 using radio communication technologies.
  • the radio communications may be a short-range connection, a long-range connection, or a combination of both a short-range and a long-range wireless telecommunications connection.
  • a short-range connection may include a connection to a device (e.g., mobile hotspot) that provides access to a wireless communications network, such as a wireless local area network (WLAN) connection using the 802.11 protocol.
  • a Bluetooth connection to another computing device is a second example of a short-range connection.
  • a long-range connection may include a connection using various wireless networks, including 1G, 2G, 3G, 4G, 5G, etc., or based on various standards or protocols, including General Packet Radio Service (GPRS) , Enhanced Data rates for GSM Evolution (EDGE) , Global System for Mobiles (GSM) , Code Division Multiple Access (CDMA) , Time Division Multiple Access (TDMA) , Long-Term Evolution (LTE) , 802.16 standards, etc.
• Examples in the first group comprise an apparatus for verification with one or more of the following features.
  • the order of the following features is not to limit the scope of any examples in this group.
  • a first camera is adapted to capture a product placed on a first area of a checkout machine.
  • a second camera is adapted to capture product information displayed on a second area of a checkout machine.
  • a camera is adapted to capture a product placed on a first area of a checkout machine and product information displayed on a second area of the checkout machine.
  • a display is adapted to display a message reflecting whether the product information is consistent with the product.
  • a display is adapted to display an instruction for a user to interact with a checkout machine.
  • a processor is adapted to verify whether the product information derived from one source is consistent with the product information derived from another source.
  • a radio-frequency module is adapted to transmit first image data of the product and second image data of the product information to a remote device, and receive a verification code regarding whether the product information derived from one source is consistent with the product information derived from another source.
  • An adjustable mechanical arm is adapted to enable a first camera to capture a first area of the checkout machine, and a second camera to capture a second area of the checkout machine.
  • An adjustable mechanical arm is adapted to enable a camera, which is mounted to the adjustable mechanical arm, to capture a product placed on a first area of a checkout machine and product information displayed on a second area of the checkout machine.
• An adjustable mechanical arm is extensible or flexible, so that a camera can be adjusted to a desirable height, a desirable direction, or a desirable spatial position in general.
• a supporting base, connected to an adjustable mechanical arm, is adapted to enable a camera, mounted to the adjustable mechanical arm, to maintain a stable spatial position.
  • a first camera is fixed to a first location of an adjustable mechanical arm of the apparatus, and a second camera is fixed to a second location of the adjustable mechanical arm, wherein the first location is closer than the second location to a supporting base of the apparatus.
• a first microphone is fixed to a first location of an adjustable mechanical arm, and a second microphone is fixed to a second location of the adjustable mechanical arm, wherein the first location and the second location are selected to cause a differential distance from a speaker of a checkout machine to the respective microphones.
  • the product information comprises product identifier, product name, product price (unit price, total price, etc. ) , product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc. ) , etc.
• Examples in the second group comprise a method, a computer system adapted to perform the method, or a computer storage device storing computer-usable instructions that cause a computer system to perform the method.
  • the method has one or more of the following features. The order of the following features is not to limit the scope of any examples in this group.
• a feature of detecting, at a first timestamp, a sound generated from a checkout machine based on one or more known acoustic characteristics of the sound
• a feature of selecting an image of a display of the checkout machine from the second image data based on the first timestamp, wherein the image has a second timestamp later than the first timestamp.
• a feature of detecting the second product information associated with the product from the image of the display
• a feature of identifying a first image and a second image from the second image data based on a moving path of the product over a predetermined area under a field of view of the first imaging source.
  • the product information comprises product identifier, product name, product price (unit price, total price, etc. ) , product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc. ) , etc.

Abstract

This disclosure includes technologies for multi-sourced checkout verification. The disclosed system collects respective data from multiple sources, determines respective product information from the respective data, and crosschecks the respective product information for verification. Further, the disclosed system is designed to launch appropriate actions based on the verification outcome.

Description

APPARATUS AND METHODS FOR MULTI-SOURCED CHECKOUT VERIFICATION
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims benefit to, and incorporates by reference herein in its entirety, pending U.S. Application No. 16/672,883, filed November 4, 2019; pending International Application No. PCT/CN2019/111643, filed October 17, 2019; and pending International Application No. PCT/CN2019/086367, filed May 10, 2019.
BACKGROUND
As an alternative to the traditional cashier-staffed checkout, self-checkout solutions are popular for retail success, particularly for grocery stores and supermarkets. Most self-checkout machines include the following components: a lane light, a touchscreen monitor, a basket stand, a barcode scanner, a weighing scale, and a payment module. Using a self-checkout machine, a customer can scan product barcodes, weigh products (such as fresh produce without barcodes) and select the product type on a display, pay for the products, bag the purchased products, and exit the store without any interactions with a cashier or a clerk. However, a clerk is typically assigned to supervise a group of self-checkout machines or lanes, so the clerk can assist customers when required, such as authorizing the sale of restricted products (e.g., alcohol, tobacco, etc.).
Barcodes, affixed to many commercial products in the modern economy, have made product checkout and inventory tracking possible in many retail sectors. A barcode, seemingly a trivial label, can encode machine-readable data. The universal product code (UPC) is a barcode symbology, mainly used for scanning trade items at the point of sale (POS). Barcodes, particularly UPC barcodes, have shaped the modern economy, not only universally used in checkout systems but also used for many other tasks, e.g., automatic identification and data capture.
Self-checkout machines and the UPC system are designed for regular checkout transactions. Irregular checkout activities, such as non-scans or irregular scans, would disrupt the normal checkout process and lead to retail shrinkage. Retail shrinkage, or simply shrinkage, means there are fewer items in stock than on the inventory list. Shrinkage reduces profits for retailers, which may lead to increased prices for consumers. Many of the shrinkage problems happen at the POS due to irregular checkout activities. A technical solution is needed for automated checkout verification at the POS for loss prevention.
SUMMARY
This Summary is provided to introduce selected concepts that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In general, aspects of this disclosure include an apparatus that may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine. In some embodiments, the apparatus includes one or more imaging devices and a display. The apparatus is configured to crosscheck product information derived from multi-sourced image data generated from the one or more imaging devices, further, to present corresponding information on the display based on the verification result. Advantageously, the apparatus may serve as a general-purpose loss prevention measure for many types of checkout machines.
In various aspects, systems, methods, and computer-readable storage devices are provided to improve a computing system’s ability for image-based checkout verification in general. Specifically, one aspect of the technologies described herein is to improve the efficiency of a computing system’s functions to perform product recognition and verification tasks by crosschecking product information derived from multiple sources. Another aspect of the technologies described herein is to improve a computing system’s ability to synchronize image data from multiple sources for verification. Yet another aspect of the technologies described herein is to improve a computing system’s ability to perform various functions or other practical applications in response to the verification outcomes, which are further discussed in the DETAILED DESCRIPTION.
BRIEF DESCRIPTION OF THE DRAWINGS
The technologies described herein are illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
FIG. 1 is a schematic representation illustrating an exemplary system in an exemplary operating environment, in accordance with at least one aspect of the technologies described herein;
FIG. 2 presents various schematic representations illustrating respective embodiments of an exemplary apparatus, in accordance with at least one aspect of the technologies described herein;
FIG. 3 is a schematic representation illustrating an exemplary process to detect product information from multiple sources, in accordance with at least one aspect of the technologies described herein;
FIG. 4 is a flow diagram illustrating an exemplary process of verification, in accordance with at least one aspect of the technologies described herein;
FIG. 5 is a flow diagram illustrating another exemplary process of verification, in accordance with at least one aspect of the technologies described herein; and
FIG. 6 is a block diagram of an exemplary computing environment suitable for use in implementing various aspects of the technologies described herein.
DETAILED DESCRIPTION
The various technologies described herein are set forth with sufficient specificity to meet statutory requirements. However, the description itself is not intended to limit the scope of this disclosure. Rather, the inventors have contemplated that the claimed subject matter might also be embodied in other ways, to include different steps or combinations of steps similar to the ones described in this document, in conjunction with other present or future technologies. Moreover, although the terms “step” or “block” may be used herein to connote different elements of methods employed, the terms should not be interpreted as implying any particular order among or between various steps herein disclosed unless and except when the order of individual steps is explicitly described. Further, the term “based on” generally denotes that the succedent condition is used in performing the precedent action.
In the modern economy, many products are affixed with machine-readable labels (MRLs) , such as UPC barcodes, QR codes, RFID tags, etc. MRLs may be provisioned by a manufacturer, e.g., a UPC label on a TV, or by a retailer, e.g., a UPC label for an apple in a supermarket. MRLs may be read by scanning devices for automatic identification and data capture, e.g., supporting transactions at various POS locations, tracking inventory at warehouses, facilitating transportation of goods in commerce, etc.
In combination with an MRL system, products may be checked out with a checkout machine with reduced waiting time, reduced labor costs, and increased accuracy for sales and inventory tracking. Self-checkout, also known as self-service checkout, is an alternative to the traditional cashier-staffed checkout, where self-checkout machines are provided for customers to process their purchases from a retailer. Checkout machines designed for cashier-staffed checkout or self-checkout solutions can provide great benefits for the retail industry, e.g., by improving the productivity and accuracy of the checkout process.
However, self-checkout machines are generally more vulnerable, compared to cashier-staffed checkout, to shrinkage, a form of preventable loss for retailers caused by deliberate or inadvertent human actions. Some products may be unlabeled. Labeling each product could be expensive, impractical, or error-prone on many occasions, such as for products sold in greengrocers, farmers’ markets, or supermarkets. Sometimes, MRLs may go missing, e.g., during the transportation process or due to mishandling. Sometimes, MRLs may become illegible, e.g., due to damage or smear. Sometimes, MRLs may be intentionally misplaced and affixed to unintended products, e.g., by a fraud known as ticket switching, in which scammers intentionally switch MRLs to pay merchants less. Typically, a lower-priced MRL is fraudulently affixed to a higher-priced product, so that the higher-priced item could be purchased for less. Sometimes, the operator (e.g., a cashier or a customer) may simply forget to scan every product in the shopping cart.
To solve some of the aforementioned problems, a technical solution is provided in this disclosure for automated checkout verification for labeled or unlabeled products sold via cashier-staffed checkout machines or self-checkout machines. As used herein, checkout verification or verification refers to the process or the outcome of crosschecking product information obtained by a checkout machine against product information obtained from another source independent from the checkout machine.
At a high level, one aspect of this disclosure includes an apparatus with one or more imaging devices. The apparatus is configured to crosscheck product information derived from multi-sourced image data generated from the one or more imaging devices, further, to present corresponding audio or visual information (e.g., via a speaker or a display) based on the verification result. The disclosed apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine. Advantageously, the apparatus may serve as a general-purpose loss prevention measure for many types of checkout machines, including cashier-staffed checkout machines or self-checkout machines.
In some embodiments, the disclosed apparatus has two cameras. The first camera is adapted to capture one or more images of the display of the checkout machine. The display of the checkout machine contains product information recognized by the checkout machine, e.g., based on a machine reading of an MRL. The second camera is adapted to capture one or more images of the product being checked out directly. Based on various computer vision technologies, the product in the image may be recognized, and corresponding product information may be retrieved from a product database based on such product recognition technologies. Accordingly, the computing process associated with the apparatus can determine the respective product information associated with the product based on the images from the first camera and the second camera respectively. Further, the computing process can crosscheck the product information from the two independent sources. A positive verification code may be generated if the product information derived from the two independent sources is consistent. Conversely, a negative verification code may be generated if the product information derived from the two independent sources is inconsistent.
Based on the verification code, corresponding messages can be generated and communicated to various recipients or devices. By way of example, the disclosed apparatus may take necessary measures, e.g., by displaying a message, playing an audible message, activating a warning light, etc., to remind the customer to rescan the product if the checkout machine did not receive the correct product information, e.g., due to a failure of the MRL reading, a mismatch between the physical product and the MRL, etc. As another example, the disclosed apparatus may send an electronic message (e.g., including the nature of the event, the relevant images, the product information, etc. ) to a remote device, such as a server or a wireless device, which could be monitored by a loss prevention staff member. In other embodiments, different messages or loss prevention actions may be configured based on the specific implementation of the disclosed technologies.
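To make this message-routing step concrete, below is a minimal Python sketch of how a generated verification code might launch the actions described above. The function names, message fields, and callbacks are hypothetical, not part of the claimed apparatus.

```python
# A minimal sketch (not the claimed implementation) of routing actions on a
# verification code. All names and message fields here are hypothetical.

def dispatch_verification_result(code, product_info, images,
                                 show_message, notify_remote):
    """Launch loss-prevention actions appropriate to a verification code."""
    if code == "POSITIVE":
        return  # consistent product information: checkout proceeds normally
    # Inconsistent product information: remind the operator locally ...
    show_message("Please rescan the product or wait for assistance.")
    # ... and send the evidence to a monitored remote device.
    notify_remote({
        "event": "verification_failed",
        "product_info": product_info,
        "images": images,  # relevant frames for review by staff
    })

# Example wiring with stand-in callbacks:
dispatch_verification_result(
    "NEGATIVE", {"sku": "538902"}, [], print,
    lambda payload: print("remote alert:", payload["event"]),
)
```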
The disclosed apparatus and its technologies provide numerous improvements over conventional checkout machines and have many advantages. Specifically, one aspect of the disclosure is to perform a verification function for any conventional checkout machines, wherein a conventional checkout machine lacks such verification function. Another aspect of the disclosure is to perform such verification function based on product information derived from multiple sources, including at least one source independent from the checkout machine, e.g., an image of the actual product captured by the apparatus. An independent source can increase the authenticity of such verification and enable a secure solution that is more resilient to tampering. Yet another aspect of the disclosure is for the apparatus to be compatible with heterogenous checkout machines. As the disclosed apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine, the disclosed apparatus may serve as a general-purpose loss prevention solution for many types of checkout machines, including both cashier-staffed checkout machines and self-checkout machines.
Another aspect of the technologies described herein is to improve a computing system’s ability to synchronize image data from multiple sources for verification. In some  embodiments, the disclosed apparatus is equipped with two independent imaging devices, such as two cameras. Corresponding images from the two independent sources may be synchronized for verification. In one embodiment, the synchronization process is guided by the moving path of the product, e.g., over the scanning area of the checkout machine. One or more images of the display of the checkout machine may be synchronized with one or more product images based on the moving path of the product. Resultantly, the product information derived from both sources may be synchronized and crosschecked.
Yet another aspect of the technologies described herein is to improve a computing system’s ability to perform various functions or other practical applications in response to the verification outcomes. By way of example, when a negative verification outcome is detected, the disclosed system may send a warning message, including one or more images or video segments relevant to the product being verified, to a designated device to warn an operator, so the operator can take appropriate loss-prevention actions to correct the error, assist the customer, etc.
This disclosure provides a general and flexible framework for verification. The disclosed technologies may be used as a loss prevention solution, e.g., for checkout verification for many types of checkout machines. The disclosed technologies may also be used in other practical systems, such as in a quality control system in a manufacturer (e.g., to crosscheck whether a part in an assembly line is the right part needed for assembling a product) , or other kinds of verification tasks in other practical systems or industries.
Having briefly described an overview of aspects of the technologies described herein, referring now to FIG. 1, which is a schematic representation illustrating an exemplary system in an exemplary operating environment. In this operating environment, checkout machine 110 includes, among many components not shown, scanner 112 and display 114. User 130 may use this checkout machine for self-checkout or to assist others in checking out goods. Apparatus 120 includes, among many components not shown, camera 124a, camera 124b, and display 126, which are mounted to arm 122. Apparatus 120 may further include one or more microphones, distributed in various locations at arm 122. In this embodiment, microphone 128a is mounted in the middle of the vertical section of arm 122, and microphone 128b is mounted at the top of the vertical section of arm 122.
In various embodiments, arm 122 is adjustable (e.g., different lengths) as well as rotatable (e.g., different directions) . When apparatus 120 is installed next to checkout machine 110, arm 122 and its mounted components may be adjusted so that the view of camera 124a covers scanner 112 of checkout machine 110, and the view of camera 124b covers display 114 of checkout machine 110. Further, the respective distances from checkout machine 110 to  microphone 128a and microphone 128b may be measured and recorded during the installation. The microphones and the distance information may be used to determine whether a beep sound (e.g., to confirm a successful scan or reading of an MRL) is generated from checkout machine 110 or another checkout machine that is not monitored by apparatus 120. Such embodiments are further discussed in connection with FIG. 3.
In some embodiments, checkout verification (CV) system 170 is installed in apparatus 120. In some embodiments, CV system 170 is operatively coupled to apparatus 120, e.g., via network 160, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) , e.g., a 4G or 5G cellular network.
It should be noted that checkout machine 110 and apparatus 120 here merely form one exemplary operating environment for CV system 170, which is merely an exemplary system following at least one aspect of the technologies described herein. Checkout machine 110, apparatus 120, or CV system 170 is not intended to suggest any limitation as to the scope of use or functionality of all aspects of the technology described herein. Neither should this operating environment be interpreted as having any dependency or requirement relating to any one component or any combination of components illustrated.
Equipped with CV system 170, apparatus 120 may be conveniently installed next to checkout machine 110 without modifying the existing components or interrupting the regular functions of checkout machine 110. During the normal operation of checkout machine 110, CV system 170 is adapted to crosscheck product information derived from multi-sourced image data generated from camera 124a and camera 124b respectively. A positive verification code may be generated if the product information derived from the two sources is consistent. Conversely, a negative verification code may be generated if the product information derived from the two sources is inconsistent. Further, corresponding messages may be presented via display 126 based on the verification result.
At a high level, CV system 170 is configured to crosscheck product information as obtained by checkout machine 110 against product information obtained from another source independent from the checkout machine. In this embodiment, images from camera 124b cover display 114, and such images contain product information as obtained by checkout machine 110. Meanwhile, images from camera 124a cover product 140 and scanner 112, and CV system 170 may directly derive product information of product 140 from these images, e.g., using computer vision technologies.
CV system 170 may output a positive or negative verification code based on the verification result. A positive verification code indicates that the product information derived from different sources is consistent. A negative verification code indicates that the product information derived from different sources is inconsistent. In various embodiments, being consistent, as used herein, means that the product identifier (as obtained by the checkout machine) has a rank (as derived from another source) that meets the requisite ranking criteria of a specific practical application.
By way of example, product 140 has an MRL encoding a stock-keeping unit (SKU) number (e.g., 538902) . Scanner 112 reads the SKU number into checkout machine 110, and checkout machine 110 displays product information 116 (e.g., the SKU number, the product name, the weight, the price, etc. ) on display 114. CV system 170 uses camera 124b to capture an image of display 114 and decode the product information to obtain the product identifier (e.g., based on the SKU number) .
In a synchronized but independent path, CV system 170 uses camera 124a to capture one or more images of product 140 passing over scanner 112 and ranks a set of known products based on their respective similarities to product 140. Further, the requisite criteria of ranking in this instance may be set as the top 5. In this case, if the product identifier as obtained by checkout machine 110 can be found within the top 5 ranked products, CV system 170 would determine the product information from the two sources to be consistent; otherwise, inconsistent. The top-N criterion in this example is not limiting. A skilled person would recognize that the requisite criteria of ranking may be adapted to fit the requirements of a practical application implementing the disclosed technologies, e.g., based on the specific product recognition technology, the specific similarity measurement technology, the MRL system, the preference of the retailer, etc.
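As an illustration of this top-N criterion, the following minimal Python sketch assumes product recognition has already produced (SKU, similarity) pairs for the known products; the variable names and scores are illustrative only.

```python
# A minimal sketch of the top-N consistency check described above.

def verify_top_n(scanned_sku, similarity_scores, n=5):
    """Positive when the scanned SKU ranks within the n most similar products."""
    ranked = sorted(similarity_scores, key=lambda pair: pair[1], reverse=True)
    top_n_skus = {sku for sku, _ in ranked[:n]}
    return "POSITIVE" if scanned_sku in top_n_skus else "NEGATIVE"

# The SKU read from the checkout display (e.g., 538902) is checked against
# the visually ranked candidates:
scores = [("538902", 0.93), ("538907", 0.88), ("120001", 0.41)]
print(verify_top_n("538902", scores))  # -> POSITIVE
```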
Subsequently, CV system 170 may cause the verification code, or a message reflecting the verification code, to be displayed, e.g., via a graphical user interface (GUI), on display 126, or alternatively on computing device 190, e.g., a smartphone, a mobile device, or a computer, etc., which may be accessed by a loss prevention person. Advantageously, user 130 may be reminded to remedy any ongoing issues, e.g., by canceling a transaction in response to a negative verification code. Optionally, a clerk may be summoned to resolve issues for the customer.
In other embodiments, CV system 170 may activate various components of checkout machine 110 in response to a verification code. For example, CV system 170 may activate a warning light or play a voice message to convey a message indicating the verification code. The message may include instructions for how user 130 may continue the transaction, such as how to cancel a problematic transaction or how to process the product again.
In one embodiment, CV system 170 may display the product information of top-ranked products on display 126. This is especially useful when product 140 does not have an MRL. In this way, user 130 may provide or confirm correct product information to checkout machine 110.
In addition to other components not shown, retriever 172, synchronizer 174, recognizer 176, verifier 178, and machine learning module (MLM) 180 are operatively coupled with each other to achieve various functions of CV system 170. In various embodiments, retriever 172 is configured to retrieve data from respective sources, and synchronizer 174 is configured to synchronize respective data sources. In some embodiments, retriever 172 may continuously retrieve images from camera 124a and camera 124b, and synchronizer 174 may synchronize images from different sources so that synced images can be analyzed together to derive product information for the same product. In some embodiments, retriever 172 may retrieve images from camera 124a or camera 124b in response to a command from synchronizer 174, so that the cameras only need to capture images when needed.
Recognizer 176 is configured to recognize products from images and retrieve corresponding product information (e.g., product identifier, name, unit price, representative images, etc.). In some embodiments, recognizer 176 is configured to compare the image features of a query product with image features of known products for similarity, e.g., via one or more machine learning models (MLMs) in MLM 180, so that the known products may be ranked based on their respective similarity measures against the query product. Such rank information may be used by verifier 178 to determine whether the product information obtained by the checkout machine (e.g., via the product’s MRL) is consistent with the product information obtained by apparatus 120 from another independent source (e.g., product images captured by camera 124a).
Recognizer 176 may use various computer vision technologies to recognize the product type or the quantity of product 140. The applications (PCT/CN2019/111643, PCT/CN2019/086367, and PCT/CN2019/073390) have disclosed some effective technical solutions for product recognition, which may be used by recognizer 176. In various embodiments, recognizer 176 uses a detector to detect product 140 or the quantity of product 140 based on images from camera 124a, and uses a retrieval model to recognize the product type of the detected object. Various detectors may be used by recognizer 176, such as two-stage detectors (e.g., Faster-RCNN, R-FCN, Lighthead-RCNN, Cascade R-CNN, etc. ) or one-stage detectors (e.g., SSD, Yolov3, RetinaNet, FCOS, EfficientDet, etc. ) . Various retrieval models may be used by recognizer 176, such as a combination of a type of network (e.g., VGG, ResNet, Inception,  EfficientNet) with a type of loss (e.g., triplet loss, contrastive loss, lifted loss, multi-similarity loss) . Further details of MLMs will be discussed in connection with MLM 180 herein.
In general, recognizer 176 may use technologies of object detection, image segmentation, or instance segmentation to determine the quantity of product 140. Different from semantic segmentation, instance segmentation would identify each instance of each product in an image. In some embodiments, neural networks (e.g., in MLM 180) are trained to perform such instance segmentation tasks. In other embodiments, other technologies may be used to determine the quantity of product 140, e.g., based on thresholding (using one or more specified threshold values to separate pixels into different levels to isolate objects) , K-means clustering, Histogram-based image segmentation, edge detection, etc.
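To illustrate the simpler, non-neural counting alternatives named above, here is a rough Python sketch using Otsu thresholding and connected components via OpenCV; the minimum-area filter is an illustrative parameter, and real produce images would demand far more careful segmentation.

```python
# A rough, illustrative blob count via thresholding and connected components.
import cv2
import numpy as np

def count_instances(bgr_image, min_area=500):
    """Count foreground blobs as a crude estimate of product quantity."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    # Otsu's method picks a threshold separating product pixels from background.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    num_labels, labels = cv2.connectedComponents(binary)
    # Label 0 is the background; ignore tiny components as noise.
    return sum(
        1 for label in range(1, num_labels)
        if np.count_nonzero(labels == label) >= min_area
    )
```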
Recognizer 176 is also configured to recognize product information based on images from camera 124b. Product information may include product identifier, product name, product price (unit price, total price, etc. ) , product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc. ) , etc., which may be displayed by checkout machine 110. The pending applications (e.g., U.S. Application No. 16/672,883, entitled Character-based Text Detection and Recognition) have disclosed some effective technical solutions for text detection and recognition, which may be used by recognizer 176. In some embodiments, in the text detection stage, recognizer 176 first uses a convolutional network to identify a position of text from an image from camera 124b. For example, when an image passes through the neural network, various feature maps may be generated to indicate a confidence measure for whether text is presented and its position in the image. In the text recognition stage, recognizer 176 can extract the text from the respective positions identified in the text detection stage, e.g., based on a recursive-network-based approach or OCR-related technologies.
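The disclosure uses its own convolutional text detection and recognition models; purely as a stand-in to illustrate reading product information from a display image, the sketch below applies an off-the-shelf OCR engine to a cropped region. The file name and region coordinates are hypothetical.

```python
# An illustrative stand-in for the text recognition stage, using a generic
# OCR engine rather than the disclosed convolutional models.
import pytesseract
from PIL import Image

def read_display_text(image_path, region=None):
    """OCR the checkout display image, optionally cropped to one region."""
    img = Image.open(image_path)
    if region is not None:
        img = img.crop(region)  # (left, upper, right, lower) in pixels
    return pytesseract.image_to_string(img)

# e.g., read only the itemized-lines area of a captured display image:
# text = read_display_text("display_after_scan.png", region=(40, 120, 600, 480))
```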
In some embodiments, the weight information of the product may serve as another verification source. In one embodiment, the weight of product 140 is used to determine the quantity of product 140. A typical scanner is equipped with a weight scale. The weight information may be derived from the images taken by camera 124b in some embodiments. The quantity of product 140 may be obtained based on a ratio of the weight divided by the average weight of an instance of product 140. Assuming, as an example, product 140 is a bunch of bananas overlapping each other, user 130 inputs a count of 3 bananas to checkout machine 110, and the total count of bananas computed based on computer vision technologies is also 3. However, verifier 178 would additionally take the weight information into account in some embodiments. If the weight divided by the average weight of a banana is inconsistent with the count from the user input or computed based on the computer vision technologies, such as with a difference greater than a threshold (e.g., 20%), verifier 178 will generate a negative verification code instead.
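A minimal sketch of this weight crosscheck, assuming the total weight and an average unit weight are available; the 20% tolerance mirrors the example threshold above, and the baseline used for the relative difference is an assumption.

```python
# A minimal sketch of verifying a product count against weight information.

def verify_count_by_weight(total_weight, avg_unit_weight, claimed_count,
                           tolerance=0.20):
    """Compare the weight-implied count against the claimed count."""
    implied_count = total_weight / avg_unit_weight
    relative_diff = abs(implied_count - claimed_count) / claimed_count
    return "POSITIVE" if relative_diff <= tolerance else "NEGATIVE"

# Three bananas at ~120 g each; the scale reads 365 g (within tolerance):
print(verify_count_by_weight(365.0, 120.0, 3))  # -> POSITIVE
```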
Returning to the machine learning models, many of the aforementioned computer vision technologies may be implemented in MLM 180, which may include one or more neural networks in some embodiments. Different components in CV system 170 may use one or more different neural networks to achieve their respective functions, which will be further discussed in connection with the remaining figures. For example, recognizer 176 may use a trained neural network to learn the neural features of an unknown product, which may be represented by a feature vector in a high-dimensional feature space, and compute the similarity between the unknown product and a known product based on the cosine distance between their respective feature vectors in the high-dimensional feature space. In various embodiments, various MLMs and image data (e.g., image data retrieved by retriever 172, data associated with the high-dimensional feature space, etc.) may be stored in data store 150 and accessible in real-time via network 160.
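A minimal sketch of this similarity computation, assuming a trained network has already produced the feature vectors; the toy four-dimensional vectors below stand in for high-dimensional embeddings.

```python
# Ranking known products by cosine similarity to a query feature vector.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_known_products(query_vec, known_products):
    """Rank (sku, feature_vector) pairs by similarity to the query vector."""
    scored = [(sku, cosine_similarity(query_vec, vec))
              for sku, vec in known_products]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy example; real embeddings live in a much higher-dimensional space.
query = np.array([0.9, 0.1, 0.0, 0.4])
catalog = [("538902", np.array([0.8, 0.2, 0.1, 0.5])),
           ("120001", np.array([0.0, 0.9, 0.8, 0.1]))]
print(rank_known_products(query, catalog)[0][0])  # -> 538902
```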
As used herein, a neural network comprises at least three operational layers. The three layers can include an input layer, a hidden layer, and an output layer. Each layer comprises neurons. The input layer neurons pass data to neurons in the hidden layer. Neurons in the hidden layer pass data to neurons in the output layer. The output layer then produces a classification. Different types of layers and networks connect neurons in different ways.
Every neuron has weights, an activation function that defines the output of the neuron given an input (including the weights) , and an output. The weights are the adjustable parameters that cause a network to produce a correct output. The weights are adjusted during training. Once trained, the weight associated with a given neuron can remain fixed. The other data passing between neurons can change in response to a given input (e.g., image) .
The neural network may include many more than three layers. Neural networks with more than one hidden layer may be called deep neural networks. Example neural networks that may be used with aspects of the technology described herein include, but are not limited to, multilayer perceptron (MLP) networks, convolutional neural networks (CNN) , recursive neural networks, recurrent neural networks, and long short-term memory (LSTM) (which is a type of recursive neural network) . Some embodiments described herein use a convolutional neural network, but aspects of the technology are applicable to other types of multi-layer machine classification technologies.
A CNN may include any number of layers. The objective of one type of layer (e.g., convolutional, ReLU, and pooling) is to extract features of the input volume, while the objective of another type of layer (e.g., fully connected (FC) and Softmax) is to classify based on the extracted features. An input layer may hold values associated with an instance. For example, when the instance is an image(s), the input layer may hold values representative of the raw pixel values of the image(s) as a volume (e.g., a width, W, a height, H, and color channels, C (e.g., RGB), such as W x H x C), or a batch size, B.
One or more layers in the CNN may include convolutional layers. The convolutional layers may compute the output of neurons that are connected to local regions in an input layer (e.g., the input layer) , each neuron computing a dot product between their weights and a small region they are connected to in the input volume. In a convolutional process, a filter, a kernel, or a feature detector includes a small matrix used for feature detection. Convolved features, activation maps, or feature maps are the output volume formed by sliding the filter over the image and computing the dot product. An exemplary result of a convolutional layer may include another volume, with one of the dimensions based on the number of filters applied (e.g., the width, the height, and the number of filters, F, such as W x H x F, if F were the number of filters) .
One or more of the layers may include a rectified linear unit (ReLU) layer. The ReLU layer(s) may apply an elementwise activation function, such as max(0, x), which turns negative values to zeros (thresholding at zero). The resulting volume of a ReLU layer is the same as the volume of its input. This layer does not change the size of the volume, and there are no hyperparameters.
One or more of the layers may include a pooling layer. A pooling layer performs a function to reduce the spatial dimensions of the input and control overfitting. This layer may use various functions, such as Max pooling, average pooling, or L2-norm pooling. In some embodiments, max pooling is used, which only takes the most important part (e.g., the value of the brightest pixel) of the input volume. By way of example, a pooling layer may perform a down-sampling operation along the spatial dimensions (e.g., the height and the width) , which may result in a smaller volume than the input of the pooling layer (e.g., 16 x 16 x 12 from the 32 x 32 x 12 input volume) . In some embodiments, the convolutional network may not include any pooling layers. Instead, strided convolutional layers may be used in place of pooling layers.
One or more of the layers may include a fully connected (FC) layer. An FC layer connects every neuron in one layer to every neuron in another layer. The last FC layer normally uses an activation function (e.g., Softmax) for classifying the generated features of the input  volume into various classes based on the training dataset. The resulting volume may take the form of 1 x 1 x number of classes.
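To make the shape bookkeeping above concrete, here is a small illustrative network in PyTorch, not the disclosed model: a convolution that preserves a 32 x 32 input while producing 12 feature maps, a ReLU, a 2 x 2 max pooling that halves the spatial dimensions, and an FC layer classifying into 10 hypothetical classes.

```python
# An illustrative CNN mirroring the layer walk-through above (PyTorch).
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=12, kernel_size=3, padding=1),
    nn.ReLU(),                    # elementwise max(0, x); volume unchanged
    nn.MaxPool2d(kernel_size=2),  # 32 x 32 x 12 down-sampled to 16 x 16 x 12
    nn.Flatten(),
    nn.Linear(12 * 16 * 16, 10),  # FC layer mapping features to 10 classes
    nn.Softmax(dim=1),            # class probabilities (for inference)
)

x = torch.randn(1, 3, 32, 32)     # a batch of one 32 x 32 RGB image
print(model(x).shape)             # -> torch.Size([1, 10])
```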
Further, calculating the length or magnitude of vectors is often required either directly as a regularization method in machine learning, or as part of broader vector or matrix operations. The length of the vector is referred to as the vector norm or the vector’s magnitude. The L1 norm is calculated as the sum of the absolute values of the vector. The L2 norm is calculated as the square root of the sum of the squared vector values. The max norm is calculated as the maximum absolute value of the vector.
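For example, the three norms can be computed as follows (illustrative values):

```python
# Computing the L1, L2, and max norms described above with NumPy.
import numpy as np

v = np.array([3.0, -4.0, 1.0])
l1 = np.sum(np.abs(v))        # L1 norm: |3| + |-4| + |1| = 8.0
l2 = np.sqrt(np.sum(v ** 2))  # L2 norm: sqrt(9 + 16 + 1) ~= 5.10
lmax = np.max(np.abs(v))      # max norm: 4.0
print(l1, l2, lmax)
```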
As discussed previously, some of the layers may include parameters (e.g., weights or biases) , such as a convolutional layer, while others may not, such as the ReLU layers and pooling layers, for example. In various embodiments, the parameters may be learned or updated during training. Further, some of the layers may include additional hyper-parameters (e.g., learning rate, stride, epochs, kernel size, number of filters, type of pooling for pooling layers, etc. ) , such as a convolutional layer or a pooling layer, while other layers may not, such as a ReLU layer. Various activation functions may be used, including but not limited to, ReLU, leaky ReLU, sigmoid, hyperbolic tangent (tanh) , exponential linear unit (ELU) , etc. The parameters, hyper-parameters, or activation functions are not to be limited and may differ depending on the embodiment.
Although input layers, convolutional layers, pooling layers, ReLU layers, and fully connected layers are discussed herein, this is not intended to be limiting. For example, additional or alternative layers, such as normalization layers, Softmax layers, or other layer types, may be used in a CNN.
Different orders and layers in a CNN may be used depending on the embodiment. For example, when CV system 170 is used in practical applications for loss prevention (e.g., with emphasis on product-oriented action recognition) , there may be one order and one combination of layers; whereas when CV system 170 is used in practical applications for crime prevention in public areas (e.g., with emphasis on person-oriented action recognition) , there may be another order and another combination of layers. In other words, the layers and their order in a CNN may vary without departing from the scope of this disclosure.
Although many examples are described herein concerning using neural networks, and specifically convolutional neural networks, this is not intended to be limiting. For example, and without limitation, MLM 180 may include any type of machine learning models, such as a machine learning model(s) using linear regression, logistic regression, decision trees, support vector machines (SVM), Naïve Bayes, k-nearest neighbor (KNN), K-means clustering, random forest, dimensionality reduction algorithms, gradient boosting algorithms, neural networks (e.g., auto-encoders, convolutional, recurrent, perceptrons, long or short term memory/LSTM, Hopfield, Boltzmann, deep belief, deconvolutional, generative adversarial, liquid state machine, etc.), or other types of machine learning models.
CV system 170 is merely one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use or functionality of aspects of the technologies described herein. Neither should this system be interpreted as having any dependency or requirement relating to any one component or any combination of components illustrated.
It should be understood that this arrangement of various components in CV system 170 is set forth only as an example. Other arrangements and elements (e.g., machines, networks, interfaces, functions, orders, and grouping of functions, etc. ) can be used in addition to or instead of those shown, and some elements may be omitted altogether for the sake of clarity. Further, many of the elements described herein are functional entities that may be implemented as discrete or distributed components or in conjunction with other components, and in any suitable combination and location. Various functions described herein as being performed by an entity may be carried out by hardware, firmware, or software. For instance, some functions may be carried out by a processor executing instructions stored in memory.
It should be understood that each of the components shown in CV system 170 may be implemented on any type of computing device, such as computing device 600 described in FIG. 6. Further, each of the components may communicate with various external devices via a network, which may include, without limitation, a local area network (LAN) or a wide area network (WAN) .
Referring now to FIG. 2, which illustrates respective embodiments of an exemplary apparatus. Apparatus 200A may be installed next to checkout machine 210A. Apparatus 200B may be installed next to checkout machine 210B. Apparatus 200C may be installed next to checkout machine 210C. Apparatus 200D may be installed next to checkout machine 210D. All illustrated embodiments of the exemplary apparatus may be conveniently installed next to a checkout machine without modifying the existing components or interrupting the regular functions of the checkout machine at least because these embodiments of the exemplary apparatus are electronically decoupled from the checkout machine.
Apparatus 200A has a single-camera configuration. Field of view 217 of camera 218A covers both display 216A and scanner 214A of checkout machine 210A. An image from camera 218A may be segmented. The part of the image covering display 216A forms one source, which has the product information as obtained by checkout machine 210A. The part of the image covering scanner 214A and the product thereof forms another source, which can be used to derive the product information as determined via product recognition. Base 250A enables apparatus 200A to stand next to checkout machine 210A, and the initial installation of apparatus 200A is as easy as setting up a floor lamp. During the verification process, various messages may be displayed on display 226A. Light 232A may also turn on, turn off, or flash in a certain pattern (e.g., a unique combination of duration and frequency) in response to a particular verification result. For example, light 232A may flash in a pattern to indicate a negative verification result, so a staff member may be summoned to assist the customer. Further, apparatus 200A is equipped with a wireless communication module (WCM) 230A. In some embodiments, WCM 230A enables apparatus 200A to communicate with a remote server via a wireless network (e.g., via WiFi or 5G). In some embodiments, the verification process is at least partially executed in a remote server instead of locally in apparatus 200A, e.g., by transmitting the images taken by camera 218A to the remote server and receiving the corresponding verification code and other messages. In some embodiments, WCM 230A also enables apparatus 200A to receive updates (e.g., MLM updates, firmware updates, software updates, configuration updates, etc.).
Apparatus 200B has a dual-camera configuration. Camera 218B is configured to capture images of products on scanner 214B, while camera 219B is configured to capture images of display 216B of checkout machine 210B. Compared to apparatus 200A, the dual-camera configuration requires an additional camera but no longer requires image post-processing to split one image into multiple sources. In some embodiments, camera 218B and camera 219B may take images in different resolutions. For example, the requisite resolution of camera 218B may be lower than the requisite resolution of camera 219B. The images from camera 218B are used for product recognition, but high-resolution images from camera 219B are required, e.g., for text recognition.
Sometimes, ground space in a retail store may be limited, or it is impractical to set up the disclosed apparatus next to the checkout machine, e.g., due to an obstacle. Apparatus 200C can be mounted to the ceiling via base 250C to save the ground space or bypass the obstacle. When the ceiling is too high to install the disclosed apparatus, apparatus 200D may be directly placed on the top of checkout machine 210D via base 250D. In this embodiment, weight 242 is attached to the other end of the arm to balance the weight of camera 218D and camera 219D.
Because these embodiments of the exemplary apparatus can be electronically decoupled from the checkout machine, a skilled person can appreciate additional embodiments of the exemplary apparatus that could be adapted to many different checkout machines with different types, shapes, sizes, surrounding structures, etc.
Referring now to FIG. 3, a schematic representation is provided illustrating an exemplary process to detect product information from multiple sources. In this example, apparatus 310 has a vertical arm and a horizontal arm. For the vertical arm, arm section 312b is extensible from arm section 312a, so that the horizontal arm may be adjusted to a suitable height, e.g., based on the height of checkout machine 320. For the horizontal arm, arm section 314b is extensible from arm section 314a. Similarly, arm section 314c may be further extended from arm section 314b. As such, camera 316a may be adjusted to cover the area of scanner 322, and camera 316b may be adjusted to cover the area of display 324. Additionally, microphone 318a and microphone 318b may be adjusted to have differential distances (e.g., D1 and D2 respectively, wherein D2 >> D1) to speaker 326 of checkout machine 320.
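The disclosure does not spell out how the differential distances are used downstream; one plausible approach, offered purely as an assumption, is to estimate the beep's time difference of arrival at the two microphones via cross-correlation and compare it to the delay expected from (D2 - D1). All names and the tolerance below are illustrative.

```python
# An assumed (not claimed) use of the differential microphone distances:
# check whether a beep's measured time difference of arrival matches the
# delay expected for sound traveling the extra distance D2 - D1.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate at room temperature

def beep_from_monitored_machine(sig1, sig2, fs, d1, d2, tol_s=0.5e-3):
    """sig1/sig2: time-aligned sample buffers from the two microphones."""
    corr = np.correlate(sig2, sig1, mode="full")
    lag = int(np.argmax(corr)) - (len(sig1) - 1)  # samples sig2 lags sig1
    measured_tdoa = lag / fs
    expected_tdoa = (d2 - d1) / SPEED_OF_SOUND
    return abs(measured_tdoa - expected_tdoa) <= tol_s
```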
The disclosed solution is to synchronize image data from multiple sources. As used herein, synchronization refers to the process of selecting corresponding data from multiple sources so that the respective product information derived from the multiple sources may be crosschecked for verification. For a regular checkout transaction, the checkout machine typically would update the information on its display after detecting a new event (e.g., a reading of an MRL) . Therefore, it is important to synchronize images before and after each scan.
In some embodiments, this synchronization is achieved based on a moving path of a product over a predetermined area of the checkout machine. By way of example, to scan the MRL of a product, the product will enter and exit the area of scanner 322, thus forming a moving path over the area of scanner 322, as illustrated in image 332, image 334, and image 336, captured by camera 316a. Each image has its timestamp. According to this moving path, the verification system, as implemented in apparatus 310, can synchronize image 340 with image 332 based on their timestamps. Resultantly, image 340 and image 332 form a before-scan pair. Similarly, image 350 and image 336 may form an after-scan pair after a successful scan. For a failed scan, the information on the display wouldn’t change. In this case, image 360 and image 336 would form an after-scan pair instead.
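A minimal Python sketch of this timestamp-based pairing, assuming the display frames carry timestamps and the product's entry and exit times over the scanner area have already been detected from the moving path; the data structures are illustrative.

```python
# Pairing display frames with a scan event inferred from the moving path.

def pair_display_frames(display_frames, enter_ts, exit_ts):
    """display_frames: iterable of (timestamp, image) tuples.

    Returns the (before-scan, after-scan) pair: the latest frame before the
    product enters the scanner area and the earliest frame after it exits.
    """
    before = max((f for f in display_frames if f[0] < enter_ts),
                 key=lambda f: f[0], default=None)
    after = min((f for f in display_frames if f[0] > exit_ts),
                key=lambda f: f[0], default=None)
    return before, after
```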
From one source, a before-scan image and an after-scan image from camera 316b may be used to derive product information, as obtained by checkout machine 320. By way of example, computer vision or OCR technologies may be used to recognize the SKU number, the product name, the sales price, etc. in area 342, the count of products in area 344, the total amount  in area 346 of image 340. By comparing the information between the before-scan image and the after-scan image, the verification system can detect that the text has been modified from area 342 to area 352, specifically, an additional product, illustrated by line 358, has been added to area 352 in image 350. Similarly, the product count has been increased by one from area 344 to area 354. The total amount also increased from area 346 to area 356. In some embodiments, the product identifier is retrieved from image 350, e.g., the SKU number of the newly added product at line 358.
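As an illustration of this before/after comparison, the sketch below diffs OCR'd display lines to surface newly added itemized lines; the line formats are hypothetical, and a real display would need retailer-specific parsing.

```python
# Detecting a newly itemized product by diffing before/after display text.

def detect_new_lines(before_lines, after_lines):
    """Return display lines that appear only in the after-scan image."""
    before_set = set(before_lines)
    return [line for line in after_lines if line not in before_set]

before = ["538902 BANANA  1  $0.99", "TOTAL $0.99"]
after = ["538902 BANANA  1  $0.99", "771203 AVOCADO 1 $1.49", "TOTAL $2.48"]
print(detect_new_lines(before, after))
# -> ['771203 AVOCADO 1 $1.49', 'TOTAL $2.48']; an empty diff would instead
#    suggest a non-scan event.
```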
From another source, one or more images (e.g., image 332, image 334, and image 336) from camera 316a may be used to derive product information, e.g., based on one or more MLMs, as previously discussed with MLM 180 in FIG. 1. In some embodiments, the product identifiers of those products meeting the ranking criteria are retrieved to form a ranked list. If the product identifier as derived from image 350 can be found in the ranked list, the verification system will determine the product information from both sources to be consistent and accordingly generate a positive verification code. Otherwise, a negative verification code will be generated. A negative verification code may indicate that the MRL on the product does not match the product, e.g., caused by a misplaced MRL.
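The crosscheck itself is compact. The following non-limiting sketch illustrates it; the verification-code names and SKU values are hypothetical, as the disclosure does not prescribe any particular encoding:

```python
def verify(display_product_id: str, ranked_list: list) -> str:
    """Crosscheck the identifier read from the display (camera 316b)
    against the ranked candidates recognized from the scanner-camera
    images (camera 316a)."""
    if display_product_id in ranked_list:
        return "POSITIVE"
    # Possible misplaced machine-readable label (MRL).
    return "NEGATIVE_MRL_MISMATCH"

# Example: the displayed SKU appears among the top visual candidates.
print(verify("SKU-4011", ["SKU-4011", "SKU-4225", "SKU-4046"]))  # POSITIVE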
Some goods have a unit price and are commonly sold by count instead of by weight. By way of example, the produce department may set a unit price for avocados, apples, bananas, etc., and a user is expected to input the count to the checkout machine. As discussed previously, in some embodiments, the count of the product may be obtained from images taken by camera 316a, e.g., based on instance segmentation technologies. Meanwhile, the count of the product as input by the user may be retrieved from the images taken by camera 316b. These two counts may be crosschecked in some embodiments. The verification system may be configured to generate a different negative verification code if these two counts disagree. Different negative verification codes may have different meanings, and the verification system can generate different actions based on a specific code. Optionally, the verification system may display a message on display 370 to indicate the specific meaning of such a verification code, and the user may be reminded to input the correct count.
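Continuing the same illustrative style, and again with hypothetical code names, the count crosscheck might be sketched as:

```python
def verify_count(visual_count: int, entered_count: int) -> str:
    """Crosscheck the count derived by instance segmentation of the
    camera 316a images against the count the user entered, as read
    from the camera 316b images."""
    if visual_count == entered_count:
        return "POSITIVE"
    # A distinct code lets the system prompt the user to correct the count.
    return "NEGATIVE_COUNT_MISMATCH"
```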
However, for a non-scan or miss-scan case, by comparing the information between the before-scan image (e.g., image 340) and the after-scan image (e.g., image 360), the verification system will detect that the text remains unchanged from area 342 to area 362. Similarly, the product count remains the same from area 344 to area 364. The total amount remains the same from area 346 to area 366. In such a non-scan case, although a product has moved over scanner 322, checkout machine 320 failed to obtain any information about the product. Accordingly, the verification system determines the product information from both sources to be inconsistent and generates a negative verification code.
In response to a negative verification code, the verification system may display the information (e.g., product identifier, product name, representative product images, etc.) of the products in the ranked list, which may be derived based on the images from camera 316a. As a result, the user may manually input the correct product information to checkout machine 320 based on such product information on display 370. Further, as discussed previously, the count of the product may be determined, e.g., based on instance segmentation technologies. In some embodiments, the count of the product may also be displayed. Advantageously, apparatus 310 can assist customers in correctly identifying and inputting product information (e.g., including the product identifier or the count of the product) to checkout machines, thus significantly improving the operation of conventional checkout machines; in other words, it adds a brand-new function to conventional checkout machines. Additionally, the verification system may activate other loss prevention actions in response to a negative verification code, as previously discussed.
In some embodiments, this synchronization is achieved based on a specific sound emitted from the checkout machine. Most checkout machines are configured to emit a beep to indicate a successful scan. When a beep is detected, the verification system of apparatus 310 may select respective before-beep and after-beep images in the multiple sources. As the beep is equivalent to a scan in some embodiments, a similar verification logic, as described previously, may be carried out based on the selected before-beep and after-beep images.
In one embodiment, to select the after-scan image, apparatus 310 first detects a sound generated from checkout machine 320 based on known acoustic characteristics of the sound, which may be configured during the installation of apparatus 310, e.g., by recording the sound emitted from speaker 326 and learning the acoustic characteristics of the sound (e.g., properties of the sound wave, such as duration, frequency, etc.). Checkout machine 320 may have a delay between emitting the beep and displaying product information on display 324, and such a delay may be observed during the installation of apparatus 310. Accordingly, a time threshold may be configured for apparatus 310 to select an image taken by camera 316b as the after-scan image. For instance, assuming the sound is detected at a first timestamp, the after-scan image may be selected at a second timestamp, which may be determined based on the first timestamp and the time threshold.
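A minimal sketch of this selection rule follows, assuming frames are represented as (timestamp, image) pairs in capture order; this representation is illustrative, not one mandated by the disclosure:

```python
def select_after_scan(display_frames, beep_ts, delay_threshold):
    """Return the first display frame captured at least delay_threshold
    seconds after the beep, allowing for the machine's delay between
    emitting the beep and updating its display.

    display_frames: list of (timestamp, image) pairs in capture order.
    """
    target_ts = beep_ts + delay_threshold
    for ts, image in display_frames:
        if ts >= target_ts:
            return ts, image
    return None  # no frame captured after the beep yet
```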
In some embodiments, unlike camera 316a, which may be required to continuously capture images to detect an unanticipated checkout event, camera 316b is configured to only capture images after an indicative sound is detected, e.g., a beep to indicate a successful scan. Apparatus 310 may compare the newly captured image with its immediate predecessor to identify the product information of the newly scanned product. Advantageously, fewer computing resources for storing and analyzing the images taken by camera 316b may be required, and the life of camera 316b may be prolonged.
In some embodiments, the sound detection technique may also serve as another source for verification. For example, if no beep is detected and camera 316a still captures a product, the verification system may generate a negative verification code without analyzing the images from camera 316b, thus expediting the verification process.
To correctly associate a sound with checkout machine 320, apparatus 310 may use various sound localization technologies to identify a sound from speaker 326, e.g., using steered beamformer approaches, collocated microphone array approaches, learning methods for binaural hearing, head-related transfer function (HRTF) approaches, cross-power spectrum phase (CSP) analysis approaches, 2D sensor line array approaches, hierarchical fuzzy artificial neural network approaches, etc. In one embodiment, sound localization is performed by using two or more microphones. In this example, microphone 318a and microphone 318b may be adjusted to have differential distances (e.g., D1 and D2 respectively, wherein D2 >> D1, which could be measured during the initial installation) to speaker 326 of checkout machine 320. A skilled person may understand that the speed of sound is the distance travelled per unit time by a sound wave as it propagates through an elastic medium. At 20 ℃ (68 °F), the speed of sound in air is about 343 meters per second. Accordingly, apparatus 310 can mathematically estimate the direction and the distance of speaker 326 in relation to microphone 318a and microphone 318b, e.g., based on the geometrical configuration (typically a triangle) among microphone 318a, microphone 318b, and speaker 326, as well as the difference of arrival times of the sound at the two microphones. In this way, apparatus 310 can also differentiate a beep emitted by speaker 326 from another beep emitted by another checkout machine.
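The core of such a time-difference-of-arrival check is a one-line computation. The sketch below assumes the expected path difference (D2 − D1) was measured at installation; the function names and the tolerance value are illustrative:

```python
SPEED_OF_SOUND = 343.0  # meters per second in air at about 20 °C

def path_difference(arrival_a: float, arrival_b: float) -> float:
    """Distance difference, in meters, between the sound paths to the two
    microphones, computed from the difference of arrival times."""
    return abs(arrival_a - arrival_b) * SPEED_OF_SOUND

def is_own_beep(arrival_a: float, arrival_b: float,
                expected_diff: float, tolerance: float = 0.05) -> bool:
    """A beep whose path difference matches the D2 - D1 measured during
    installation is attributed to the machine's own speaker; a beep from
    a neighboring machine yields a different path difference."""
    return abs(path_difference(arrival_a, arrival_b) - expected_diff) <= tolerance
```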
FIG. 4 is a flow diagram illustrating an exemplary process of verification. For multi-camera-based embodiments, e.g., apparatus 200B, images from the multiple cameras may form multiple-source data. For single-camera-based embodiments, e.g., apparatus 200A, image pre-processing is required to derive multiple data streams from images taken by the single camera. In process 400, the verification system will synchronize source data 412, source data 418, and other source data at block 420.
As discussed previously, some embodiments may use product-moving-path-based approaches for synchronization. Some embodiments may use sound-based approaches for synchronization. Some embodiments may use a hybrid approach for synchronization, e.g., primarily based on the product moving path, supplemented by the sound detection. Other embodiments may use additional approaches. In general, all approaches are designed to synchronize source data based on the same product or the same transaction, so that product information of one product or one transaction would not be crosschecked against product information of another product or another transaction.
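As one purely illustrative way to express such a hybrid policy (the function name and the event representations are assumptions, not part of the disclosure):

```python
def sync_window(path_event=None, beep_ts=None):
    """Hybrid synchronization policy: prefer the moving-path boundaries
    when the scanner camera detected a path; otherwise fall back to the
    beep timestamp from the microphones.

    path_event: optional (enter_ts, exit_ts) pair from the scanner camera.
    beep_ts:    optional timestamp of a detected beep.
    Returns a (start_ts, end_ts) window, or None if no event was observed.
    """
    if path_event is not None:
        return path_event
    if beep_ts is not None:
        return (beep_ts, beep_ts)  # degenerate window anchored on the beep
    return None
```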
At block 430, the verification system determines product information (e.g., product data 442 or product data 448) from at least two sources based on the synchronized source data. In some embodiments, product data 442 is based on the information received by a checkout machine, and product data 448 is based on the information derived from a source independent from the checkout machine. For example, in connection with FIG. 3, the verification system may detect product information from the images taken by camera 316b based on text recognition technologies. Meanwhile, the verification system may detect product information from the images taken by camera 316a based on product recognition technologies, e.g., by comparing neural features of the product images with neural features of known products.
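For the product-recognition side, the comparison of neural features might be sketched as follows; the cosine-similarity measure and the catalog layout are assumptions, since the disclosure leaves the similarity measure to the implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two neural feature vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query_features: np.ndarray, catalog: dict) -> list:
    """Rank known products by similarity of their stored neural features
    to the features of the query product image.

    catalog: mapping of product identifier -> feature vector.
    Returns product identifiers from most to least similar.
    """
    scores = {pid: cosine_similarity(query_features, feat)
              for pid, feat in catalog.items()}
    return sorted(scores, key=scores.get, reverse=True)
```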
At block 450, the verification system determines whether the product information from the multiple sources is consistent. In some embodiments, the respective product identifiers from the multiple sources are required to match to be consistent. In some embodiments, the product identifier derived from one source is required to fall into a ranked list derived from another source. Other criteria for measuring consistency may be devised for other implementations of the disclosed technologies.
At block 460, the verification system generates a negative verification code if the product information from the multiple sources is inconsistent. It may be noted that the verification system is designed to generate a negative verification code in some embodiments if one source provided product information and another source did not, such as in a non-scan or miss-scan case.
At block 470, the verification system generates a positive verification code if the product information from the multiple sources is consistent.
FIG. 5 is a flow diagram illustrating another exemplary process of verification. Each block of process 500, and other processes described herein, comprises a computing process that may be performed using any combination of hardware, firmware, or software. For instance, various functions may be carried out by a processor executing instructions stored in memory. The process may also be embodied as computer-usable instructions stored on computer storage media or devices. The process may be provided by an application, a service, or a combination thereof.
At block 510, the process is to determine the first product information associated with a product from a first source. The product information may include a product identifier, e.g., an SKU number, a product name, a product symbol, etc. The first source may include one or more images of the display of a checkout machine. The one or more images may include a before-scan image and an after-scan image. The differential of the before-scan and the after-scan images usually encodes the newly scanned product information. Such information may be retrieved from the images based on OCR technologies.
At block 520, the process is to determine second product information associated with the product from a second source. The second source may include one or more images of the product moving through the scanner of the checkout machine. The process may recognize the product based on computer vision technologies, such as by comparing the neural features of the one or more product images in question with neural features of images of known products. In some embodiments, the product information of the known product with the most similar neural features is used as the second product information associated with the product from the second source. In some embodiments, the product information of a ranked list of known products is used instead, for example, if the similarity measure between the neural features of one or more query images and the neural features of each of the known products in the ranked list meets a requisite criterion, which may be decided based on the actual implementation of the disclosed technologies.
At block 530, the process is to generate a verification code based on whether the first product information is consistent with the second product information. In some embodiments, if the product information (e.g., the product identifier) from the respective sources matches, the process will determine the first product information is consistent with the second product information and generate a positive verification code. In some embodiments, if the first product information (e.g., the product identifier) can be found in the list of the second product information (when the second product information contains a list), the process will determine the first product information is consistent with the second product information and generate a positive verification code.
Accordingly, we have described various aspects of the disclosed technologies for multi-sourced checkout verification. As noted in connection with process 500, each of the described processes may be performed using any combination of hardware, firmware, or software, and may be provided by an application, a service, or a combination thereof.
It is understood that various features, sub-combinations, and modifications of the embodiments described herein are of utility and may be employed in other embodiments without reference to other features or sub-combinations. Moreover, the order and sequences of steps or blocks shown in the above example processes are not meant to limit the scope of the present disclosure in any way and the steps or blocks may occur in a variety of different sequences within embodiments hereof. Such variations and combinations thereof are also contemplated to be within the scope of embodiments of this disclosure.
Referring to FIG. 6, an exemplary operating environment for implementing various aspects of the technologies described herein is shown and designated generally as computing device 600. Computing device 600 is but one example of a suitable computing environment and is not intended to suggest any limitation as to the scope of use of the technologies described herein. Neither should the computing device 600 be interpreted as having any dependency or requirement relating to any one or combination of components illustrated.
The technologies described herein may be described in the general context of computer code or machine-useable instructions, including computer-executable instructions such as program components, being executed by a computer or other machine. Generally, program components, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types. The technologies described herein may be practiced in a variety of system configurations, including handheld devices, consumer electronics, general-purpose computers, and specialty computing devices, etc. Aspects of the technologies described herein may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communications network.
With continued reference to FIG. 6, computing device 600 includes a bus 610 that directly or indirectly couples the following devices: memory 620, processors 630, presentation components 640, input/output (I/O) ports 650, I/O components 660, and an illustrative power supply 670. Bus 610 may include an address bus, data bus, or a combination thereof. Although the various blocks of FIG. 6 are shown with lines for the sake of clarity, in reality, delineating various components is not so clear and, metaphorically, the lines would more accurately be grey and fuzzy. For example, one may consider a presentation component such as a display device to be an I/O component. The inventors hereof recognize that such is the nature of the art and reiterate that the diagram of FIG. 6 is merely illustrative of an exemplary computing device that can be used in connection with different aspects of the technologies described herein. Distinction is not made between such categories as “workstation,” “server,” “laptop,” “handheld device,” etc., as all are contemplated within the scope of FIG. 6 and all refer to a “computer” or “computing device.”
Computing device 600 typically includes a variety of computer-readable media. Computer-readable media can be any available media that can be accessed by computing device 600 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer-readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data.
Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disks (DVD), or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Computer storage media does not comprise a propagated data signal.
Communication media typically embodies computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term “modulated data signal” means a signal that has its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media. Combinations of any of the above should also be included within the scope of computer-readable media.
Memory 620 includes computer storage media in the form of volatile or nonvolatile memory. The memory 620 may be removable, non-removable, or a combination thereof. Exemplary memory includes solid-state memory, hard drives, optical-disc drives, etc. Computing device 600 includes processors 630 that read data from various entities, such as bus 610, memory 620, or I/O components 660. Presentation component(s) 640 present data indications to a user or other device. Exemplary presentation components 640 include a display device, speaker, printing component, vibrating component, etc. I/O ports 650 allow computing device 600 to be logically coupled to other devices, including I/O components 660, some of which may be built-in.
In various embodiments, memory 620 includes, in particular, temporal and persistent copies of CV logic 622. CV logic 622 includes instructions that, when executed by processors 630, result in computing device 600 performing functions, such as, but not limited to, process 400, process 500, or other processes discussed in connection with FIGS. 1-3. In various embodiments, CV logic 622 includes instructions that, when executed by processors 630, result in computing device 600 performing various functions associated with, but not limited to, various components in CV system 170 in FIG. 1 and various apparatuses in FIGS. 1-3.
In some embodiments, processors 630 may be packaged together with CV logic 622. In some embodiments, processors 630 may be packaged together with CV logic 622 to form a System in Package (SiP). In some embodiments, processors 630 can be integrated on the same die with CV logic 622. In some embodiments, processors 630 can be integrated on the same die with CV logic 622 to form a System on Chip (SoC).
Illustrative I/O components include a microphone, joystick, gamepad, satellite dish, scanner, printer, display device, wireless device, a controller (such as a stylus, a keyboard, and a mouse), a natural user interface (NUI), and the like. In aspects, a pen digitizer (not shown) and accompanying input instrument (also not shown but which may include, by way of example only, a pen or a stylus) are provided to digitally capture freehand user input. The connection between the pen digitizer and processor(s) 630 may be direct or via a coupling utilizing a serial port, parallel port, system bus, or other interface known in the art. Furthermore, the digitizer input component may be a component separate from an output component, such as a display device. In some aspects, the usable input area of a digitizer may coexist with the display area of a display device, be integrated with the display device, or may exist as a separate device overlaying or otherwise appended to a display device. Any and all such variations, and any combination thereof, are contemplated to be within the scope of aspects of the technologies described herein.
I/O components 660 include various GUIs, which allow users to interact with computing device 600 through graphical elements or visual indicators. Interactions with a GUI usually are performed through direct manipulation of graphical elements in the GUI. Generally, such user interactions may invoke the business logic associated with respective graphical elements in the GUI. Two similar graphical elements may be associated with different functions, while two different graphical elements may be associated with similar functions. Further, the same GUI may have different presentations on different computing devices, such as based on the different graphical processing units (GPUs) or the various characteristics of the display.
Computing device 600 may include networking interface 680. The networking interface 680 includes a network interface controller (NIC) that transmits and receives data. The networking interface 680 may use wired technologies (e.g., coaxial cable, twisted pair, optical fiber, etc.) or wireless technologies (e.g., terrestrial microwave, communications satellites, cellular, radio and spread spectrum technologies, etc.). Particularly, the networking interface 680 may include a wireless terminal adapted to receive communications and media over various wireless networks. Computing device 600 may communicate with other devices via the networking interface 680 using radio communication technologies. The radio communications may be a short-range connection, a long-range connection, or a combination of both a short-range and a long-range wireless telecommunications connection. A short-range connection may include a Wi-Fi connection to a device (e.g., mobile hotspot) that provides access to a wireless communications network, such as a wireless local area network (WLAN) connection using the 802.11 protocol. A Bluetooth connection to another computing device is a second example of a short-range connection. A long-range connection may include a connection using various wireless networks, including 1G, 2G, 3G, 4G, 5G, etc., or based on various standards or protocols, including General Packet Radio Service (GPRS), Enhanced Data rates for GSM Evolution (EDGE), Global System for Mobiles (GSM), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Long-Term Evolution (LTE), 802.16 standards, etc.
The technologies described herein have been described with particular aspects, which are intended in all respects to be illustrative rather than restrictive. While the technologies described herein are susceptible to various modifications and alternative constructions, certain illustrated aspects thereof are shown in the drawings and have been described above in detail. It should be understood, however, there is no intention to limit the technologies described herein to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the technologies described herein.
Lastly, by way of example, and not limitation, the following examples are provided to illustrate various embodiments, following at least one aspect of the disclosed technologies.
Examples in the first group comprise an apparatus for verification with one or more of the following features. The order of the following features is not to limit the scope of any examples in this group. A first camera is adapted to capture a product placed on a first area of a checkout machine. A second camera is adapted to capture product information displayed on a second area of a checkout machine. A camera is adapted to capture a product placed on a first area of a checkout machine and product information displayed on a second area of the checkout machine. A display is adapted to display a message reflecting whether the product information is consistent with the product. A display is adapted to display an instruction for a user to interact with a checkout machine. A processor is adapted to verify whether the product information derived from one source is consistent with the product information derived from another source. A radio-frequency module is adapted to transmit first image data of the product and second image data of the product information to a remote device, and receive a verification code regarding whether the product information derived from one source is consistent with the product information derived from another source. An adjustable mechanical arm is adapted to enable a first camera to capture a first area of the checkout machine, and a second camera to capture a second area of the checkout machine. An adjustable mechanical arm is adapted to enable a camera, which is mounted to the adjustable mechanical arm, to capture a product placed on a first area of a checkout machine and product information displayed on a second area of the checkout machine. An adjustable mechanical arm is extensible or flexible, so that a camera can be adjusted to a desirable height, a desirable direction, or a desirable spatial position in general. A supporting base, connected to an adjustable mechanical arm, is adapted to enable a camera, mounted to the adjustable mechanical arm, to maintain a stable spatial position. A supporting base, connected to an adjustable mechanical arm, is configured to enable the apparatus to maintain a spatial form independent from a checkout machine. A first camera is fixed to a first location of an adjustable mechanical arm of the apparatus, and a second camera is fixed to a second location of the adjustable mechanical arm, wherein the first location is closer than the second location to a supporting base of the apparatus. A first microphone is fixed to a first location of an adjustable mechanical arm, and a second microphone is fixed to a second location of the adjustable mechanical arm, wherein the first location and the second location are selected to cause a differential distance from a speaker of a checkout machine to the respective microphones. A speaker is mounted to the apparatus. The product information comprises a product identifier, a product name, a product price (unit price, total price, etc.), a product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc.), etc.
Examples in the second group comprise a method, a computer system adapted to perform the method, or a computer storage device storing computer-usable instructions that cause a computer system to perform the method. The method has one or more of the following features. The order of the following features is not to limit the scope of any examples in this group.
A feature of retrieving first product information associated with a product based on first image data generated from a first imaging source. A feature of detecting second product information associated with the product from second image data generated from a second imaging source. A feature of generating a positive verification code in response to the first product information being consistent with the second product information. A feature of generating a negative verification code in response to the first product information being inconsistent with the second product information. A feature of recognizing, via a machine learning model, the product based on the first image data generated from the first imaging source. A feature of synchronizing the first image data and the second image data based on a moving path of the product over a predetermined area under a field of view of the first imaging source. A feature of detecting, at a first timestamp, a sound generated from a checkout machine based on one or more known acoustic characteristics of the sound. A feature of selecting an image of a display of the checkout machine from the second image data based on the first timestamp, wherein the image has a second timestamp later than the first timestamp. A feature of detecting the second product information associated with the product from the image of the display. A feature of identifying a first image and a second image from the second image data based on a moving path of the product over a predetermined area under a field of view of the first imaging source. A feature of detecting a text being modified from the first image to the second image, and deducing a product identifier of the product or a count of the product from the text in the second image. A feature of determining the first product information being consistent with the second product information in response to the first product identifier matching the second product identifier, wherein the first product information comprises a first product identifier, and the second product information comprises a second product identifier. A feature of determining first product information associated with a product based on a first area in one or more images generated from an imaging device. A feature of determining second product information associated with the product from a second area in the one or more images generated from the imaging device. A feature of generating a positive verification code in response to the first product information being consistent with the second product information. A feature of generating a negative verification code in response to the first product information being inconsistent with the second product information. A feature of identifying the first area including a scanning area of a checkout machine, and identifying the second area including a display of the checkout machine. A feature of capturing an image of a display of the checkout machine in response to a sound being detected, wherein the sound is indicative of a successful scan of a product. A feature of determining, via a machine learning model, the first product information based on image data of the first area in the one or more images, and determining, via an optical character recognition model, the second product information based on image data of the second area in the one or more images.
A feature of detecting the second product information being unknown, and determining the first product information being inconsistent with the second product information in response to the second product information being unknown. A feature of causing a light to flash or causing a warning message to display in response to a negative verification code. A feature of causing a message to be wirelessly transmitted to a wireless device in response to a negative verification code. A feature of causing a message to be played via a speaker in response to a negative verification code. A feature of causing an instruction to display, wherein the instruction comprises product information of a product. The product information comprises a product identifier, a product name, a product price (unit price, total price, etc.), a product quantity (the count of different products in a session, the total count of all products in a session, the count of the same product in a transaction, etc.), etc.
All patent applications, patents, and printed publications cited herein are incorporated herein by reference in their entireties, except for any definitions, subject matter disclaimers or disavowals, and except to the extent that the incorporated material is inconsistent with the express disclosure herein, in which case the language in this disclosure controls.

Claims (20)

  1. An apparatus for verification, comprising:
    a first camera adapted to capture a product placed on a first area of a checkout machine;
    a second camera adapted to capture product information displayed on a second area of the checkout machine; and
    a display, operationally coupled to the first camera and the second camera, adapted to display a message reflecting whether the product information is consistent with the product.
  2. The apparatus of claim 1, further comprising:
    a processor, operationally coupled to the first camera and the second camera, adapted to verify whether the product information is consistent with the product in a first aspect of product identifier and a second aspect of product quantity.
  3. The apparatus of claim 1, further comprising:
    a radio-frequency module, operationally coupled to the first camera and the second camera, adapted to transmit first image data of the product and second image data of the product information to a remote device, and receive a verification code regarding whether the product information is consistent with the product.
  4. The apparatus of claim 1, further comprising:
    an adjustable mechanical arm, connected to the first camera, adjusted to enable the first camera to capture the first area of the checkout machine.
  5. The apparatus of claim 1, further comprising:
    an adjustable mechanical arm, connected to the second camera, adjusted to enable the second camera to capture the second area of the checkout machine.
  6. The apparatus of claim 1, further comprising:
    an adjustable mechanical arm, connected to the first camera and the second camera, adjusted to enable the first camera to capture the first area of the checkout machine, and adjusted to enable the second camera to capture the second area of the checkout machine.
  7. The apparatus of claim 6, further comprising:
    a supporting base, connected to the adjustable mechanical arm, configured to enable the first camera or the second camera to maintain a stable spatial position.
  8. The apparatus of claim 6, further comprising:
    a supporting base, connected to the adjustable mechanical arm, configured to enable the apparatus to maintain a spatial form independent from the checkout machine.
  9. The apparatus of claim 8, wherein the first camera is fixed to a first location of the adjustable mechanical arm, the second camera is fixed to a second location of the adjustable mechanical arm, and the first location is closer than the second location to the supporting base.
  10. A computer-implemented method for verification, comprising:
    retrieving first product information associated with a product based on first image data from a first imaging source;
    detecting second product information associated with the product from second image data from a second imaging source;
    generating a positive verification code in response to the first product information being consistent with the second product information; and
    generating a negative verification code in response to the first product information being inconsistent with the second product information.
  11. The method of claim 10, further comprising:
    recognizing, via a machine learning model, the product based on the first image data from the first imaging source.
  12. The method of claim 10, further comprising:
    synchronizing the first image data and the second image data based on a moving path of the product over a predetermined area under a field of view of the first imaging source.
  13. The method of claim 10, further comprising:
    detecting, at a first timestamp, a sound generated from a checkout machine based on one or more known acoustic characteristics of the sound;
    selecting an image of a display of the checkout machine from the second image data based on the first timestamp, wherein the image has a second timestamp later than the first timestamp; and
    detecting the second product information associated with the product from the image of the display.
  14. The method of claim 10, further comprising:
    identifying a first image and a second image from the second image data based on a moving path of the product over a predetermined area under a field of view of the first imaging source; and
    detecting a text being modified from the first image to the second image, and deducing a product identifier of the product or a count of the product from the text in the second image.
  15. The method of claim 10, wherein the first product information comprises a first product identifier, and the second product information comprises a second product identifier, the method further comprising:
    in response to the first product identifier matching the second product identifier, determining the first product information being consistent with the second product information.
  16. A computer-readable storage device encoded with instructions that, when executed, cause one or more processors of a computing system to perform operations of verification, comprising:
    determining first product information associated with a product based on a first area in one or more images generated from an imaging device;
    determining second product information associated with the product from a second area in the one or more images generated from the imaging device;
    in response to the first product information being consistent with the second product information, generating a positive verification code; and
    in response to the first product information being inconsistent with the second product information, generating a negative verification code.
  17. The computer-readable storage device of claim 16, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising:
    identifying the first area including a scanning area of a checkout machine; and
    identifying the second area including a display of the checkout machine.
  18. The computer-readable storage device of claim 16, wherein the determining first product information comprises determining, via a machine learning model, the first product information based on image data of the first area in the one or more images; and wherein the determining second product information comprises determining, via another machine learning model, the second product information based on image data of the second area in the one or more images.
  19. The computer-readable storage device of claim 16, wherein the detecting second product information associated with the product comprises detecting the second product information being unknown, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising:
    determining the first product information being inconsistent with the second product information in response to the second product information being unknown.
  20. The computer-readable storage device of claim 16, wherein the instructions that, when executed, further cause the one or more processors to perform operations comprising:
    in response to the negative verification code, causing a light to flash or a message to display.
PCT/CN2020/082735 2019-05-10 2020-04-01 Apparatus and methods for multi-sourced checkout verification WO2020228437A1 (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
PCT/CN2019/086367 WO2020227845A1 (en) 2019-05-10 2019-05-10 Compressed network for product recognition
CNPCT/CN2019/086367 2019-05-10
CNPCT/CN2019/111643 2019-10-17
PCT/CN2019/111643 WO2021072699A1 (en) 2019-10-17 2019-10-17 Irregular scan detection for retail systems
US16/672,883 US20210110189A1 (en) 2019-10-14 2019-11-04 Character-based text detection and recognition
US16/672,883 2019-11-04

Publications (1)

Publication Number Publication Date
WO2020228437A1 true WO2020228437A1 (en) 2020-11-19

Family

ID=73290365

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/082735 WO2020228437A1 (en) 2019-05-10 2020-04-01 Apparatus and methods for multi-sourced checkout verification

Country Status (1)

Country Link
WO (1) WO2020228437A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106781017A (en) * 2017-03-07 2017-05-31 深圳市楼通宝实业有限公司 Self-service vending method and system
CN108109007A (en) * 2017-12-21 2018-06-01 张志勇 A kind of automatic shopping system and automatic shopping method based on weight identification
CN108520409A (en) * 2018-03-28 2018-09-11 深圳正品创想科技有限公司 A kind of express checkout method, apparatus and electronic equipment

Similar Documents

Publication Publication Date Title
US20210319420A1 (en) Retail system and methods with visual object tracking
US20200151692A1 (en) Systems and methods for training data generation for object identification and self-checkout anti-theft
US11749072B2 (en) Varied detail levels of shopping data for frictionless shoppers
CN109414119B (en) System and method for computer vision driven applications within an environment
US20210217017A1 (en) System and methods for monitoring retail transactions
US20210049400A1 (en) Mislabeled product detection
US10503961B2 (en) Object recognition for bottom of basket detection using neural network
US20240013633A1 (en) Identifying barcode-to-product mismatches using point of sale devices
US20230037427A1 (en) Identifying barcode-to-product mismatches using point of sale devices and overhead cameras
WO2021072699A1 (en) Irregular scan detection for retail systems
US20230087587A1 (en) Systems and methods for item recognition
WO2020228437A1 (en) Apparatus and methods for multi-sourced checkout verification
WO2021232333A1 (en) System and methods for express checkout
US20210279505A1 (en) Progressive verification system and methods
US20220019988A1 (en) Methods and systems of a multistage object detection and tracking checkout system
US20240005750A1 (en) Event-triggered capture of item image data and generation and storage of enhanced item identification data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20804831

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20804831

Country of ref document: EP

Kind code of ref document: A1