US20210312206A1 - System and method for classifier training and retrieval from classifier database for large scale product identification - Google Patents
System and method for classifier training and retrieval from classifier database for large scale product identification
- Publication number
- US20210312206A1 (US Application No. 17/353,024)
- Authority
- US
- United States
- Prior art keywords
- product
- classifiers
- classifier
- database
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G06Q10/087: Inventory or stock management, e.g. order filling, procurement or balancing against orders
- B62B3/14: Hand carts having more than one axis carrying transport wheels, characterised by provisions for nesting or stacking, e.g. shopping trolleys
- G06F18/22: Pattern recognition; matching criteria, e.g. proximity measures
- G06F18/23213: Non-hierarchical clustering techniques using statistics or function optimisation, with a fixed number of clusters, e.g. K-means clustering
- G06F18/24147: Classification based on distances to closest patterns, e.g. nearest neighbour classification
- G06F18/254: Fusion techniques of classification results, e.g. of results related to same input data
- G06N3/02: Neural networks
- G06N3/04: Neural network architecture, e.g. interconnection topology
- G06Q30/04: Billing or invoicing
- G06Q30/06: Buying, selling or leasing transactions
- G06V10/87: Image or video recognition using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
- G06V20/20: Scene-specific elements in augmented reality scenes
- G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V30/194: Character recognition using references adjustable by an adaptive method, e.g. learning
- G06V2201/09: Recognition of logos
- Legacy codes: G06K9/3241; G06K9/6215; G06K9/6223; G06K9/6276
Definitions
- the disclosure is directed to systems and methods for real-time detection of items in a given constrained volume. Specifically, the disclosure relates to systems and methods of classifier assignment and retrieval from database for large-scale product detection.
- the Artificial Intelligent Cart (AIC) is configured to automatically identify, in substantially real-time, inserted physical products out of a significantly large number of different available products (stock keeping items) in the inventory of a store and/or a warehouse.
- the AIC can comprise a plurality of imaging sensors and at least one load cell that are utilized for product recognition.
- Current state-of-the-art technologies, such as large-scale deep neural networks provide insufficient recognition accuracy when the number of products that are expected to be recognized is increased substantially.
- a system for automated product identification in a shopping cart, comprising: a cart having a front wall, a rear wall and two side walls forming an apically open container with a base; a load cell module operably coupled to the base of the cart; a plurality of imaging modules coupled to the cart, adapted to, at least one of, image an item inserted into the cart, and image an area of interest outside the cart; a central processing module in communication with the load cell and the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon: a classifiers' database; a product characteristics database; and a processor readable media comprising a set of executable instructions configured, when executed, to cause the processor to retrieve a set of a plurality of classifiers, wherein each set of classifiers is configured to identify a single product inserted into the cart.
- FIG. 1A illustrates the characteristic extraction from AIC images, also showing the inference steps in the AIC, provided for illustrative purposes, while FIG. 1B illustrates a flow diagram of the characteristics extraction during the training process, including the classifier database update, provided for illustrative purposes;
- FIG. 3 shows a flow diagram of the classifier training and management when a new product is inserted or when a product is removed.
- FIG. 4 illustrates a schematic of the system components associated with the execution of the methods.
- the disclosure provides embodiments of systems and methods of classifier determination, assignment and retrieval from database for fine-grained, large-number—product detection.
- the product recognition task is split among multiple classifiers—each classifier is trained to recognize a small portion of products sharing similar characteristics.
- the product characteristics are extracted and identified by the AIC.
- the extracted product characteristics are then utilized as keys to extract a set of relevant classifiers that are needed to produce a single correct recognition.
- the relevant classifiers are retrieved from a large classifier database that is typically created to cover the entire available products of a retail store or chain of stores.
- multiple classifiers are utilized for each product recognition.
- the different classifiers in the classifier database are trained on products sharing similar characteristics. For each inserted product, a set of relevant classifiers are selected from the classifier-database.
- the retrieved classifiers are then utilized to recognize the inserted products.
- the classifier retrieval method is based on extraction of product characteristics from a set of cameras and sensors on the AIC cart.
- the set of extracted characteristics are used as keys to retrieve a set of relevant classifiers from the database.
- Product characteristics can be, for example:
- AIC classifier pseudo-code inputs: product images {images}ᵢ; product weight [gr] and a product weights database 'weightDB'; a logo detection convolutional neural network (CNN) 'LogoNet'; a shape detection CNN; a text detection CNN 'TextDetectorNet'; and a text recognition (OCR) network 'OCRNet'.
- the product's wrap colors: by extracting the color-palette of an inserted product, it may be used as a set of keys for retrieving classifiers that were trained on products that exhibit a similar color-palette.
- a typical method to represent a product color-palette may be a color Histogram applied to an image or a partial image of the product.
- the color histogram may be applied to RGB images or interchangeably to other known color-space representations, such as hue saturation value (HSV), CIE-LAB, CIE-XYZ, CIE-LCH, and YCbCr, to produce such a histogram of a product image.
- the image can undergo a preprocessing step to reduce the number of colors by aggregating similar colors by one of the methods for color-quantization such as K-means, minmax, fuzzy c-means, balanced iterative reducing and clustering using hierarchies (BIRCH), median-cut, center-cut, rwm-cut, octree, variance-based, binary splitting, greedy orthogonal bipartitioning, optimal principal multilevel quantizer, and the like or their combination.
- Another option to reduce the presentation size of a color palette is the use of wider histogram bins and/or self-organizing maps.
- the color quantization (in other words, reduction or averaging) may use a custom (adaptive) palette determined based on the color distribution of the image.
- Other preprocessing steps may also include steps to filter background information that may interfere with the extraction of the product's color palette (i.e. background subtraction), reduce artifacts such as reflections and shadows, image denoising, apply contrast enhancements, and other common image processing techniques intended for image improvement thereby improving the accuracy (in other words, the specificity and selectivity), of the characteristics' extraction.
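- By way of illustration only, the following is a minimal sketch (assuming OpenCV and scikit-learn; the function name, cluster count and bin counts are illustrative choices, not part of the disclosure) combining K-means color quantization with an HSV histogram to produce such a color-palette key:

```python
# A minimal sketch of color-palette extraction: convert to HSV, reduce the
# palette with K-means color quantization, and summarize it as a normalized
# histogram usable as a retrieval key.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def color_palette_key(image_bgr, n_colors=16, bins=(8, 4, 4)):
    """Return a normalized HSV histogram of a color-quantized product image."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    pixels = hsv.reshape(-1, 3).astype(np.float32)

    # Color quantization (one of the options named above): K-means reduces
    # the image to n_colors representative colors before histogramming.
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    quantized = km.cluster_centers_[km.labels_]

    # Wider histogram bins (the other reduction option) keep the key compact.
    hist, _ = np.histogramdd(quantized, bins=bins,
                             range=((0, 180), (0, 256), (0, 256)))
    hist = hist.ravel()
    return hist / (hist.sum() + 1e-9)   # normalize to a distribution
```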
- Another possible presentation of a product's colors is in the form of a point-cloud, where each color in the palette is presented as a point in the 3D color-space. This presentation lacks the color-occurrence-rate found in color histograms but may represent the existence of the product colors.
- The Earth Mover's Distance (EMD), also known as the Wasserstein distance, is an accurate metric since, being a cross-bin distance, it is robust to color shifts due to lighting and changes in image-acquisition conditions (e.g., focal length).
- Another example of a cross-bin distance measure that can be used to re-identify a captured color histogram is the Mahalanobis distance, measuring the distance between an observed point and a distribution.
- Other metrics can be the Kullback-Leibler (KL) divergence, Bhattacharyya distance, and Chi-square (χ²) distance (all bin-to-bin distance metrics).
- Hausdorff distance: this is a measure of similarity between two point clouds. Hausdorff provides accurate similarity between a query image, typically exhibiting a single or few facets of the product's wrap/box (packaging), and the product's color-cloud based presentation, which should include all facets of the product wrap/box.
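- These histogram and point-cloud comparisons can be sketched with SciPy as follows (a hedged illustration; the 1-D histogram inputs and N x 3 color point-clouds are assumed formats, not mandated by the disclosure):

```python
# Cross-bin (EMD), bin-to-bin (Chi-square) and point-cloud (Hausdorff)
# similarity measures named above, using SciPy.
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.spatial.distance import directed_hausdorff

def emd_1d(hist_a, hist_b):
    """Earth Mover's (Wasserstein) distance between two 1-D histograms."""
    bins = np.arange(len(hist_a))
    return wasserstein_distance(bins, bins, u_weights=hist_a, v_weights=hist_b)

def chi_square(hist_a, hist_b, eps=1e-9):
    """Bin-to-bin Chi-square distance, for comparison with the cross-bin EMD."""
    return 0.5 * np.sum((hist_a - hist_b) ** 2 / (hist_a + hist_b + eps))

def hausdorff(cloud_a, cloud_b):
    """Symmetric Hausdorff distance between two 3-D color point-clouds."""
    return max(directed_hausdorff(cloud_a, cloud_b)[0],
               directed_hausdorff(cloud_b, cloud_a)[0])
```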
- the product's box/wrap shape designator (referring to a string, number or any other means that can be used to designate an extracted feature) is another characteristic that may be used for classifier retrieval.
- the box/wrap (packaging) shape may be extracted by various computer-vision based techniques. Such techniques typically include feature extraction of the product boundaries by processing edges presented in the images and extracting edge features. For example, the shape may be presented by extracting contours of the product, or by estimation applied to the product's silhouette.
- shape parameters that can be used alone, or in combination as classifier(s) in the shape-based image retrieval can be at least one of Center of gravity, Axis of least inertia, Digital bending energy, Eccentricity, Circularity ratio, Elliptic variance, Rectangularity, Convexity, Solidity, Euler number, Profiles, Hole area ratio, and the like.
- the shape or shape-descriptor may be embedded in some space, with the shape descriptors semantically related (e.g., a "jar" is "closer" to a "bottle" than to a paper box, while "cup" and "bowl" may be closer than "bowl" and "can"). This relatedness can be used, in certain embodiments, to increase the selectivity and specificity of the classification and re-identification.
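- As an illustrative sketch only (assuming OpenCV; the binary-mask input and function name are hypothetical), two of the shape parameters listed above, circularity ratio and rectangularity, can be computed from a product silhouette:

```python
# Compute circularity ratio and rectangularity from a product silhouette.
import cv2
import numpy as np

def shape_descriptor(mask):
    """mask: binary uint8 silhouette of the product (background subtracted)."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    c = max(contours, key=cv2.contourArea)        # largest contour = product
    area = cv2.contourArea(c)
    perimeter = cv2.arcLength(c, True)
    circularity = 4 * np.pi * area / (perimeter ** 2 + 1e-9)
    x, y, w, h = cv2.boundingRect(c)
    rectangularity = area / (w * h + 1e-9)        # fill of the bounding box
    return np.array([circularity, rectangularity])
```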
- the product's weight and physical dimensions are other product characteristics that are used as keys for classifier retrieval.
- Weight is obtained by a load cell module (referring to any device that detects a physical force, such as the weight of a load, and generates a corresponding electrical signal).
- the product's physical size (i.e., physical dimensions) may also be extracted. One option for size detection is utilizing imaging modules such as RGB-D, where the typical RGB color information is accompanied by depth information.
- the size and weight combination may be used to differentiate between same products that differ only in box size and products that are sold in variable batch sizes, such as some dairy products and fruits, thus providing the required level of granularity.
- the shape of the product can also be extracted by utilizing the depth information from the RGB-D camera.
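- A minimal sketch of weight- and size-keyed candidate filtering against the 'weightDB' named in the pseudo-code inputs above (the record layout and tolerance values are assumptions for illustration):

```python
# Filter candidate products by measured weight (load cell) and physical
# dimensions (RGB-D imaging module), within relative tolerances.
def candidates_by_weight_and_size(weight_gr, dims_cm, weightDB,
                                  weight_tol=0.05, size_tol=0.10):
    """Return product IDs whose stored weight/dimensions match within tolerance."""
    hits = []
    for product_id, rec in weightDB.items():
        if abs(rec["weight_gr"] - weight_gr) > weight_tol * rec["weight_gr"]:
            continue
        # Sort dimensions so orientation inside the cart does not matter.
        if all(abs(d - s) <= size_tol * s
               for d, s in zip(sorted(dims_cm), sorted(rec["dims_cm"]))):
            hits.append(product_id)
    return hits
```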
- Product logos and trademarks are both examples of characteristics that can be used for classifier retrieval.
- a given product can exhibit multiple logos and trademarks on its package.
- a product's box/wrap can exhibit the company/producer logo, a product-series logo (if the product is part of a series), and the specific product's logo, along with other logos and trademarks.
- the product's logos are extracted by various techniques. Notable techniques utilize image descriptors such as HOG, oriented FAST and rotated BRIEF (ORB), and KAZE, which are used to produce local image descriptions, allowing the detection of a specific logo-pattern in a given product image.
- Examples of such networks include the single-shot MultiBox detector (SSD), region-based convolutional neural network (RCNN), Fast-RCNN, Faster-RCNN, and you only look once (YOLO), that were trained to detect and locate desired logos.
- the available logos are stored in the logo database, where they are associated with a numeric value allowing their retrieval. When a product exhibits multiple logos, all logos for that product may be extracted and used to characterize the relevant classifiers.
- Specific key-words on the product may be used as key-words for classifier retrieval. If a specific key word can be identified in a product image, it may be used as a key to access the relevant classifiers in the classifiers database. For example, the word ‘Apple’ may appear on various products containing apples as one of their ingredients.
- Key-words are extracted from product images by optical character recognition (OCR).
- DNNs are used for both text localization and recognition in a given image.
- the quality of the detected text may be provided, allowing use of only those words that were detected and recognized with high confidence.
- Examples of text detection methods for natural images are EAST, PSENet, FOTS, and Text-Snakes.
- Examples of text recognition algorithms include ASTER and ESIR. Products typically exhibit an abundance of words on their box/wrap. Some are not useful for the classifier retrieval task; examples of such words are those presented in a nutritional values table (e.g., carbohydrates, fat, etc.).
- the system can be configured to filter out such words, which have a too-common appearance, and discard them when they are processed for producing keys for classifier retrieval.
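- The filtering step can be sketched as follows (the stop-word list, confidence threshold and OCR output format are illustrative assumptions, not part of the disclosure):

```python
# Discard too-common words (e.g., nutrition-table terms) and keep only
# confidently recognized, discriminative words as retrieval keys.
NUTRITION_STOPWORDS = {"carbohydrates", "fat", "protein", "sodium",
                       "calories", "ingredients", "serving"}

def keywords_for_retrieval(ocr_results, min_confidence=0.9):
    """ocr_results: iterable of (word, confidence) pairs from the OCR network."""
    keys = set()
    for word, conf in ocr_results:
        w = word.lower()
        if conf >= min_confidence and w not in NUTRITION_STOPWORDS:
            keys.add(w)
    return keys
```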
- 'Deep Features' are features produced by a deep neural network (DNN), typically a convolutional neural network (CNN) such as Inception, Xception, or any other DNN.
- Such 'Deep Features' can be extracted from a single network layer or from multiple network layers. Combining deep features from multiple DNNs is also an option to improve the sensitivity and specificity for a given product.
- the extracted 'Deep Features' are used as an image descriptor to characterize a product, to provide probable candidates for the product, and to direct the retrieval of relevant classifiers.
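- A hedged sketch of such deep-feature extraction, using a pretrained ResNet-50 from torchvision as a stand-in backbone (the disclosure names Inception and Xception among others; the specific network, layer and preprocessing here are assumptions):

```python
# Extract a 'Deep Feature' descriptor from the penultimate (pooled) layer
# of a pretrained CNN.
import torch
import torchvision.models as models
import torchvision.transforms as T

backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
feature_net = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor(),
                        T.Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])

@torch.no_grad()
def deep_feature(pil_image):
    """Return an L2-normalized 2048-D descriptor for a product image."""
    x = preprocess(pil_image).unsqueeze(0)      # 1 x 3 x 224 x 224
    f = feature_net(x).flatten(1)               # 1 x 2048
    return torch.nn.functional.normalize(f, dim=1).squeeze(0)
```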
- the extracted text, key-words and OCR-retrieved texts can be further used to develop a site-specific ontology, which can then be used to assign a relatedness metric. That metric can then be used in associating and retrieving relevant classifiers, when the classifiers are retrieved by matching and/or mapping representative words of the inserted products, extracted by the imaging module, to representative words associated with the classifiers in the classifiers database.
- the classifiers can be selected by a nearest-neighbors (NN) or approximated nearest neighbors (ANN) algorithm. There are multiple variants that can be considered for classifier selection from the database. One is to apply NN or ANN to each type of product characteristic separately and intersect the results to obtain a smaller set that is more appropriate for on-line processing.
- as an example, consider a beer bottle whose label exhibits blue and red colors: each of those characteristics is used separately to retrieve relevant models.
- the color palette is used to retrieve classifiers that were trained on products with similar blue and red color pallets
- the bottle shape is used to retrieve a set of classifiers that were trained on various bottles' shapes.
- the key word—‘beer’ will be used to filter products that exhibit the same key-word ‘beer’ on their package.
- using hash functions, it is possible to both classify and rapidly retrieve text strings such as "light beer".
- the classifiers retrieved based on each characteristic will result in a small set of classifiers which can identify the product with higher accuracy, despite there being thousands of products available in the product database.
- Formula 1 below formulates this approach, with the final classifier set obtained as the intersection of the candidate sets retrieved per characteristic:
- C = NN(c₁) ∩ NN(c₂) ∩ NN(c₃) ∩ NN(c₄) ∩ NN(c₅) ∩ NN(c₆) (Formula 1)
- here cᵢ denotes the i-th extracted characteristic and NN(cᵢ) the set of classifiers retrieved for it. The formula is presented with 6 different characteristics intersected (in other words, selected by both NN and ANN).
- the described approach may be extended to include other characteristics (e.g., key word, graphics, and the like), by adding additional elements as additional intersections in the formula.
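- A minimal sketch of the per-characteristic retrieval and intersection of Formula 1 (the retriever interface and all names are illustrative, not part of the disclosure):

```python
# Each characteristic retrieves its own candidate classifier set; the final
# set is their intersection (Formula 1).
def select_classifiers(characteristics, retrievers, k=50):
    """characteristics: dict like {'color': vec, 'shape': vec, ...};
    retrievers: dict mapping the same names to NN/ANN lookup objects."""
    candidate_sets = []
    for name, key in characteristics.items():
        ids = set(retrievers[name].query(key, k))   # NN/ANN per characteristic
        candidate_sets.append(ids)
    return set.intersection(*candidate_sets)        # Formula 1
```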
- Text-based key words extracted by OCR are converted, in an exemplary implementation, to a numeric representation.
- Various methods can be used to represent words as a numeric vector (a.k.a. word embedding). For example, some methods embed words into dense vectors (where each word is represented by a dense vector, a point in a vector space, such that the dimension of the semantic representation d is usually much smaller than the size of the vocabulary, d ≪ V, and all dimensions contain real-valued numbers, which can be normalized between -1 and 1), while others use sparse presentations.
- Additional methods can use neural networks for this embedding (where a word is represented by a one-hot representation, a unique symbolic ID, such that the dimension of the symbolic representation for each word is equal to the size of the vocabulary V (the number of words belonging to the reference text), where all but one dimension are equal to zero, and one dimension is set to be equal to one).
- An example is the Jaccard distance (measuring the fraction of words any two sets have in common), which can be used to compare lists of key-words.
- Other similarity metrics such as, for example, cosine similarity, Spearman's rank correlation coefficient, or the Pearson χ² (Chi-square) test-based χ² distance may be used additionally or alternatively.
- Another option for words is to use the character string as a hashing key to retrieve relevant classifiers that match a word or a set of words (for example "gluten free").
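- Both word-based options can be sketched as follows (the index contents and names are illustrative assumptions):

```python
# Jaccard distance between key-word sets, and exact hashing of a word string
# to a bucket of classifier IDs (the dict stands in for the hash-keyed database).
def jaccard_distance(words_a, words_b):
    """1 - |A intersect B| / |A union B|; 0 means identical key-word sets."""
    a, b = set(words_a), set(words_b)
    return 1.0 - len(a & b) / max(len(a | b), 1)

# Hash-keyed retrieval: classifier IDs indexed by exact word strings.
classifier_index = {"gluten free": [12, 31], "beer": [7, 44]}  # illustrative

def classifiers_for_phrase(phrase):
    return classifier_index.get(phrase.lower(), [])
```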
- The shape of a product may also be encoded to a numeric set of values. Accordingly, the semantic relation between different box/wrap shapes may be defined. The semantic relation between various shapes is needed since shape appearance may also be dependent on the imaging direction. For example, a circular shaped product packaging may be perceived as either a bottle or a jar, depending on the direction of image capturing. Since the number of available box/wrap/package shapes is typically small, the shape similarity or relatedness may be produced by manually defining similarity between various wrap/box (packaging) shapes. For example, a cylindrical-shaped bottle will be similar to a jar and to an oval-shaped bottle. The similarity or distance may be stored as an adjacency matrix, but other presentations are applicable as well.
- Scale and rotation invariant descriptors can be used in an exemplary implementation to extract local information from the product image. These include, for example at least one of: BRIEF, BRISK, FREAK, HOG, ORB and KAZE. These descriptors characterize image information around key-points (e.g., features) extracted from the image captured by the imaging module. Key points are typically prominent points, for example, corners, and can be extracted by various methods such as Harris corner detection, SUSAN, Laplacian of Gaussian (LoG), FAST.
- the descriptor (e.g., BRIEF, BRISK, FREAK, HOG, ORB or KAZE) can be configured to extract and store information from an area around each key point.
- the descriptors may also be used as a local pattern-detector allowing to search for similar patterns in other images.
- the descriptors may be used as product characteristics for classifier retrieval.
- the extracted classifier will have to exhibit descriptors that have matches in the query image.
- one such descriptor is ORB, which is a binary-descriptor that allows fast matching by using Hamming distance.
- key-points are determined using features from accelerated segment test (FAST). Then a Harris corner measure (referring to a point whose local neighborhood stands in two dominant and different edge directions) is applied to find the top N points.
- FAST does not compute the orientation and is rotation variant; ORB instead computes the intensity-weighted centroid of the patch with the located corner at its center. The direction of the vector from this corner point to the centroid gives the orientation.
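- A minimal sketch of ORB extraction and Hamming-distance matching with OpenCV (the feature count and distance threshold are illustrative choices):

```python
# Match binary ORB descriptors with Hamming distance, scoring a query image
# against the descriptors stored with a candidate classifier.
import cv2

orb = cv2.ORB_create(nfeatures=500)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def orb_match_score(query_gray, reference_gray, max_dist=40):
    """Number of good Hamming matches between query and reference images."""
    _, d_query = orb.detectAndCompute(query_gray, None)
    _, d_ref = orb.detectAndCompute(reference_gray, None)
    if d_query is None or d_ref is None:
        return 0
    matches = matcher.match(d_query, d_ref)
    return sum(1 for m in matches if m.distance < max_dist)
```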
- while each characteristic can be used individually to retrieve relevant classifiers, in certain embodiments multiple characteristics are used for the retrieval step, thereby improving retrieval accuracy and reducing the retrieved classifier set size considerably.
- one way classifiers are extracted based on multiple product characteristics is by concatenating (in other words, juxtaposing one bit field onto another) all characteristics into a single multi-dimensional vector containing all characteristics extracted from a single image of a product.
- the vector may be subjected to normalization and/or standardization.
- Another variant of presentation includes assigning weights to each characteristic—giving certain characteristics higher significance than others.
- each characteristic is used separately to retrieve a larger set of relevant classifiers followed by an intersecting step to select the set of classifiers which match multiple characteristics.
- Methods for extracting nearest neighbors (NN) or k-nearest neighbors (kNN) from a database can be used.
- the classifier database can be very large (typically more than 100k classifiers), which implies that retrieving the appropriate classifiers by comparing each characteristic to the entire database may be time consuming and not appropriate for real-time product recognition as needed in the AIC. Therefore, in an exemplary implementation, approximate-nearest-neighbors (ANN) techniques are used.
- such techniques include metric-trees (e.g., ball trees, VP trees) and locality-sensitive hashing (LSH).
- Other proximity-based search methods may be used interchangeably.
- the metric-trees and LSH may also be utilized separately for each characteristic, and the results then merged into a set of relevant classifiers as in Formula 1.
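- A hedged sketch of such approximate retrieval, using a scikit-learn ball tree as a stand-in for the VP-tree/LSH options named above (the vector dimensionality and database size are illustrative):

```python
# Approximate retrieval over a large characteristics table: one
# characteristics vector per classifier, indexed with a metric (ball) tree.
import numpy as np
from sklearn.neighbors import NearestNeighbors

char_matrix = np.random.rand(100_000, 64)    # illustrative: 100k classifiers
nn_index = NearestNeighbors(n_neighbors=20, algorithm='ball_tree')
nn_index.fit(char_matrix)

def retrieve_classifier_ids(query_vec, k=20):
    """IDs of the k classifiers whose characteristics are nearest the query."""
    _, idx = nn_index.kneighbors(query_vec.reshape(1, -1), n_neighbors=k)
    return idx[0].tolist()
```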
- FIG. 5 illustrates the characteristics vector that is extracted from a query product inserted into the AIC.
- the classifiers in the classifier database are also associated with a similar presentation.
- the vector can be a numerical representation of the entire set of characteristics (obtained, for example, by concatenation of numerical representations of extracted features) and can serve as the basis for the retrieval method described.
- the two retrieval methods, the vantage point (VP) tree and LSH, are presented for illustrative purposes.
- the models used for creating the classifiers, and the models used to retrieve the classifiers, are determined using a preliminary neural network, which is configured to select the appropriate model based on initial product characteristic input obtained from the imaging module and other AIC sensors, as well as the location of the AIC in the 3D space of the store/warehouse. It is further noted that the preliminary neural network used to select the model(s), or other CNN or RNN used for classifier creation (in other words, building the classifier) and/or retrieval, does not necessarily reside on the AIC, but is rather located on a backend management server configured to communicate with a transceiver in the AIC through a communication network.
- images and other output generated by the AIC's sensor array are transmitted through the (wireless) communication network to the backend management server (BMS), where both classifier creation (building), and classifier retrieval/modification can take place.
- imaging module means a unit that includes a plurality of built-in image and/or optic sensors and outputs electrical signals, which have been obtained through photoelectric conversion, as an image
- “module” refers to software, hardware, for example, a processor, or a combination thereof, that is programmed with instructions for carrying out an algorithm or method.
- the modules described herein may communicate through a wired connection, for example, a hard-wired connection or a local area network, or the modules may communicate wirelessly.
- the imaging module may comprise charge coupled devices (CCDs), a complimentary metal-oxide semiconductor (CMOS), an RGB-D camera, or a combination comprising one or more of the foregoing.
- the imaging module can comprise a digital frame camera, where the field of view (FOV) can be predetermined by, for example, the camera size and the distance from a point of interest in the cart.
- the cameras used in the imaging modules of the systems and methods disclosed can be a digital camera.
- the term “digital camera” refers in an exemplary implementation to a digital still camera, a digital video recorder that can capture a still image of an object and the like.
- the digital camera can comprise an image capturing unit or module, a capture controlling module, a processing unit (which can be the same or separate from the central processing module).
- the systems used herein can be computerized systems further comprising a central processing module; a display module; and a user interface module.
- the display module can include display elements, which may include any type of element that acts as a display.
- a typical example is a Liquid Crystal Display (LCD).
- an LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal.
- other examples include OLED displays and bi-stable displays.
- New display technologies are also being developed constantly. Therefore, the term display should be interpreted widely and should not be associated with a single display technology.
- the display module may be mounted on a printed circuit board (PCB) of an electronic device, arranged within a protective housing and the display module is protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
- user interface module broadly refers to any visual, graphical, tactile, audible, sensory, or other means of providing information to and/or receiving information from a user or other entity.
- a set of instructions which enable presenting a graphical user interface (GUI) on a display module to a user for displaying, changing and/or inputting data associated with a data object in data fields.
- the user interface module is capable of displaying any data that it reads from the imaging module.
- a computer program comprising program code means for carrying out the steps of the methods described herein, implementable in the systems provided, as well as a computer program product (e.g., a micro-controller) comprising program code means stored on a medium that can be read by a computer, such as a hard disk, CD-ROM, DVD, USB, SSD, memory stick, or a storage medium that can be accessed via a data network, such as the Internet or Intranet, when the computer program product is loaded in the main memory of a computer [or micro-controller] and is carried out by the computer [or micro controller].
- Memory device as used in the methods, programs and systems described herein can be any of various types of memory devices or storage devices.
- memory device is intended to encompass an installation medium, e.g., a CD-ROM, SSD, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; or a non-volatile memory such as a magnetic media, e.g., a hard drive, optical storage, or ROM, EPROM, FLASH, SSD, etc.
- the memory device may comprise other types of memory as well, or combinations thereof.
- the memory medium may be located in a first computer in which the programs are executed, and/or may be located in a second different computer [or micro controller] which connects to the first computer over a network, such as the Internet [or, they might be even not connected and information will be transferred using USB].
- the second computer may further provide program instructions to the first computer for execution.
- FIG. 1A represents the entire classification workflow.
- the product images are presented to the classifiers, each providing a classification.
- the combination of the classifications resulting from multiple classifiers is able to provide a fine-grained prediction for the inserted product with high accuracy.
- the different predictions (or classifications) made from different classifiers are fed into a ‘Decision Module’ combining predictions to a single most-probable decision for the inserted product.
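- The fusion performed by the decision module can be sketched as follows (probability averaging is one simple rule chosen here for illustration; the disclosure leaves the exact fusion method open):

```python
# Fuse per-classifier probability vectors over product IDs into a single
# most-probable decision.
from collections import defaultdict

def fuse_predictions(predictions):
    """predictions: list of dicts mapping product_id -> probability."""
    scores = defaultdict(float)
    for pred in predictions:
        for product_id, p in pred.items():
            scores[product_id] += p / len(predictions)
    # Return the winning product and its fused confidence.
    return max(scores.items(), key=lambda kv: kv[1])
```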
- the classifier database is formed by training multiple classifiers on products sharing some or all characteristics.
- the formed classifiers are associated with this set of characteristics for later retrieval.
- the product characteristics are stored in the characteristics database in one or more of the data structures mentioned above.
- the product images, which are needed for maintenance of the system, are also kept in the product images database.
- FIG. 1B illustrates the algorithm flow during classifier training and represents the flow diagram for adding a product into the classifier system.
- the product characteristics are extracted and added to the characteristics database, followed by model training and model fine tuning needed for inclusion of a product in the system.
- FIG. 4 illustrates the system components in high-level detail.
- FIG. 3 shows the flow diagram of the process of adding a new product to, or removing a product from, the database.
- when a product needs to be added to the database, its characteristics are extracted and used to identify other products sharing similar characteristics.
- the number of products in this similarity set is used to determine which classifier is suitable for the classification of this set.
- the system may also choose to split the set into a plurality of smaller sets and train them separately as individual classifiers. This split can lead to some form of classifier overlap, where the same product may be recognized by different classifiers. The split is needed in order to avoid the classifier capacity limit.
- the decision of which classifier model to use is based on the overall obtained accuracy. If a classifier fails to achieve the desired accuracy for its products, it can be split into several classifiers which are trained on smaller sets of products, until the desired accuracy is obtained for all participating products. In particular cases, the split can lead to 'binary classifiers', i.e. having two classes in their outputs. In other particular cases, the system may choose, as a result of the training process used with the particular model, to fuse several similar products into one class, if this can provide improved accuracy in distinguishing a specific product versus a group of similar products.
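- The split-until-accurate logic can be sketched as follows (the training/validation callable and the simple halving split are stand-ins for the system's actual training module and similarity-based splits):

```python
# Recursively split a similarity set until every trained classifier meets
# the accuracy target (down to binary classifiers at the limit).
def build_classifiers(product_set, target_acc, train_and_validate):
    """train_and_validate(products) -> (classifier, validation_accuracy)."""
    clf, acc = train_and_validate(product_set)
    if acc >= target_acc or len(product_set) <= 2:   # binary-classifier floor
        return [clf]
    mid = len(product_set) // 2                      # splits may also overlap
    return (build_classifiers(product_set[:mid], target_acc, train_and_validate)
          + build_classifiers(product_set[mid:], target_acc, train_and_validate))
```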
- the addition of a new product can also affect other classifiers in the classifier database, which now may need to be able to cope with the new product and the changes in the similarity sets, thus requiring a fine-tuning of the classifier parameters by applying additional training iterations which include the new product images as input. If a product needs to be removed from the database, the classifier(s) that contain this product among their output classes are updated to ignore this product. In one embodiment, the eliminated product class is regarded as 'Don't Care': the eliminated class is ignored.
- Another option to remove a class is to revise the number of outputs (i.e. classes) of the classifiers and fine-tune the classifier parameters by applying additional training (a.k.a. fine tuning). If too many products are removed, resulting in some classifiers having too few classes, the system may also decide to fuse two classifiers into a single larger classifier, as long as the accuracy of the new larger classifier is maintained. A similar process can be used for additional product management in the cart, and likewise can be directed to at least one of splitting and merging products. These processes will ostensibly change the number of classifiers and their magnitude.
- Creating a classifier refers for example to either initially training a classifier, or retraining (modifying) a classifier (e.g., as the result of model/machine learning of a deep feature).
- the term “classifier” refers, in an exemplary implementation, to either a binary classifier or a multiclass classifier.
- a binary classifier is a classifier that has two output categories, for example, two products or two characteristics, whereas a multiclass classifier has multiple categorical outputs and is used to classify multiple products or to classify multiple characteristics.
- products can be merged, forming a new product with its own set of classifiers that are either independent or derivative of the products making the merged product.
- merged product characteristics may be identical to those of the products making up the merged product, except for one or two characteristics (and hence classifiers), for example at least one of: weight, volume, and shape factor.
- these classifiers can then be added to the classifier database, while the initial and sub-product packages images can be added to the images database, as well as to the characteristics' (key/hashing function) database.
- within the three-dimensional space (namely the warehouse, store, etc.), a locator system is used in an exemplary implementation to establish the relative location of the AIC by transmitting a signal between the AIC and an array of receiving devices having a known location.
- Providing the location of the AIC can be used by the processing module to limit the classifiers retrieved thus making the identification (and re-identification) more efficient, increasing both the selectivity and specificity of the classifiers, whether in the training stage or in the retrieval stage.
- a suitable locator system can be based on “ultrawideband” (UWB) signals, which are short pulses of radiofrequency energy.
- the distances and/or bearings are measured by sending UWB pulses, typically on the order of one nanosecond long, between the AIC and the receiving device.
- the distance between an object and a receiving device can be calculated by measuring the time-of-flight (TOF) of the UWB pulses between the object and the receiving device.
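- The range calculation can be sketched directly (one-way time-of-flight times the speed of light; the example value is for illustration):

```python
# Time-of-flight ranging: a UWB pulse travels at the speed of light, so
# distance follows directly from the measured one-way flight time.
C = 299_792_458.0          # speed of light [m/s]

def uwb_distance_m(tof_seconds):
    """One-way time-of-flight -> distance in meters."""
    return C * tof_seconds

# Example: a 10 ns flight time corresponds to about 3 m.
assert abs(uwb_distance_m(10e-9) - 2.998) < 0.01
```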
- Other location systems can be those utilizing Bluetooth beacon systems and RFID locator systems.
- an outwardly-sensing optical sensor can likewise be used to narrow the retrieval parameters for the classifiers, both in the training and re-identification stage.
- the 3D space can be outfitted with various markers or fiducials, which can be introduced as a classifier parameter, again, making the training more efficient and increasing the specificity and selectivity of the process.
- the AIC further comprises a transceiver in communication with the central processing module, the transceiver adapted to transmit and receive a signal configured to provide the location of the cart, for example a UWB signal, such that the classifier produced initially, and retrieved during re-identification by the processor, is, in addition to other image-extracted features, location based.
- FIG. 1A is a flow diagram showing how product images 100 are used for characteristics extraction 101-106, such as color palette 101; box/wrap type & shape 102; weight 103; key-words & logo 104; scale and rotation invariant features 105; and deep feature 106.
- the deep feature 106 may represent a sub-field of machine learning involving learning of product characteristic representations provided by a deep-neural-network (DNN) trained to identify that product.
- a DNN can be used to provide product characteristics by collecting its response to a given input, such as a product's image, from at least one of its layers. The response of a DNN to a product image is hierarchical by nature.
- Deep Features typically correspond to responses produced by deeper layers of a DNN model. Such 'Deep Features' are typically specific to the input product or to similar products, and therefore can be used as unique descriptors for the product; such descriptors can be used as characteristics for classifier retrieval and re-identification once the product is scanned.
- Images 100 may be processed in order to prepare them for classification.
- Image (pre-)processing may include multiple steps such as: cropping, background subtraction, denoising, lighting adjustment and others. Once a clean image is obtained, the product characteristics are extracted and used for retrieval of relevant classifiers that are needed to obtain an accurate decision on the inserted product.
- the product characteristics 107 are used to select relevant classifiers 108 from classifier data-base 440 (see e.g., FIG. 4 ).
- the input images 100, or a processed version of the input images 100, are provided to the set of relevant classifiers 108, which produce the product classification (inference) 109.
- a decision module 450 collects the classified results 109 from all retrieved classifiers 108 and makes a single decision 110 for the inserted product, thus identifying the product 111 .
- FIG. 1B shows the process of training a classifier with similar product characteristics.
- Images of a new product 112 undergo characteristics (feature) extraction 101 - 106 .
- Characteristics 101 - 106 are extracted using the original input images 112 (organic images) or a processed version of the input images (synthetic images) creating the product characteristics vector 113 .
- Product characteristics vector 113 from multiple products are used to form the product characteristic data-base 114 .
- the product characteristics are used to identify other products (species) who share similar characteristics 115 and create an updated product characteristic data-base 116 .
- the classifier 118 is trained on images from existing products' database 117 of products which share similar characteristics (genus).
- the training 118 allows the classifier to distinguish between the similar products (in other words, identify species within the genus).
- the new classifier 118 is added to the updated classifier database 119, and the product-characteristics database 116 is updated with the new product characteristics 113.
- FIG. 2 shows an illustration of the characteristics vector and its use for classifier retrieval.
- the extracted product characteristics 201 - 205 are provided as a separate numeric vector, or as keys, or concatenated to a single numeric vector presentation.
- Specialized data structures such as VP tree 212 or other metric/ball trees and/or Hashing functions 211 are used for obtaining an approximated nearest neighbor 220 .
- the neighbors are then used to retrieve the set of relevant classifiers 230 from the classifier database.
- FIG. 3 shows the flow diagram for insertion and/or removal of a product from the products database.
- the product characteristics extraction is applied to obtain a set of product characteristics 301 .
- the product characteristics 301 are utilized to identify products 302 that exhibit similar characteristics.
- the set of products 302 is used to identify a suitable classifier 303 based on the size and complexity of the selected product set 302 .
- the selected classifier is then trained 304 with images or processed version of the selected product set 302 .
- a validation module 305 tests the performance of the new classifier and decides whether the classifier provides the intended accuracy.
- FIG. 4 is a schematic illustrating an exemplary implementation of the various modules used in implementing the methods described herein.
- the system comprises a plurality of imaging sensors 401 i (e.g., RGB-D cameras), some of which are directed toward the inside of the cart (not shown).
- the plurality of imaging sensors can be configured to capture an image of the product inserted into the cart, thus providing perspective allowing distinguishing of both size and shape characteristics of the product, and, together with the load cell module 402 , present the extracted features to the acquisition and synchronization module 410 , which can form, in certain embodiments, a part of the imaging module (dashed trapezoid).
- Extracted features are input to a processing module 420 , which compares them with characteristics already stored in characteristic database 430 . Upon finding a corresponding characteristic set in characteristics database 430 , the set of identified characteristics serves, either together as a single concatenated vector or each individually, as a key to retrieve a set of classifiers from classifier database 440 .
- the retrieved classifier, or set of classifiers, is then input to the data fusion and decision module 450 , where a determination is made regarding the confidence level in identifying the product; if the confidence level is above a predetermined level, the product is identified 500 and optionally displayed on a cart display (not shown).
- Images captured by the acquisition and synchronization module 410 can be transmitted to product image database 470 and to classifier training module 460 , where any discrepancy between the retrieved classifiers and the extracted product characteristics is identified; based on the extracted features, a new classifier is established, and the characteristic associated with the classifier is established as a retrieval key for that particular classifier.
- the captured image is then stored in product image database 470 , and the added classifier and added feature (characteristic) are stored in classifier database 440 and characteristic database 430 , respectively.
- any image captured by the imaging module serves simultaneously as training, validation and test image.
- a system for automated, real-time product identification in a shopping cart of a store and/or warehouse having more than a thousand stock keeping items (SKIs), comprising: a cart having a front wall, a rear wall and two side walls forming an apically open container with a base; a load cell module operably coupled to the base of the cart; a plurality of imaging modules coupled to the cart, adapted to, at least one of, image an item inserted into the cart, and image an area of interest outside the cart; a central processing module in communication with the load cell and the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon: a classifiers' database; a product characteristics database; and a processor readable media comprising a set of executable instructions configured, when executed, to cause the processor to retrieve a set of a plurality of classifiers, wherein each set of classifiers is configured to identify a single product inserted into the cart,
- (RCNN), a Fast-RCNN, a Faster-RCNN, and a You Only Look Once (YOLO) neural network, wherein (xii) the set of executable instructions are configured, when executed, to cause the processor to assign a retrieval key to each classifier, wherein (xiii) the set of the plurality of classifiers identifying a single product is selected by applying a nearest-neighbors algorithm and an approximated nearest neighbors (ANN) algorithm to the classifiers associated with the product independently, and selecting the classifiers selected by both algorithms, wherein (xiv) all extracted features are encoded numerically, wherein (xv) the classifiers' database is formed by training the plurality of classifiers on products sharing at least one characteristic, wherein formed classifiers are associated with this set of characteristics for later retrieval, wherein (xvi) the set of executable instructions are configured, when executed, to cause the processor to associate each of the product characteristics to a classifier or a set of classifiers, wherein (xvii) the set
Abstract
The disclosure relates to systems and methods for real-time detection of a very large number of items in a given constrained volume. Specifically, the disclosure relates to systems and methods for retrieving an optimized set of classifiers from a self-updating classifiers' database, configured to selectively and specifically identify products inserted into a cart in real time, from a database comprising a large number of stock-keeping items, whereby the inserted items' captured images serve simultaneously as training dataset, validation dataset and test dataset for the recognition/identification/re-identification of the product.
Description
- This patent application is a Continuation of commonly owned and pending PCT Application No. PCT/IL2019/061390, filed Dec. 19, 2019, claiming priority from US. Provisional Patent Application No. 62/782,377, filed Dec. 20, 2018, both which are incorporated herein by reference in their entirety.
- A portion of the disclosure herein below contains material that is subject to copyright protection. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever.
- The disclosure is directed to systems and methods for real-time detection of items in a given constrained volume. Specifically, the disclosure relates to systems and methods of classifier assignment and retrieval from database for large-scale product detection.
- The Artificial Intelligent Cart (AIC) is configured to automatically identify, in substantially real-time, inserted physical products out of a significantly large number of different available products (stock keeping items) in the inventory of a store and/or a warehouse. The AIC can comprise a plurality of imaging sensors and at least one load cell that are utilized for product recognition. To date, there is no available technology that can deal with the huge number of products typically available within a single retailer's or wholesaler's storeroom or shelves. Current state-of-the-art technologies, such as large-scale deep neural networks, provide insufficient recognition accuracy when the number of products that are expected to be recognized is increased substantially.
- This limitation, which stems in part from a lack of sufficient classifiers to successfully recognize the large number of products (i.e., classes), is known as the classifier-capacity problem, where the increase in the number of items used for training leads to increased error on the test dataset. Another challenge is that the product recognition task is typically a 'fine-grained recognition' task (in other words, distinguishing a large number of species in a genus), since products may have almost identical wraps/boxes apart from a small graphic element distinguishing between them (e.g., weight, % fat, ageing time, etc.).
- Both challenges, classifier capacity and fine-grained recognition, are still areas of active research. These and other shortcomings of the current state of affairs are addressed in the following description.
- Upon product insertion into the AIC, a set of computer-based algorithms needs to be applied in order to correctly identify the inserted product among thousands of products available in a typical retail store, such as a supermarket. Even with current state-of-the-art technologies, such as deep neural networks and especially convolutional neural networks (CNNs), the achievable accuracy is limited when using a single classifier to classify the thousands of products in a single retail store. This limitation of classifiers to successfully recognize large numbers of products is known in the literature as the 'classifier-capacity problem'. Another challenge in large-scale product recognition is that a large number of products may have almost identical wraps/boxes apart from a small graphic element distinguishing between them. This is typical for product series. Capturing such small differences among different classes is known in the literature as 'fine-grained' recognition. Both challenges can be resolved by the characteristic-based classifier training and retrieval method described here.
- Accordingly, and in an exemplary implementation, provided herein is a system for automated product identification in a shopping cart, comprising: a cart having a front wall, a rear wall and two side walls forming an apically open container with a base; a load cell module operably coupled to the base of the cart; a plurality of imaging modules coupled to the cart, adapted to at least one of: image an item inserted into the cart, and image an area of interest outside the cart; a central processing module in communication with the load cell and the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon: a classifiers' database; a product characteristics database; and a processor readable media comprising a set of executable instructions configured, when executed, to cause the processor to retrieve a set of a plurality of classifiers, wherein each set of classifiers is configured to identify a single product inserted into the cart.
- In another exemplary implementation, provided herein is a method of recognizing and selectively and specifically identifying a product among a vast plurality of products, implemented in a system comprising: a cart having a front wall, a rear wall and two side walls forming an apically open container with a base; a load cell module operably coupled to the base of the cart; a plurality of imaging modules coupled to the cart, adapted to at least one of: image an item inserted into the cart, and image an area of interest outside the cart; a central processing module in communication with the load cell and the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon: a classifiers' database; a product characteristics database; and a processor readable media comprising a set of executable instructions configured, when executed, to cause the processor to retrieve a set of a plurality of classifiers, wherein each set of classifiers is configured to identify a single product inserted into the cart, the method comprising: inserting an item into the cart; using the imaging module, capturing an image of the inserted item; extracting a predetermined number of features (characteristics) from the captured image; and retrieving a set of classifiers associated with at least some of the extracted features from the classifier database, wherein the set of classifiers is associated with a previously identified product.
- For a better understanding of the systems and methods of classifier assignment and retrieval from database for large-scale product detection, with regard to the embodiments thereof, reference is made to the accompanying examples and figures, in which:
- FIG. 1A illustrates the characteristic extraction from AIC images, also showing the inference steps in the AIC, provided for illustrative purposes, while FIG. 1B illustrates a flow diagram of the characteristics extraction during the training process, including the classifier database update, provided for illustrative purposes;
- FIG. 2 illustrates the structure of the unified characteristics vector and its use for classifier retrieval;
- FIG. 3 shows a flow diagram of the classifier training and management when a new product is inserted or when a product is removed; and
- FIG. 4 illustrates a schematic of the system components associated with the execution of the methods.
- The disclosure provides embodiments of systems and methods of classifier determination, assignment and retrieval from database for fine-grained, large-number product detection.
- In order to address the above-identified challenges, the product recognition task is split among multiple classifiers; each classifier is trained to recognize a small portion of products sharing similar characteristics. Upon product insertion into the AIC, the product characteristics are extracted and identified by the AIC. The extracted product characteristics are then utilized as keys to retrieve a set of relevant classifiers that are needed to produce a single correct recognition. The relevant classifiers are retrieved from a large classifier database that is typically created to cover the entire set of available products of a retail store or chain of stores. By using the product-characteristics-based classifier retrieval method described here, the number of products that the AIC can recognize is substantially increased and the total detection accuracy is substantially improved.
- In order to achieve high recognition accuracies for a large number of products (e.g., Costco (US): ˜4,000 SKUs; BJ's (US): ˜7,200 SKUs; Sam's Club (US): ˜6,500 SKUs), multiple classifiers are utilized for each product recognition. There may be one or more classifiers needed for a single recognition, and a specific product may be recognized by a single classifier or by multiple classifiers. The different classifiers in the classifier database are trained on products sharing similar characteristics. For each inserted product, a set of relevant classifiers is selected from the classifier database.
- The retrieved classifiers are then utilized to recognize the inserted products. The classifier retrieval method is based on extraction of product characteristics from a set of cameras and sensors on the AIC cart. The set of extracted characteristics is used as keys to retrieve a set of relevant classifiers from the database.
- In the field of data-structures, there are multiple common methods to retrieve data from large databases but none of them couple product characteristics to classifiers or other recognition models as disclosed herein. A notable example of such data-structures and retrieval methods includes using various tree-based data-structures such as a ‘k-d tree’, allowing fast search and nearest neighbor approximation. Once the set of product characteristics are extracted and used for classifier retrieval, it is expected that the retrieved classifiers were trained on products that share similar characteristics to those of the inserted product, and therefore are tuned to differentiate between similar products.
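- As a non-limiting, minimal sketch of such tree-based retrieval (assuming each classifier in the database is keyed by a numeric characteristics vector; the function and variable names below are illustrative only and not part of the disclosure), a k-d tree can return the identifiers of the classifiers whose keys lie nearest to a query vector:

import numpy as np
from scipy.spatial import cKDTree

# Illustrative classifier-key database: each row is the characteristics
# vector associated with one stored classifier.
rng = np.random.default_rng(0)
classifier_keys = rng.random((100_000, 16))
classifier_ids = np.arange(len(classifier_keys))

tree = cKDTree(classifier_keys)  # k-d tree over the classifier keys

def retrieve_classifiers(query_vec, k=10):
    # Return the IDs of the k classifiers whose keys are nearest to the query.
    _, idx = tree.query(query_vec, k=k)
    return classifier_ids[idx]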
- Product characteristics can be, for example:
- a. Box/wrap colors;
- b. Box/wrap shape (e.g., box, bottle, jar);
- c. Box/wrap size and weight;
- d. Key-words located on the box/wrap (e.g., 'popcorn', 'cookies', etc.);
- e. Brand/product logos, and other graphic elements of a specific brand/product;
- f. Scale and rotation invariant features (e.g., ORB, BRIEF, KAZE);
- g. Features obtained from pre-trained neural networks, such as 'deep features'.
- An example of pseudocode that can be employed, incorporating a number of the product's characteristics, is presented below:
-
AIC Classifier Pseudo-code
Inputs:
    Product images {images}i
    Product weight [gr], and product weights database 'weightDB'
    Logo detection CNN (Convolutional Neural Network) - 'LogoNet'
    Shape detection CNN - 'ShapeNet'
    Text detection CNN - 'TextDetectorNet'
    Text recognition network (OCR) - 'OCRNet'
    Word embedding network - 'Word2VecNet'
    Nearest neighbor retrieval thresholds for each characteristic - 'THRES_<name>'
Outputs:
    A single product decision.

For each image[i] in {images}i {
    prod_hist = ObtainHistogram(image[i])
    prod_color_points = ProduceColorPoints(image[i])
    prod_logo = ExtractProductLogo(image[i], LogoNet)
    prod_shape = ExtractProductShape(image[i], ShapeNet)
    prod_words = ExtractWordsFromImage(image[i], TextDetectorNet, OCRNet)
    # Convert the words vector to a numerical presentation
    words_vec = ProduceWordEmbedding(prod_words, Word2VecNet)
    # Retrieve classifiers by weight
    classifier_list0 = FindNearestNeighbors(prod_weight, weightDB, THRES_weight)
    # Retrieve classifiers by color histogram
    classifier_list1 = FindNearestNeighbors(prod_hist, pVPtree_EMD, THRES_EMD)
    # Retrieve classifiers by color points
    classifier_list2 = FindNearestNeighbors(prod_color_points, pVPtree_Hausdorff, THRES_COLOR)
    # Retrieve classifiers by product logos
    classifier_list3 = FindNearestNeighbors(prod_logo, pVPtree_Logo, THRES_LOGO)
    # Retrieve classifiers by product's shape
    classifier_list4 = FindNearestNeighbors(prod_shape, pVPtree_shape, THRES_SHAPE)
    # Retrieve classifiers by product's key words
    classifier_list5 = FindNearestNeighbors(words_vec, pVPtree_words, THRES_WORDS)
    # Intersect classifiers to a smaller set
    classifier_set[i] = Intersect(classifier_list0, classifier_list1, classifier_list2,
                                  classifier_list3, classifier_list4, classifier_list5)
}
# Apply images to the selected classifiers from the previous step
For each image[i] in {images}i {
    For each classifier[k] in classifier_set[i] {
        class_ind[i][k], confidence[i][k] = Classify(image[i], classifier[k])
    }
}
# Merge classifications to a single product identification
identified_prod = MakeProductDecision(class_ind, confidence)
- One of the most notable characteristics is the product's wrap colors. By extracting the color palette of an inserted product, it may be used as a set of keys for retrieving classifiers that were trained on products that exhibit a similar color palette.
- A typical method to represent a product color palette may be a color histogram applied to an image or a partial image of the product. The color histogram may be applied to RGB images or, interchangeably, to other known color-space representations such as hue saturation value (HSV), CIE-LAB, CIE-XYZ, CIE-LCH, and YCbCr, to produce such a histogram of a product image. The image can undergo a preprocessing step to reduce the number of colors by aggregating similar colors using one of the methods for color quantization, such as K-means, minmax, fuzzy c-means, balanced iterative reducing and clustering using hierarchies (BIRCH), median-cut, center-cut, rwm-cut, octree, variance-based, binary splitting, greedy orthogonal bipartitioning, optimal principal multilevel quantizer, and the like, or their combination. Another option to reduce the presentation size of a color palette is the use of wider histogram bins and/or self-organizing maps.
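- As a minimal sketch of such histogram extraction (assuming OpenCV; the bin counts are illustrative and would be tuned per deployment), an HSV color histogram of a product image can be computed as follows:

import cv2
import numpy as np

def product_color_histogram(bgr_image, bins=(16, 8, 8)):
    # Convert to HSV and build a joint 3-D histogram over H, S and V,
    # normalized so the bin masses sum to 1.
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins),
                        [0, 180, 0, 256, 0, 256])
    hist = hist.flatten()
    return hist / hist.sum()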
- In an exemplary implementation, and in order to achieve a high degree of specificity and selectivity in classifying and recognizing (and/or re-recognizing) the product, the color quantization (in other words, reduction, or averaging) is image-dependent, whereby a custom (adaptive) palette is determined based on the color distribution of the image.
- An example of pseudocode for performing K-means quantization is as follows:
-
input : X = {x1, x2, ..., xN} ⊂ R^D (N x D input data set)
output: C = {c1, c2, ..., cK} ⊂ R^D (K cluster centers)
Select a random subset C of X as the initial set of cluster centers;
while termination criterion is not met do
    for (i = 1; i <= N; i = i + 1) do
        Assign xi to the nearest cluster:
        m[i] = argmin over k in {1..K} of ||xi - ck||^2;
    end
    Recalculate the cluster centers;
    for (k = 1; k <= K; k = k + 1) do
        Cluster Sk contains the set of points xi that are nearest to the center ck:
        Sk = {xi | m[i] = k};
        Calculate the new center ck as the mean of the points that belong to Sk:
        ck = (1 / |Sk|) * sum of xi over all xi in Sk;
    end
end
- Other preprocessing steps may also include steps to filter background information that may interfere with the extraction of the product's color palette (i.e., background subtraction), reduce artifacts such as reflections and shadows, denoise the image, apply contrast enhancements, and other common image processing techniques intended for image improvement, thereby improving the accuracy (in other words, the specificity and selectivity) of the characteristics' extraction. Another possible presentation of a product's colors is as a form of point cloud, where each color in the palette is presented as a point in the 3D color-space. This presentation lacks the color-occurrence rate present in color histograms but may represent the existence of the product colors.
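- As a runnable counterpart to the K-means pseudocode above (a sketch assuming scikit-learn; the parameter values are illustrative), an adaptive palette can be computed by clustering the image pixels:

import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(rgb_image, n_colors=8):
    # Cluster the H x W x 3 pixel array into n_colors clusters; the cluster
    # centers form the adaptive (image-dependent) palette.
    pixels = rgb_image.reshape(-1, 3).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=4, random_state=0).fit(pixels)
    palette = km.cluster_centers_                      # adaptive color palette
    labels = km.labels_.reshape(rgb_image.shape[:2])   # palette index per pixel
    return palette, labels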
- There are several metrics that may be used with color histograms for re-identification purposes. One example is the Earth Mover's Distance (EMD), also known as the Wasserstein distance, an accurate metric since, being a cross-bin distance, it is robust to color shifts due to lighting and changes in image-acquisition conditions (e.g., focal length). Another example of a cross-bin distance measure that can be used to re-identify a captured color histogram is the Mahalanobis distance, measuring the distance between an observed point and a distribution. Other metrics can be the Kullback-Leibler (KL) divergence, Bhattacharyya distance, and Chi-square (X2) distance (all bin-to-bin distance metrics).
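- As an illustration only (the disclosure does not mandate a particular implementation), a one-dimensional EMD between two per-channel histograms can be computed with SciPy, using the bin centers as the underlying values and the bin masses as weights:

import numpy as np
from scipy.stats import wasserstein_distance

def histogram_emd_1d(hist_a, hist_b):
    # Cross-bin distance between two equal-length 1-D histograms.
    centers = np.arange(len(hist_a))
    return wasserstein_distance(centers, centers, hist_a, hist_b)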
- For a color palette presented as a color cloud, a suitable distance is the Hausdorff distance, a measure of similarity between two point clouds. Hausdorff provides accurate similarity between a query image, typically exhibiting a single facet or a few facets of the product's wrap/box (packaging), and the product's color-cloud based presentation, which should include all facets of the product wrap/box.
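- A minimal sketch of such a comparison (assuming SciPy; taking the symmetric Hausdorff distance as the maximum of the two directed distances):

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def color_cloud_distance(cloud_a, cloud_b):
    # cloud_a, cloud_b: N x 3 arrays of points in the 3D color-space.
    d_ab = directed_hausdorff(cloud_a, cloud_b)[0]
    d_ba = directed_hausdorff(cloud_b, cloud_a)[0]
    return max(d_ab, d_ba)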
- The product's box/wrap shape designator (referring to a string, number or any other means that can be used to designate an extracted feature) is another characteristic that may be used for classifier retrieval. The box/wrap (packaging) shape may be extracted by various computer-vision based techniques. Such techniques typically include feature extraction of the product boundaries by processing edges presented in the images and extracting edge features. For example, the shape may be presented by extracting contours of the product, or by estimation applied to the product's silhouette.
- Other suitable methods utilize histogram of oriented gradients (HOG) descriptor features and deep neural networks (DNNs) for shape extraction. Examples of shape parameters that can be used alone, or in combination, as classifier(s) in shape-based image retrieval include at least one of: center of gravity, axis of least inertia, digital bending energy, eccentricity, circularity ratio, elliptic variance, rectangularity, convexity, solidity, Euler number, profiles, hole area ratio, and the like. In other circumstances, it may be possible to associate and retrieve relevant classifiers based on the product's wrap/box shape or shape descriptor, captured as, for example, a key word on the wrap/box/package, when the shape or shape-descriptor is embedded in some space on the package and the shape descriptors are semantically related (e.g., a 'jar' is 'closer' to a 'bottle' than to a paper box, while 'cup' and 'bowl' may be closer than 'bowl' and 'can'). This relatedness can be used, in certain embodiments, to increase the selectivity and specificity of the classification and re-identification.
- The product's weight and physical dimensions are other product characteristics that are used as keys for classifier retrieval. Weight is obtained from a load cell module (referring to any device that detects a physical force, such as the weight of a load, and generates a corresponding electrical signal). The product's physical size (i.e., physical dimensions) can be estimated based on the use of a plurality of imaging modules with a given and known location (in other words, pose) with respect to the product. Another option for size detection is utilizing imaging modules such as RGB-D, where the typical RGB color information is accompanied by depth information. The size and weight combination may be used to differentiate between the same products that only differ in box size, and products that are sold in variable batch sizes, such as some dairy products and fruits, thus providing the required level of granularity. In configurations using RGB-D, the shape of the product can also be extracted by utilizing the depth information from the RGB-D camera.
- Product logos and trademarks are both examples of characteristics that can be used for classifier retrieval. A given product can exhibit multiple logos and trademarks on its package. For example, a product's box/wrap can exhibit the company/producer logo, a product-series logo (if the product is part of a series), and the specific product's logo, along with other logos and trademarks. The product's logos are extracted by various techniques. Notable techniques utilize image descriptors such as HOG, Oriented FAST and Rotated BRIEF (ORB), and KAZE, which are used to produce a local image description, allowing the detection of a specific logo-pattern in a given product image.
- Other approaches can utilize deep neural networks for logo detection and localization in an image. Examples of such networks include the single-shot MultiBox detector (SSD), the regional convolutional neural network (RCNN), Fast-RCNN, Faster-RCNN, and You Only Look Once (YOLO), trained to detect and locate the desired logos. The available logos are stored in the logo database, where they are associated with a numeric value allowing their retrieval. When a product exhibits multiple logos, all logos for that product may be extracted and used to characterize the relevant classifiers.
- Specific key-words on the product may be used as keys for classifier retrieval. If a specific key word can be identified in a product image, it may be used as a key to access the relevant classifiers in the classifiers database. For example, the word 'Apple' may appear on various products containing apples as one of their ingredients. There are a number of methods to detect text in images and to recognize the words through optical character recognition (OCR). In an exemplary implementation, DNNs are used for both text localization and recognition in a given image.
- Typically, the quality of the detected text (i.e., a confidence measure) may be provided, allowing the system to use only words that were detected and recognized with high confidence. Examples of text detection methods for natural images are EAST, PSENet, FOTS, and Text-Snakes. Examples of text recognition algorithms include ASTER and ESIR. Products typically exhibit an abundance of words on their box/wrap, some of which are not useful for the classifier retrieval task. Examples of such words are those presented in the nutritional values table (e.g., carbohydrates, fat, etc.). The system can be configured to filter out words that appear too commonly, discarding them when they are processed for producing keys for classifier retrieval.
- Another type of product characteristic can be provided by neural networks such as deep neural networks (DNNs). A specific example is utilizing a convolutional neural network (CNN), such as Inception, Xception or any other, to extract image-related features. In the literature, such features produced by a DNN are termed 'Deep Features'. Such 'Deep Features' can be extracted from a single network layer or from multiple network layers. Combining deep features from multiple DNNs is also an option to improve the sensitivity and specificity for a given product. The extracted 'Deep Features' are used as an image descriptor to characterize a product, to provide probable candidates for the product, and to direct the retrieval of relevant classifiers.
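- As a sketch of such deep-feature extraction (assuming the Keras pre-trained Xception backbone, one of the networks named above; the global-average-pooling choice is an assumption for illustration, not part of the disclosure):

import numpy as np
from tensorflow.keras.applications import Xception
from tensorflow.keras.applications.xception import preprocess_input

# Backbone with the classification head removed; global average pooling
# collapses the last convolutional layer into a 2048-dim descriptor.
backbone = Xception(weights="imagenet", include_top=False, pooling="avg")

def deep_features(rgb_image_299):
    # rgb_image_299: a 299 x 299 x 3 RGB image (Xception's default input size).
    x = preprocess_input(rgb_image_299.astype(np.float32)[None, ...])
    return backbone.predict(x, verbose=0)[0]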
- In addition, for each site (store, warehouse, etc.), the extracted text, key-words and OCR-retrieved texts can be further used to develop a site-specific ontology, which can then be used to assign a relatedness metric. That metric can then be used in associating and retrieving relevant classifiers when the classifiers are retrieved by matching and/or mapping representative words of the inserted products, extracted by the imaging module, to representative words associated with the classifiers in the classifiers database.
- After the set of product characteristics has been formed, it is used to produce keys for classifier retrieval. The classifiers can be selected by a nearest-neighbors (NN) or approximated nearest neighbors (ANN) algorithm. There are multiple variants that can be considered for classifier selection from the database. One is to apply NN or ANN to each type of product characteristic separately and intersect the results to obtain a smaller set that is more appropriate for on-line processing. As an example, for a product exhibiting a color palette of red and blue colors, weighing 550 [gr], in a bottle-shaped case and with an identified key-word 'beer', each of those characteristics is used separately to retrieve relevant models: the color palette is used to retrieve classifiers that were trained on products with similar blue and red color palettes, and the bottle shape is used to retrieve a set of classifiers that were trained on various bottle shapes. The key word 'beer' will be used to filter products that exhibit the same key-word 'beer' on their package. Moreover, using hash functions, it is possible to both classify and rapidly retrieve text strings such as 'light beer'.
- Intersecting (in other words, overlapping) the classifiers retrieved based on each characteristic, selected using NN and ANN independently, will result in a small set of classifiers which can identify the product with higher accuracy despite there being thousands of products available in the product database.
-
{C} = {C(Color)} ∩ {C(Weight)} ∩ {C(OCR)} ∩ {C(Shape)} ∩ {C(ORB)} ∩ {C(Deep)}
Formula 1: Combining classifiers by characteristics
- The formula above expresses this approach; it is presented with six different characteristics intersected (in other words, selected by both NN and ANN). However, the described approach may be extended to include other characteristics (e.g., key words, graphics, and the like) by adding additional elements as additional intersections in the formula.
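- A minimal sketch of Formula 1 as plain set intersection (the names are illustrative only):

def intersect_classifier_sets(*candidate_lists):
    # Keep only the classifiers retrieved under every available characteristic;
    # empty or missing candidate lists are skipped.
    sets = [set(lst) for lst in candidate_lists if lst]
    return set.intersection(*sets) if sets else set()

# e.g.: final = intersect_classifier_sets(by_color, by_weight, by_ocr,
#                                         by_shape, by_orb, by_deep)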
- Other variations of the formula, utilizing one or more of the characteristics for classifier retrieval and selection, can be interchangeably used without altering the scope described and claimed herein. In similarity-based retrieval, the various characteristics can be presented numerically. The conversion to a numeric presentation can be made in a straightforward fashion for characteristics that have numerical values by nature, such as weight, volume, and color or color-histogram. Other characteristics, for example categorical characteristics, may need to be converted to a numeric presentation prior to using them in a classifier search and retrieval method.
- Text-based key words extracted by OCR are, in an exemplary implementation, converted to a numeric presentation. Various methods can be used to represent words as numeric vectors (a.k.a. word embedding). Some methods embed words into dense vectors (where each word is represented by a dense vector, a point in a vector space, such that the dimension of the semantic representation d is usually much smaller than the size of the vocabulary (d << V, the vocabulary size), and all dimensions contain real-valued numbers, which can be normalized between -1 and 1), while others use sparse presentations. Additional methods can use neural networks for this embedding (where each word is represented by a one-hot representation, a unique symbolic ID, such that the dimension of the symbolic representation for each word is equal to the size of the vocabulary V (the number of words belonging to the reference text), where all but one dimension are equal to zero, and one dimension is set equal to one). For example, the 'Jaccard distance' (measuring the fraction of words any two sets have in common) can be used to compare lists of key-words. Other similarity metrics, such as cosine similarity, Spearman's rank correlation coefficient, or the Pearson Chi-square (X2) test-based X2 distance, may be used additionally or alternatively. Another option for words is to use the character string as a hashing key to retrieve relevant classifiers that match a word or a set of words (for example, 'gluten free').
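- For instance, the Jaccard similarity between two key-word sets reduces to a few lines (a sketch; the complementary Jaccard distance is 1 minus this value):

def jaccard_similarity(words_a, words_b):
    # Fraction of words the two key-word sets have in common.
    a, b = set(words_a), set(words_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

# jaccard_similarity({'light', 'beer'}, {'beer', 'lager'}) -> 1/3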
- The shape of a product may also be encoded as a numeric set of values. Accordingly, the semantic relation between different box/wrap shapes may be defined. The semantic relation between various shapes is needed since shape appearance may also be dependent on the imaging direction. For example, a circular-shaped product packaging may be perceived as either a bottle or a jar, depending on the direction of image capture. Since the number of available box/wrap/package shapes is typically small, the shape similarity or relatedness may be produced by manually defining similarity between various wrap/box (packaging) shapes. For example, a cylindrical-shaped bottle will be similar to a jar and to an oval-shaped bottle. The similarity or distance may be stored as an adjacency matrix, although other presentations are applicable as well.
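- A sketch of such a manually defined adjacency matrix (the shape vocabulary and similarity values below are hypothetical, chosen only to illustrate the data structure):

import numpy as np

SHAPES = ["bottle", "jar", "oval_bottle", "box", "can"]
SHAPE_SIM = np.array([
    [1.0, 0.8, 0.9, 0.1, 0.4],   # bottle
    [0.8, 1.0, 0.7, 0.2, 0.5],   # jar
    [0.9, 0.7, 1.0, 0.1, 0.3],   # oval_bottle
    [0.1, 0.2, 0.1, 1.0, 0.2],   # box
    [0.4, 0.5, 0.3, 0.2, 1.0],   # can
])

def shape_similarity(shape_a, shape_b):
    # Look up the manually assigned similarity between two packaging shapes.
    return SHAPE_SIM[SHAPES.index(shape_a), SHAPES.index(shape_b)]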
- Scale and rotation invariant descriptors can be used, in an exemplary implementation, to extract local information from the product image. These include, for example, at least one of: BRIEF, BRISK, FREAK, HOG, ORB and KAZE. These descriptors characterize image information around key-points (e.g., features) extracted from the image captured by the imaging module. Key points are typically prominent points, for example corners, and can be extracted by various methods such as Harris corner detection, SUSAN, Laplacian of Gaussian (LoG), and FAST. The descriptor (e.g., BRIEF, BRISK, FREAK, HOG, ORB or KAZE) can be configured to extract and store information from an area around each key point. The descriptors may also be used as a local pattern-detector, allowing a search for similar patterns in other images. Here, the descriptors may be used as product characteristics for classifier retrieval; the retrieved classifier will have to exhibit descriptors that have matches in the query image. In an exemplary implementation, one such descriptor is ORB, a binary descriptor that allows fast matching by using the Hamming distance. Using ORB, key-points are determined using features from accelerated segment test (FAST), and then a Harris corner measure (referring to a point whose local neighborhood stands in two dominant and different edge directions) is applied to find the top N points. Since FAST does not compute orientation and is rotation-variant, ORB computes the intensity-weighted centroid of the patch with the located corner at its center; the direction of the vector from the corner point to the centroid gives the orientation.
- Moments can be computed to improve the rotation invariance. The Binary Robust Independent Elementary Features (BRIEF) descriptor typically performs poorly if there is an in-plane rotation. In ORB, a rotation matrix is computed using the orientation of the patch, and then the BRIEF descriptors are steered according to the computed orientation. Other feature descriptors can also be used interchangeably.
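- As a sketch of ORB matching with the Hamming distance (assuming OpenCV; the feature count and distance threshold are illustrative):

import cv2

def orb_match_count(query_img, reference_img, max_features=500, max_hamming=40):
    # Count ORB descriptor matches between two grayscale images using
    # brute-force Hamming matching with cross-checking.
    orb = cv2.ORB_create(nfeatures=max_features)
    _, desc_q = orb.detectAndCompute(query_img, None)
    _, desc_r = orb.detectAndCompute(reference_img, None)
    if desc_q is None or desc_r is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(desc_q, desc_r)
    return sum(1 for m in matches if m.distance <= max_hamming)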
- Although each characteristic can be used individually to retrieve relevant classifiers, in certain embodiments multiple characteristics are used for the retrieval step, thereby improving retrieval accuracy and reducing the retrieved classifier set size considerably. In an exemplary implementation, classifiers are retrieved based on multiple product characteristics by concatenating (in other words, juxtaposing one bit field onto another) all characteristics into a single multi-dimensional vector containing all characteristics extracted from a single image or a product. The vector may be subjected to normalization and/or standardization. Another variant of this presentation includes assigning weights to each characteristic, giving certain characteristics higher significance than others.
- In another exemplary implementation, each characteristic is used separately to retrieve a larger set of relevant classifiers, followed by an intersecting step to select the set of classifiers which match multiple characteristics. Methods for extracting nearest neighbors (NN) or k-nearest neighbors (kNN) from a database can be used. The classifier database can be very large (typically more than 100k classifiers), which implies that retrieving the appropriate classifiers by comparing each characteristic to the entire database may be time consuming and not appropriate for real-time product recognition as needed in the AIC. Therefore, in an exemplary implementation, approximate-nearest-neighbors (ANN) techniques are used.
- In an exemplary implementation, metric trees (e.g., ball trees, VP trees) and locality-sensitive hashing (LSH) are used. Other proximity-based search methods may be used interchangeably. The metric trees and LSH may also be utilized separately for each characteristic, with the results then merged into a set of relevant classifiers as in Formula 1. FIG. 2 illustrates the characteristics vector that is extracted from a query product inserted into the AIC. The classifiers in the classifier database are also associated with a similar presentation. The vector can be a numerical representation of the entire set of characteristics obtained (for example, by concatenation of numerical representations of extracted features) and can serve as the basis for the retrieval method described. The two retrieval methods, the vantage-point (VP) tree and LSH, are presented for illustrative purposes. - In certain examples, the models used for creating the classifiers and the models used to retrieve the classifiers are determined using a preliminary neural network, which is configured to select the appropriate model based on initial product characteristic input obtained from the imaging module and other AIC sensors, as well as the location of the AIC in the 3D space of the store/warehouse. It is further noted that the preliminary neural network used to select the model(s), or other CNNs or RNNs used for classifier creation (in other words, building the classifier) and/or retrieval, do not necessarily reside on the AIC, but may rather be located on a backend management server configured to communicate with a transceiver in the AIC through a communication network. Therefore, images and other output generated by the AIC's sensor array (e.g., load cell, location sensors, etc.) are transmitted through the (wireless) communication network to the backend management server (BMS), where both classifier creation (building) and classifier retrieval/modification can take place.
- The various appearances of “one example,” “an exemplary implementation” or “certain circumstances” do not necessarily all refer to the same implementation or operational configuration. Although various features of the invention may be described in the context of a single example or implementation, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment. Also, reference in the specification to “an exemplary implementation”, “some embodiments” or “other embodiments” means that a particular feature, structure, step, operation, application, or characteristic described in connection with the examples is included in at least one implementation, but not necessarily in all. It is understood that the phraseology and terminology employed herein are not to be construed as limiting and are provided as background or examples useful for understanding the invention.
- It is noted that the term “imaging module” as used herein means a unit that includes a plurality of built-in image and/or optic sensors and outputs electrical signals, which have been obtained through photoelectric conversion, as an image, while the term “module” refers to software, hardware, for example a processor, or a combination thereof that is programmed with instructions for carrying out an algorithm or method. The modules described herein may communicate through a wired connection, for example a hard-wired connection or a local area network, or the modules may communicate wirelessly. The imaging module may comprise charge coupled devices (CCDs), a complementary metal-oxide semiconductor (CMOS), an RGB-D camera, or a combination comprising one or more of the foregoing. If static images are required, the imaging module can comprise a digital frame camera, where the field of view (FOV) can be predetermined by, for example, the camera size and the distance from a point of interest in the cart. The cameras used in the imaging modules of the systems and methods disclosed can be digital cameras. The term “digital camera” refers, in an exemplary implementation, to a digital still camera, a digital video recorder that can capture a still image of an object, and the like. The digital camera can comprise an image capturing unit or module, a capture controlling module, and a processing unit (which can be the same as or separate from the central processing module). The systems used herein can be computerized systems further comprising a central processing module; a display module; and a user interface module.
- The display module can include display elements, which may be any type of element that acts as a display. A typical example is a liquid crystal display (LCD). An LCD, for example, includes a transparent electrode plate arranged on each side of a liquid crystal. There are, however, many other forms of displays, for example OLED displays and bi-stable displays, and new display technologies are being developed constantly. Therefore, the term display should be interpreted widely and should not be associated with a single display technology. Also, the display module may be mounted on a printed circuit board (PCB) of an electronic device, arranged within a protective housing, with the display module protected from damage by a glass or plastic plate arranged over the display element and attached to the housing.
- Additionally, “user interface module” broadly refers to any visual, graphical, tactile, audible, sensory, or other means of providing information to and/or receiving information from a user or other entity. An example is a set of instructions which enables presenting a graphical user interface (GUI) on a display module to a user for displaying and changing and/or inputting data associated with a data object in data fields. In an exemplary implementation, the user interface module is capable of displaying any data that it reads from the imaging module.
- In addition, the term ‘module’, as used herein, means, but is not limited to, a software or hardware component, such as a Field Programmable Gate-Array (FPGA) or Application-Specific Integrated Circuit (ASIC), which performs certain tasks. A module may advantageously be configured to reside on an addressable storage medium and configured to execute on one or more processors. Thus, a module may include, by way of example, components, such as software components, object-oriented software components, class components and task components, processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functionality provided for in the components and modules may be combined into fewer components and modules or further separated into additional components and modules.
- As indicated, provided herein is a computer program, comprising program code means for carrying out the steps of the methods described herein, implementable in the systems provided, as well as a computer program product (e.g., a micro-controller) comprising program code means stored on a medium that can be read by a computer, such as a hard disk, CD-ROM, DVD, USB, SSD, memory stick, or a storage medium that can be accessed via a data network, such as the Internet or Intranet, when the computer program product is loaded in the main memory of a computer [or micro-controller] and is carried out by the computer [or micro controller]. Memory device as used in the methods, programs and systems described herein can be any of various types of memory devices or storage devices.
- The term “memory device” is intended to encompass an installation medium, e.g., a CD-ROM, SSD, or tape device; a computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; or a non-volatile memory such as a magnetic media, e.g., a hard drive, optical storage, or ROM, EPROM, FLASH, SSD, etc. The memory device may comprise other types of memory as well, or combinations thereof. In addition, the memory medium may be located in a first computer in which the programs are executed, and/or may be located in a second, different computer [or micro controller] which connects to the first computer over a network, such as the Internet [or they might not even be connected, with information transferred using USB]. In the latter instance, the second computer may further provide program instructions to the first computer for execution. The building blocks of this method will become more apparent when presented in the drawings of the flowcharts.
- FIG. 1A represents the entire classification workflow. After using the product characteristics to extract a set of relevant classifiers, the product images are presented to the classifiers, each providing a classification. The combination of the classifications resulting from multiple classifiers is able to provide a fine-grained prediction for the inserted product with high accuracy. The different predictions (or classifications) made by different classifiers are fed into a 'Decision Module' combining predictions into a single most-probable decision for the inserted product. The classifier database is formed by training multiple classifiers on products sharing some or all characteristics. The formed classifiers are associated with these sets of characteristics for later retrieval. The product characteristics are stored in the characteristics database in one or more of the data structures mentioned above. The product images, which are needed for maintenance of the system, are also kept in the product images database. -
FIG. 1B illustrates the algorithm flow during classifier training and represents the flow diagram for adding a product to the classifier system. The product characteristics are extracted and added to the characteristics database, followed by the model training and model fine-tuning needed for inclusion of a product in the system. -
FIG. 4 illustrates the system components at a high level of detail. -
FIG. 3 shows the flow diagram of the process of adding or removing a new product into or from the database. When a product needs to be added to the database, its characteristics are extracted and used to identify other products sharing similar characteristics. The number of products in this similarity set is used to determine which classifier is suitable for the classification of this set. The system may also choose to split the set into a plurality of smaller sets and train them separately as individual classifiers. This split can lead to some form of classifier overlap, where the same product may be recognized by different classifiers. This split is needed in order to avoid the classifier capacity limit. - The decision of which classifier model to use is based on the overall obtained accuracy. If the classifier fails to achieve the desired accuracy for its products, it can be split into several classifiers which are trained on smaller sets of products, until the desired accuracy is obtained for all participating products. In particular cases, the split can lead to 'binary classifiers', i.e., classifiers having two classes in their outputs. In other particular cases, the system may choose, as a result of the training process used with the particular model, to fuse several similar products into one class, if this can provide improved accuracy in distinguishing a specific product versus a group of similar products. A sketch of this accuracy-driven splitting loop is given below.
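- A minimal sketch of that loop (train_fn and eval_fn are placeholders standing in for the model training and validation steps; the halving split is one possible policy, not the only one):

def train_with_capacity_splitting(product_set, target_acc, train_fn, eval_fn):
    # Train one classifier on the similarity set; if it cannot reach the
    # target accuracy, split the set and train the halves recursively.
    clf = train_fn(product_set)
    if eval_fn(clf, product_set) >= target_acc or len(product_set) <= 2:
        return [clf]  # may end up as a binary classifier
    mid = len(product_set) // 2
    return (train_with_capacity_splitting(product_set[:mid], target_acc, train_fn, eval_fn)
            + train_with_capacity_splitting(product_set[mid:], target_acc, train_fn, eval_fn))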
- The addition of a new product can also affect other classifiers in the classifier database, which may now need to be able to cope with the new product and the changes in the similarity sets, thus requiring fine-tuning of the classifier parameters by applying additional training iterations which include the new product images as input. If a product needs to be removed from the database, the classifier(s) that contain this product among their output classes are updated to ignore this product. In one embodiment, the eliminated product class is regarded as 'Don't Care', i.e., the eliminated class is ignored.
- Another option to remove a class is to revise the number of outputs (i.e., classes) of the classifiers and fine-tune the classifier parameters by applying additional training (a.k.a. fine-tuning). If too many products are removed, leaving some classifiers with too few classes, the system may also decide to fuse two classifiers into a single larger classifier, as long as the accuracy of the new larger classifier is maintained. A similar process can be used for additional product management in the cart, and can likewise be directed to at least one of splitting and merging products. These processes will naturally change the number of classifiers and their magnitude.
- For example, certain egg cartons are sold with scored packaging and can be split. In some embodiments, an outward-looking imaging sensor can initially identify certain product characteristics prior to the product being inserted into the AIC; however, once it is inserted into the AIC, the user may split the package into two or more sub-packages. In these circumstances, the imaging system can extract the new package and create a classifier that is specific to the initial product, indicating it is "splittable" (in other words, capable of being split), and a classifier for each of the sub-parts. These classifiers can then be trained to regard the split product independently from the original product and added to the classifier database, while the initial and sub-product package images can be added to the images database, as well as to the characteristics' (key/hashing function) database. Creating a classifier (and the classifier database) refers, for example, to either initially training a classifier or retraining (modifying) a classifier (e.g., as the result of model/machine learning of a deep feature). The term "classifier" refers, in an exemplary implementation, to either a binary classifier or a multiclass classifier. A binary classifier is a classifier that has two output categories, for example two products or two characteristics, whereas a multiclass classifier has multiple categorical outputs and is used to classify multiple products or to classify multiple characteristics.
- Similarly, products can be merged, forming a new product with its own set of classifiers that are either independent of or derived from the products making up the merged product. It is noted that the merged product's characteristics may be identical to those of the products making up the merged product, except for one or two characteristics (and hence classifiers), for example at least one of: weight, volume, and shape factor. Here too, these classifiers can then be added to the classifier database, while the initial and sub-product package images can be added to the images database, as well as to the characteristics' (key/hashing function) database.
- In the current embodiment, various types of convolutional neural networks (CNNs) are used. To date, CNNs produce superior results to other commonly used classifiers such as SVMs and decision trees. The selection of the type of CNN depends on the number of products (i.e., classes) it is intended to classify. If a set of characteristics is relatively common, i.e., many products share these characteristics, larger types of CNNs can be selected for use (e.g., Inception, Xception, VGG). For less frequent characteristics, smaller CNNs may be used without loss of accuracy. Although CNNs are used here, this methodology is interchangeably applicable to any type of classifier.
- In addition, the three-dimensional space, namely the warehouse, store, etc., can be further outfitted with a locator system, used in an exemplary implementation to establish the relative location of the AIC by transmitting a signal between the AIC and an array of receiving devices having known locations. Providing the location of the AIC can be used by the processing module to limit the classifiers retrieved, thus making the identification (and re-identification) more efficient and increasing both the selectivity and specificity of the classifiers, whether in the training stage or in the retrieval stage.
- For example, a suitable locator system can be based on "ultrawideband" (UWB) signals, which are short pulses of radiofrequency energy. In a location system using UWB technology, the distances and/or bearings are measured by sending UWB pulses, typically on the order of one nanosecond long, between the AIC and the receiving device. The distance between an object and a receiving device can be calculated by measuring the times-of-flight (TOF) of the UWB pulses between the object and the receiving device. UWB signals are assumed to travel at the speed of light; therefore, the distance between the object and the receiving device can be calculated from the time taken by a UWB pulse to travel between them. Other location systems can be those utilizing Bluetooth beacon systems and RFID locator systems.
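- The underlying arithmetic is simply distance = speed of light x time of flight; for example, a measured one-way TOF of 1 ns corresponds to roughly 0.3 m:

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def uwb_distance_m(time_of_flight_s):
    # Distance implied by a one-way UWB pulse time of flight.
    return SPEED_OF_LIGHT * time_of_flight_s

# uwb_distance_m(1e-9) -> ~0.2998 m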
- As indicated previously, an outwardly-sensing optical sensor can likewise be used to narrow the retrieval parameters for the classifiers, both in the training and re-identification stages. In certain embodiments, the 3D space can be outfitted with various markers or fiducials, which can be introduced as a classifier parameter, again making the training more efficient and increasing the specificity and selectivity of the process. Accordingly, and in an exemplary implementation, the AIC further comprises a transceiver in communication with the central processing module, the transceiver adapted to transmit and receive a signal configured to provide the location of the cart, for example a UWB signal, and the classifier produced initially, and retrieved during re-identification by the processor, is, in addition to other image-extracted features, location-based.
- FIG. 1A is a flow diagram showing how product images 100 are used for characteristics extraction 101-106, such as color palette 101; box/wrap type & shape 102; weight 103; key-words & logo 104; scale and rotation invariant features 105; and Deep Feature 106. The deep feature 106 may represent a sub-field of machine learning involving learning of product characteristic representations provided by a deep neural network (DNN) trained to identify that product. A DNN can be used to provide product characteristics by collecting its response to a given input, such as a product's image, from at least one of its layers. The response of a DNN to a product image is hierarchical by nature. The term 'Deep Features' typically corresponds to responses produced by deeper layers of a DNN model. Such 'Deep Features' are typically specific to the input product or to similar products and can therefore be used as unique descriptors for the product; such descriptors can be used as characteristics for classifier retrieval and re-identification once the product is scanned. - Images 100 may be processed in order to prepare them for classification. Image (pre-)processing may include multiple steps such as: cropping, background subtraction, denoising, lighting adjustment and others. Once a clean image is obtained, the product characteristics are extracted and used for retrieval of the relevant classifiers that are needed to obtain an accurate decision on the inserted product.
- The product characteristics 107 are used to select relevant classifiers 108 from classifier database 440 (see e.g., FIG. 4). The input images 100, or a processed version of the input images 100, are provided to the set of relevant classifiers 108, which produce the product classification (inference) 109. - A decision module 450 (see e.g., FIG. 4) collects the classified results 109 from all retrieved classifiers 108 and makes a single decision 110 for the inserted product, thus identifying the product 111. -
FIG. 1B shows the process of training a classifier with similar product characteristics. Images of a new product 112 undergo characteristics (feature) extraction 101-106. Characteristics 101-106 are extracted using the original input images 112 (organic images) or a processed version of the input images (synthetic images), creating the product characteristics vector 113. Product characteristics vectors 113 from multiple products are used to form the product characteristics database 114. The product characteristics are used to identify other products (species) which share similar characteristics 115 and to create an updated product characteristics database 116. The classifier 118 is trained on images from the existing products' database 117 of products which share similar characteristics (genus). The training 118 allows the classifier to distinguish between the similar products (in other words, to identify species within the genus). Upon successful training, the new classifier 118 is added to the updated classifier database 119 and the product-characteristics database 116 is updated with the new product characteristics 113. -
FIG. 2 shows an illustration of the characteristics vector and its use for classifier retrieval. The extracted product characteristics 201-205 are provided as separate numeric vectors, or as keys, or concatenated into a single numeric vector presentation. Specialized data structures such as VP tree 212, or other metric/ball trees, and/or hashing functions 211 are used for obtaining an approximated nearest neighbor 220. The neighbors are then used to retrieve the set of relevant classifiers 230 from the classifier database. -
FIG. 3 shows the flow diagram for insertion and/or removal of a product from the products database. Upon receiving images of a new product 300, product characteristics extraction is applied to obtain a set of product characteristics 301. The product characteristics 301 are utilized to identify products 302 that exhibit similar characteristics. The set of products 302 is used to identify a suitable classifier 303 based on the size and complexity of the selected product set 302. The selected classifier is then trained 304 with images, or processed versions of images, of the selected product set 302. A validation module 305 tests the performance of the new classifier and decides whether the classifier provides the intended accuracy. If the newly-formed classifier fails to achieve the intended accuracy, this may lead to splitting the product set into multiple smaller product sets 307, where each smaller set is then subjected to classifier selection 303 and trained separately 304. If the intended accuracy was achieved, the decision module 306 will then decide which classifiers in the database need to be updated 308. The classifiers will then be provided with additional training images to allow them to cope with the new product and the new classifier space partition. When no additional training or fine-tuning is needed, the classifier database is updated with the new classifiers 309 and the product-characteristics database is updated 310 with the new product characteristics. Upon selection of a product to be removed from the database 320, the set of classifiers able to identify this product is identified 321. This set of classifiers is then tested 322 to determine whether fusing of two or more classifiers is needed. If some or all of the classifiers maintain a sufficient number of products, no classifier fusion is needed, and the selected classifiers 308 are tuned by additional training steps 304 without the images of the removed products. If classifier fusion is desired, the system will identify relevant similar products 302 for selection 303 and training 304 of a new classifier on the newly formed dataset. Typically, artificial neural networks (ANNs) produce superior results to other commonly used classifiers such as SVMs and decision trees, but any classifier may be interchangeably used with this suggested method. -
FIG. 4 is a schematic illustrating an exemplary implementation of the various modules used in implementing the methods described herein. As illustrated, a plurality of imaging sensors 401i (e.g., RGB-D cameras) is provided, some of which are directed toward the inside of the cart (not shown). The plurality of imaging sensors can be configured to capture an image of the product inserted into the cart, thus providing perspective allowing both the size and shape characteristics of the product to be distinguished, and, together with the load cell module 402, present the extracted features to acquisition and synchronization module 410, which can form, in certain embodiments, a part of the imaging module (dashed trapezoid). -
processing module 420, which compares them with characteristics already stored incharacteristic database 430 whereupon finding a corresponding characteristic set incharacteristics database 430, the set of identified characteristics serves either together as a single concatenated vector or each individually, as a key to retrieve a set of classifiers fromclassifier database 440. The retrieved classifier, or set of classifiers are then input to data fusion and decision module 450, where a determination is made regarding the confidence level in identifying the product, and if the confidence level is above a predetermined level, the product is identified 500 and optionally displayed on a cart display (not shown). - Images captured by acquisition and
synchronization module 410 can be transmitted toproduct image database 470, and toclassifier training module 460, where and discrepancy between retrieved classifiers and product characteristics extracted is identified and based on extracted features, a new classifier is established, and the characteristic associated with the classifier is established as a retrieval key for that particular classifier. The captured image is then stored inproduct image database 470, and the added classifier and added feature (characteristic) are stored inclassifier database 440 andcharacteristic database 430 respectively. Thus, any image captured by the imaging module serves simultaneously as training, validation and test image. - Unless specifically stated otherwise, as apparent from the description, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “loading,” “in communication,” “detecting,” “calculating,” “determining”, “analyzing,” “presenting”, “retrieving” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as the captured and acquired image of the product inserted into the cart (or removed) into other data similarly represented as series of numerical values, such as the transformed data.
- In an exemplary implementation, provided herein is a system for automated, real-time product identification in a shopping cart of a store and/or warehouse having more than a thousand stock keeping items (SKIs), comprising: a cart having a front wall, a rear wall and two side walls forming an apically open container with a base; a load cell module operably coupled to the base of the cart; a plurality of imaging modules coupled to the cart, adapted to at least one of: image an item inserted into the cart, and image an area of interest outside the cart; a central processing module in communication with the load cell and the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon: a classifiers' database; a product characteristics database; and a processor-readable medium comprising a set of executable instructions configured, when executed, to cause the processor to retrieve a set of a plurality of classifiers, wherein each set of classifiers is configured to identify a single product inserted into the cart; wherein (i) the set of executable instructions is configured, when executed, to cause the processor further to: using the imaging module, acquire an image of each inserted product; extract a predetermined number of features from the image; and identify the features extracted, wherein (ii) the set of classifiers is retrieved from the classifier database, (iii) the classifier database comprises a plurality of classifiers configured to classify at least one of: an entire warehouse, a storeroom, a shop, a grocery store, a supermarket, and a combination comprising one or more of the foregoing, wherein (iv) the features extracted by the imaging module from the inserted product are at least one of: a color palette designator, a shape designator, volumetric size, weight, a key-word, a graphic element, a scale- and rotation-invariant feature, and a synthesized deep feature, (v) the color palette designator is HSV, CIE-LAB, CIE-XYZ, CIE-LCH, or a color palette designator comprising a combination of the foregoing, wherein (vi) the set of executable instructions, prior to extracting the predetermined number of features, is configured to preprocess the image of each product inserted into the cart by reducing the number of colors of the product, for example by (vii) quantizing colors using at least one of: K-means, median-cut, octree, variance-based, binary splitting, greedy orthogonal bipartitioning, optimal principal multilevel quantizer, minmax, fuzzy c-means, and their combination, wherein (viii) the set of executable instructions, when executed, is configured to cause the processor to produce a color histogram of the image color palette, wherein (ix) the imaging module comprises an RGB-D camera, and wherein the set of executable instructions, when executed, is configured to cause the processor to: using the RGB-D camera, determine the size of the product; using the load cell module, determine the weight of the product; and combine the size and the weight into a single parameter, (x) the set of executable instructions is further configured, when executed, to cause the processor to identify and retrieve at least one of: a logo, a watermark, a key word, a graphic symbol, and their combination, (xi) retrieving at least one of the logo, the watermark, the key word, the graphic symbol, and their combination comprises using at least one of: a Single-Shot MultiBox Detector (SSD) neural network, a Regional Convolutional Neural Network (RCNN), a Fast-RCNN, a Faster-RCNN, and a You Only Look Once (YOLO) neural network, wherein (xii) the set of executable instructions is configured, when executed, to cause the processor to assign a retrieval key to each classifier, wherein (xiii) the set of the plurality of classifiers identifying a single product is selected by applying a nearest-neighbors algorithm and an approximated nearest neighbors (ANN) algorithm, independently, to the classifiers associated with the product, and selecting the classifiers selected by both algorithms, wherein (xiv) all extracted features are encoded numerically, wherein (xv) the classifiers' database is formed by training the plurality of classifiers on products sharing at least one characteristic, wherein the formed classifiers are associated with this set of characteristics for later retrieval, wherein (xvi) the set of executable instructions is configured, when executed, to cause the processor to associate each of the product characteristics with a classifier or a set of classifiers, wherein (xvii) the set of executable instructions is configured, when executed, to cause the processor to retrieve a classifier or a plurality of classifiers from the classifier database by using each of the product characteristics, or their combination, as a key for retrieval, wherein (xviii) the set of executable instructions is configured, when executed, to cause the processor to identify the color characteristic of the inserted product through retrieving relevant classifiers, by associating the color histogram of the inserted product with the classifiers' associated color histograms and determining similarity based on cross-histogram bin-distance analysis, (xix) the cross-histogram bin distance is measured using at least one of: Earth Mover's Distance (EMD) and Mahalanobis distance, wherein (xx) the set of executable instructions is configured, when executed, to cause the processor to identify the color characteristic of the inserted product through retrieving relevant classifiers, by associating the color histogram of the inserted product with the classifiers' associated color histograms and determining similarity based on bin-to-bin distance analysis, (xxi) the bin-to-bin distance is measured using at least one of: Kullback-Leibler (KL) divergence, Bhattacharyya distance, and Chi-square (χ2) distance, wherein (xxii) the set of executable instructions is configured, when executed, to cause the processor to train the products' recognition and/or identification algorithm by grouping products sharing at least one characteristic, wherein (xxiii) the non-volatile memory further comprises a product image database configured to store images captured by the imaging module, wherein (xxiv) the set of executable instructions is configured, when executed, to cause the processor to associate and/or retrieve the classifier based on the product's packaging shape or shape descriptor, wherein the shape or shape descriptor is identified as the key-word on the package, (xxv) the retrieved, key-word-based classifier is semantically related to the inserted product characteristic, wherein (xxvi) the set of executable instructions is configured, when executed, to cause the processor to associate and/or retrieve the classifier by matching representative key-words of the inserted products to key-words associated with the classifiers in the classifiers database, wherein (xxvii) the set of executable instructions is configured, when executed, to cause the processor to associate and/or retrieve the classifier, whereby the classifier's associated descriptors in the classifier database are matched to the inserted product's descriptors by applying ORB or BRIEF descriptors, wherein (xxviii) the cart further comprises a transceiver in communication with the central processing module, the transceiver adapted to transmit and receive a signal configured to provide the location of the cart, (xxix) the signal is an ultra-wideband radio pulse, (xxx) the classifiers retrieved are dependent on the cart's location, wherein (xxxi) the deep feature is synthesized from at least one of: a predetermined CNN, and a plurality of the product characteristics, and wherein (xxxii) the plurality of product characteristics further comprises the product location within at least one of: the store, and the warehouse.
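- As a concrete illustration of the color path above, the following minimal sketch quantizes the palette with K-means (one of the options in (vii)), forms the palette histogram ((viii)), and compares histograms with a bin-to-bin Chi-square distance ((xxi)) and a cross-bin Earth Mover's Distance ((xix)). The scikit-learn/SciPy calls, the 16-color palette, and the random stand-in data are assumptions, not requirements of the method.

```python
# Sketch only: K-means palette quantization ((vii)), palette histogram ((viii)),
# a bin-to-bin Chi-square distance ((xxi)), and a cross-bin 1-D Earth Mover's
# Distance ((xix)). Library choices and the 16-color palette are assumptions.
import numpy as np
from scipy.stats import wasserstein_distance   # 1-D EMD
from sklearn.cluster import KMeans             # K-means color quantization

def palette_histogram(pixels: np.ndarray, n_colors: int = 16):
    """Quantize Nx3 color pixels to n_colors and return the normalized
    histogram over the learned palette, plus the palette itself."""
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors).astype(float)
    return counts / counts.sum(), km.cluster_centers_

def chi_square(h1: np.ndarray, h2: np.ndarray, eps: float = 1e-12) -> float:
    """Bin-to-bin Chi-square distance between two normalized histograms."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def emd_1d(h1: np.ndarray, h2: np.ndarray) -> float:
    """Cross-bin EMD over a shared 1-D bin index; a full color-space EMD would
    instead use the palette centers and a ground-distance matrix."""
    bins = np.arange(len(h1))
    return wasserstein_distance(bins, bins, u_weights=h1, v_weights=h2)

# Usage: compare an inserted product's histogram with a stored one.
rng = np.random.default_rng(0)
h_query, _ = palette_histogram(rng.random((500, 3)))    # stand-in product pixels
h_stored = np.clip(h_query + rng.normal(0, 0.01, h_query.size), 0, None)
h_stored /= h_stored.sum()
print(chi_square(h_query, h_stored), emd_1d(h_query, h_stored))
```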
- While the invention has been described in detail and with reference to specific exemplary implementations thereof, it will be apparent to one of ordinary skill in the art that various changes and modifications can be made therein without departing from the spirit and scope thereof. Accordingly, it is intended that the present disclosure covers the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.
Claims (18)
1. A system for automated product identification in a shopping cart, comprising:
a. a cart having a front wall, a rear wall and two side walls forming an apically open container with a base;
b. a plurality of imaging modules coupled to the cart, adapted to, at least one of image an item inserted into the cart, and image an area of interest outside the cart;
c. a central processing module in communication with the plurality of imaging modules, the central processing module comprising a processor and being in further communication with a non-volatile memory having thereon:
i. a classifiers' database;
ii. a product characteristics database; and
iii. a processor-readable medium comprising a set of executable instructions configured, when executed, to cause the processor to retrieve from the classifiers' database a set of a plurality of classifiers,
wherein each set of classifiers is configured to identify a single product inserted into the cart, from an entire warehouse, storeroom, shop, grocery store, supermarket, or a combination comprising one or more of the foregoing.
2. The system of claim 1, wherein the set of executable instructions is configured, when executed, to cause the processor further to:
a. using the imaging module, acquire an image of each inserted product;
b. extract a predetermined number of features from the image; and
c. identify the features extracted.
3. The system of claim 2, wherein the features extracted by the imaging module from the inserted product are at least one of: a color palette designator, a shape designator, volumetric size, weight, a key-word, a graphic element, a scale- and rotation-invariant feature, and a deep feature synthesized from at least one of: a predetermined CNN, and a plurality of the product characteristics.
4. The system of claim 3, wherein the color palette designator is HSV, CIE-LAB, CIE-XYZ, CIE-LCH, or a color palette designator comprising a combination of the foregoing.
5. The system of claim 4, wherein the set of executable instructions, prior to extracting the predetermined number of features, are configured to preprocess the image of each product inserted into the cart by reducing the number of colors of the product by quantizing colors using at least one of: K-means, median-cut, octree, variance-based, binary splitting, greedy orthogonal bipartitioning, optimal principal multilevel quantizer, minmax, fuzzy c-means, and their combination.
6. The system of claim 5, wherein the set of executable instructions, when executed, are configured to cause the processor to:
a. produce a color histogram of the image color palette; and
b. identify the color characteristic of the inserted product, through retrieving relevant classifiers by associating the color histogram of the inserted product with the classifiers' associated color histograms and determining similarity based on cross-histogram bin distance analysis.
7. The system of claim 1, wherein the imaging module comprises an RGB-D camera, and wherein the set of executable instructions, when executed, are further configured to cause the processor to:
a. using the RGB-D camera, determine the size of the product;
b. using a load cell module included with the system, determine the weight of the product; and
c. combine the size and the weight into a single parameter.
11. The system of claim 1, wherein the set of executable instructions are further configured, when executed, to cause the processor to identify and retrieve at least one of: a logo, a watermark, a key word, a graphic symbol, and their combination using at least one of: a single-shot MultiBox detector (SSD) neural network, a Regional Convolutional Neural Network (RCNN), a Fast-RCNN, a Faster-RCNN, and a You Only Look Once (YOLO) neural network.
12. The system of claim 1, wherein the set of the plurality of classifiers identifying a single product is selected by applying a nearest-neighbors algorithm and an approximated nearest neighbors (ANN) algorithm, independently, to the classifiers associated with the product, and selecting the classifiers selected by both algorithms (a sketch of this selection rule follows the claims).
13. The system of claim 2 , wherein product shapes are encoded numerically.
14. The system of claim 1, wherein the classifiers' database is formed by training the plurality of classifiers on products sharing at least one characteristic, wherein the formed classifiers are associated with this set of characteristics for later retrieval.
15. The system of claim 14, wherein the set of executable instructions are configured, when executed, to cause the processor to associate each of the product characteristics with a classifier or a set of classifiers.
16. The system of claim 14, wherein the set of executable instructions are configured, when executed, to cause the processor to retrieve a classifier or a plurality of classifiers from the classifier database by using each of the product characteristics, or their combination, as a key for retrieval.
17. The system of claim 6, wherein the cross-histogram bin distance is measured using at least one of: Kullback-Leibler (KL) divergence, Bhattacharyya distance, and Chi-square (χ2) distance.
18. The system of claim 15, wherein the set of executable instructions are configured, when executed, to cause the processor to train the products' recognition and/or identification algorithm by grouping products sharing at least one characteristic.
19. The system of claim 1, wherein the non-volatile memory further comprises a product image database, configured to store images captured by the imaging module.
20. The system of claim 11, wherein the set of executable instructions are configured, when executed, to cause the processor to associate and/or retrieve the classifier based on:
a. the product's packaging shape or shape descriptor, wherein the shape or shape descriptor is identified as the key-word on the package that is semantically related to the inserted product characteristic;
b. matching representative key-words of the inserted products to key-words associated with the classifiers in the classifiers database; and
c. matching the classifier's associated descriptors in the classifier database to the inserted product's descriptors by applying a local image descriptors' matching algorithm (a sketch of this matching follows the claims).
21. The system of claim 1, wherein the cart further comprises a transceiver in communication with the central processing module, the transceiver adapted to transmit and receive an ultra-wideband radio pulse configured to provide the location of the cart, and wherein the classifiers retrieved are dependent on the cart's location.
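Claim 12's selection rule retains only those classifiers returned by both an exact nearest-neighbors search and an approximate one. The sketch below is a minimal illustration under assumptions: brute-force Euclidean search stands in for the exact algorithm, and search in a random low-dimensional projection stands in for the approximate nearest neighbors (ANN) algorithm; neither choice is dictated by the claim.

```python
# Sketch of claim 12: keep only classifier keys returned by BOTH an exact
# nearest-neighbors search and an approximate one. The random projection used
# as the "approximate" search is an assumption for illustration.
import numpy as np

def knn(query: np.ndarray, keys: np.ndarray, k: int) -> set:
    """Exact k nearest neighbors by brute-force Euclidean distance."""
    d = np.linalg.norm(keys - query, axis=1)
    return set(np.argsort(d)[:k].tolist())

def approx_knn(query: np.ndarray, keys: np.ndarray, k: int, dim: int = 8, seed: int = 0) -> set:
    """Approximate k-NN: exact search in a random low-dimensional projection."""
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(keys.shape[1], dim)) / np.sqrt(dim)
    return knn(query @ proj, keys @ proj, k)

def select_classifiers(query: np.ndarray, keys: np.ndarray, k: int = 5) -> set:
    """Claim 12: apply both searches independently and intersect the results."""
    return knn(query, keys, k) & approx_knn(query, keys, k)

# Usage with random stand-in characteristic vectors (dimension assumed):
rng = np.random.default_rng(1)
keys = rng.random((100, 32))    # classifier retrieval keys
print(select_classifiers(rng.random(32), keys))
```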
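Claim 20(c)'s local-descriptor matching (ORB or BRIEF, per the description) might be realized as below; the OpenCV calls, the 500-feature budget, and the Hamming-distance cut-off of 50 are assumptions rather than claim requirements.

```python
# Hypothetical sketch of claim 20(c): match an inserted product's local image
# descriptors against classifier-associated descriptors. OpenCV choices and
# thresholds are assumptions.
from typing import Dict, Optional
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=500)                        # ORB: FAST keypoints + rotated-BRIEF descriptors
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def descriptors(gray: np.ndarray) -> np.ndarray:
    """Compute binary ORB descriptors for a grayscale image."""
    _, desc = orb.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 32), np.uint8)

def best_classifier_key(product_gray: np.ndarray,
                        classifier_descriptors: Dict[str, np.ndarray]) -> Optional[str]:
    """Return the retrieval key of the stored classifier whose descriptors
    collect the most good matches with the inserted product's descriptors."""
    query = descriptors(product_gray)
    if len(query) == 0:
        return None
    scores: Dict[str, int] = {}
    for key, stored in classifier_descriptors.items():
        if len(stored) == 0:
            scores[key] = 0
            continue
        matches = matcher.match(query, stored)
        scores[key] = sum(1 for m in matches if m.distance < 50)  # Hamming cut-off (assumed)
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```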
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/353,024 US20210312206A1 (en) | 2018-12-20 | 2021-06-21 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862782377P | 2018-12-20 | 2018-12-20 | |
PCT/IL2019/051390 WO2020129066A1 (en) | 2018-12-20 | 2019-12-19 | System and method for classifier training and retrieval from classifier database for large scale product identification |
US17/353,024 US20210312206A1 (en) | 2018-12-20 | 2021-06-21 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2019/051390 Continuation WO2020129066A1 (en) | 2018-12-20 | 2019-12-19 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210312206A1 (en) | 2021-10-07 |
Family
ID=71102081
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/267,839 Active US11386639B2 (en) | 2018-12-20 | 2019-12-19 | System and method for classifier training and retrieval from classifier database for large scale product identification |
US17/353,024 Abandoned US20210312206A1 (en) | 2018-12-20 | 2021-06-21 | System and method for classifier training and retrieval from classifier database for large scale product identification |
US17/667,590 Active US11941581B2 (en) | 2018-12-20 | 2022-02-09 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/267,839 Active US11386639B2 (en) | 2018-12-20 | 2019-12-19 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/667,590 Active US11941581B2 (en) | 2018-12-20 | 2022-02-09 | System and method for classifier training and retrieval from classifier database for large scale product identification |
Country Status (5)
Country | Link |
---|---|
US (3) | US11386639B2 (en) |
EP (1) | EP3899789A4 (en) |
AU (1) | AU2019402308A1 (en) |
IL (1) | IL273136B (en) |
WO (1) | WO2020129066A1 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11216652B1 (en) * | 2021-03-01 | 2022-01-04 | Institute Of Automation, Chinese Academy Of Sciences | Expression recognition method under natural scene |
KR102433786B1 (en) * | 2021-10-13 | 2022-08-18 | 주식회사 케이티 | Modular electric cart and method for remote work instruction using the same |
US11868865B1 (en) * | 2022-11-10 | 2024-01-09 | Fifth Third Bank | Systems and methods for cash structuring activity monitoring |
US11934995B1 (en) * | 2022-03-28 | 2024-03-19 | Amazon Technologies, Inc. | Package similarity prediction system |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11361470B2 (en) * | 2019-05-09 | 2022-06-14 | Sri International | Semantically-aware image-based visual localization |
US11651024B2 (en) * | 2020-05-13 | 2023-05-16 | The Boeing Company | Automated part-information gathering and tracking |
EP3985620B1 (en) * | 2020-10-13 | 2023-12-20 | Tata Consultancy Services Limited | Fine-grained classification of retail products |
CN113139932B (en) * | 2021-03-23 | 2022-12-20 | 广东省科学院智能制造研究所 | Deep learning defect image identification method and system based on ensemble learning |
US11507915B1 (en) * | 2021-08-24 | 2022-11-22 | Pitt Ohio | System and method for monitoring a transport of a component |
US20230088925A1 (en) * | 2021-09-21 | 2023-03-23 | Microsoft Technology Licensing, Llc | Visual Attribute Expansion via Multiple Machine Learning Models |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160140157A1 (en) * | 2014-11-14 | 2016-05-19 | International Business Machines Corporation | Aesthetics data identification and evaluation |
WO2017146595A1 (en) * | 2016-02-26 | 2017-08-31 | Imagr Limited | System and methods for shopping in a physical store |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7337960B2 (en) * | 2004-02-27 | 2008-03-04 | Evolution Robotics, Inc. | Systems and methods for merchandise automatic checkout |
US7053820B2 (en) * | 2004-05-05 | 2006-05-30 | Raytheon Company | Generating three-dimensional images using impulsive radio frequency signals |
JP4883649B2 (en) * | 2006-08-31 | 2012-02-22 | 公立大学法人大阪府立大学 | Image recognition method, image recognition apparatus, and image recognition program |
EP2569721A4 (en) * | 2010-05-14 | 2013-11-27 | Datalogic Adc Inc | Systems and methods for object recognition using a large database |
US10200625B2 (en) | 2014-01-30 | 2019-02-05 | Bd Kiestra B.V. | System and method for image acquisition using supervised high quality imaging |
US9524563B2 (en) * | 2014-06-26 | 2016-12-20 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
WO2016054778A1 (en) * | 2014-10-09 | 2016-04-14 | Microsoft Technology Licensing, Llc | Generic object detection in images |
2019
- 2019-12-19 EP EP19898295.1A patent/EP3899789A4/en active Pending
- 2019-12-19 WO PCT/IL2019/051390 patent/WO2020129066A1/en unknown
- 2019-12-19 AU AU2019402308A patent/AU2019402308A1/en active Pending
- 2019-12-19 US US17/267,839 patent/US11386639B2/en active Active
2020
- 2020-03-08 IL IL273136A patent/IL273136B/en active IP Right Grant
2021
- 2021-06-21 US US17/353,024 patent/US20210312206A1/en not_active Abandoned
2022
- 2022-02-09 US US17/667,590 patent/US11941581B2/en active Active
Non-Patent Citations (10)
Title |
---|
Baldevbhai, Patel Janakkumar. "Color Image Segmentation for Medical Images Using L*a*b* Color Space." IOSR Journal of Electronics and Communication Engineering, vol. 1, no. 2, 2012, pp. 24–45. DOI.org (Crossref), https://doi.org/10.9790/2834-0122445. (Year: 2012) * |
Cheng, Danni, et al. "Large-Scale Visible Watermark Detection and Removal with Deep Convolutional Networks." Pattern Recognition and Computer Vision, edited by Jian-Huang Lai et al., Springer International Publishing, 2018, pp. 27–40. Springer Link, https://doi.org/10.1007/978-3-030-03338-5_3. (Year: 2018) * |
Choong et al ("Evaluation of Convolutionary Neural Networks Modeling of DNA Sequences using Ordinal versus one-hot Encoding Method", Jan 2018). https://doi.org/10.1109/ICONDA.2017.8270400 (Year: 2018) * |
Contigiani, Marco, et al. "Implementation of a Tracking System Based on UWB Technology in a Retail Environment." 2016 12th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), IEEE, 2016, pp. 1–6. DOI.org (Crossref), https://doi.org/10.1109/MESA.2016.7587123. (Year: 2016) * |
Gul, Asma, et al. "Ensemble of a Subset of kNN Classifiers." Advances in Data Analysis and Classification, vol. 12, no. 4, Dec. 2018, pp. 827–40. Springer Link, https://doi.org/10.1007/s11634-015-0227-5. (Year: 2018) * |
Marín-Reyes, Pedro A., et al. Comparative Study of Histogram Distance Measures for Re-Identification. arXiv, 24 Nov. 2016. arXiv.org, https://doi.org/10.48550/arXiv.1611.08134. (Year: 2016) * |
Merler, Michele, et al. "Recognizing Groceries in Situ Using in Vitro Training Data." 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8. IEEE Xplore, https://doi.org/10.1109/CVPR.2007.383486. (Year: 2007) * |
Qi, Chengzuo, et al. "Logo Retrieval Using Logo Proposals and Adaptive Weighted Pooling." IEEE Signal Processing Letters, vol. 24, no. 4, Apr. 2017, pp. 442–45. IEEE Xplore, https://doi.org/10.1109/LSP.2017.2673119. (Year: 2017) * |
Wang Z, Walsh KB, Verma B. On-Tree Mango Fruit Size Estimation Using RGB-D Images. Sensors. 2017; 17(12):2738. https://doi.org/10.3390/s17122738 (Year: 2017) * |
Zingade, Akarsh. "Logo Detection Using YOLOv2." Medium, 8 Dec. 2017, https://medium.com/@akarshzingade/logo-detection-using-yolov2-8cda5a68740e. (Year: 2017) * |
Also Published As
Publication number | Publication date |
---|---|
IL273136B (en) | 2021-02-28 |
EP3899789A1 (en) | 2021-10-27 |
US20210312205A1 (en) | 2021-10-07 |
EP3899789A4 (en) | 2022-10-05 |
US20220165046A1 (en) | 2022-05-26 |
IL273136A (en) | 2020-06-30 |
US11941581B2 (en) | 2024-03-26 |
WO2020129066A1 (en) | 2020-06-25 |
US11386639B2 (en) | 2022-07-12 |
AU2019402308A1 (en) | 2021-08-05 |
Similar Documents
Publication | Title |
---|---|
US11941581B2 (en) | System and method for classifier training and retrieval from classifier database for large scale product identification |
Santra et al. | A comprehensive survey on computer vision based approaches for automatic identification of products in retail store |
Hameed et al. | A comprehensive review of fruit and vegetable classification techniques |
CN106682233B (en) | Hash image retrieval method based on deep learning and local feature fusion |
Shalunts et al. | Architectural style classification of building facade windows |
US9330111B2 (en) | Hierarchical ranking of facial attributes |
Geng et al. | Fine-grained grocery product recognition by one-shot learning |
EP2054855B1 (en) | Automatic classification of objects within images |
Bosch et al. | Scene classification using a hybrid generative/discriminative approach |
Mehmood et al. | A novel image retrieval based on a combination of local and global histograms of visual words |
Girod et al. | Mobile visual search: Architectures, technologies, and the emerging MPEG standard |
US20160155011A1 (en) | System and method for product identification |
Zawbaa et al. | Automatic fruit image recognition system based on shape and color features |
CN102402621A (en) | Image retrieval method based on image classification |
George et al. | Fine-grained product class recognition for assisted shopping |
CN104281572B (en) | A kind of target matching method and its system based on mutual information |
CN106557728B (en) | Query image processing and image search method and device and monitoring system |
US11861669B2 (en) | System and method for textual analysis of images |
Walia et al. | An effective and fast hybrid framework for color image retrieval |
Gothai et al. | Design features of grocery product recognition using deep learning |
Emmanuel et al. | Fuzzy clustering and Whale-based neural network to food recognition and calorie estimation for daily dietary assessment |
Wang et al. | A chordiogram image descriptor using local edgels |
Tombari et al. | Online learning for automatic segmentation of 3D data |
Kumar et al. | Retrieval of flower based on sketches |
US12125080B2 (en) | System and method for textual analysis of images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
| AS | Assignment | Owner name: TRACXPOINT LLC, FLORIDA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: TRACXONE LTD.; REEL/FRAME: 064517/0652; Effective date: 20230808
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION