JP7228671B2

JP7228671B2 - Store Realogram Based on Deep Learning

Info

Publication number: JP7228671B2
Application number: JP2021504467A
Authority: JP
Inventors: Jordan E. Fisher; Daniel L. Fischetti; Michael S. Suswal; Nicholas J. Locascio
Original assignee: Standard Cognition Corp.
Priority date: 2018-07-26
Filing date: 2019-07-25
Publication date: 2023-02-24
Anticipated expiration: 2039-07-25
Also published as: TWI779219B; TW202013240A; EP3827391A4; JP2021533449A; EP3827391A1; CA3107446A1; WO2020023798A1

Description

Priority Application

This application claims the benefit of U.S. Provisional Patent Application No. 62/703,785 (Attorney Docket No. STCG 1006-1), filed July 26, 2018, which is incorporated herein by reference, and of U.S. Patent Application No. 16/256,355 (Attorney Docket No. STCG 1007-1), filed January 24, 2019. U.S. Patent Application No. 16/256,355 is a continuation-in-part of U.S. Patent Application No. 15/945,473 (Attorney Docket No. STCG 1005-1), filed April 4, 2018, which is a continuation-in-part of U.S. Patent Application No. 15/907,112 (Attorney Docket No. STCG 1002-1), filed February 27, 2018 (now U.S. Patent No. 10,133,933, issued November 20, 2018), which is a continuation-in-part of U.S. Patent Application No. 15/847,796 (Attorney Docket No. STCG 1001-1), filed December 19, 2017 (now U.S. Patent No. 10,055,853, issued August 21), which claims the benefit of U.S. Provisional Patent Application No. 62/542,077 (Attorney Docket No. STCG 1000-1), filed August 7, 2017. These U.S. patent applications are incorporated herein by reference.

The present invention relates to systems that track inventory items in an area of real space that includes inventory display structures.

Determining the quantity and location of the various inventory items stocked on inventory display structures within an area of real space, such as a shopping store, is necessary for the store to operate efficiently. Subjects within the area of real space, such as customers, take items from shelves and place them in their shopping carts or baskets. A customer who decides not to purchase an item can also put it back on the same shelf or on a different shelf. Thus, over a period of time, inventory items can be taken from their designated locations on the shelves and become dispersed to other shelves in the shopping store. In some systems, the quantity of stocked items is available only after a significant delay, because receipts must be reconciled with the stocked inventory. Delays in the availability of information about the quantities of items stocked in a shopping store can affect customers' purchasing decisions as well as store managers' actions to order more of the inventory items in high demand.

It is desirable to provide a system that can more effectively and automatically provide, in real time, the quantities of items stocked on shelves and identify the locations of items on the shelves.

A system, and a method for operating the system, are provided for tracking inventory items in an area of real space. A plurality of cameras or other sensors produce respective sequences of images of corresponding fields of view in the real space. The system is coupled to the plurality of sensors and includes processing logic that uses the sequences of images produced by at least two sensors to identify inventory events. The system tracks inventory items in the area of real space in response to the inventory events.

A system and method are provided for tracking inventory items in an area of real space. A plurality of cameras or other sensors produce respective sequences of images of corresponding fields of view in the real space. The field of view of each sensor overlaps with the field of view of at least one other sensor in the plurality of sensors. The system uses the sequences of images produced by at least two sensors in the plurality of sensors to identify inventory events. The system tracks the locations of inventory items in the area of real space in response to the inventory events.

In one embodiment, an inventory event includes an item identifier, a put or take indicator, a location represented by positions along three axes of the area of real space, and a timestamp. The system can include, or have access to, memory storing a data set that defines a plurality of cells having coordinates in the area of real space. The system includes logic that matches the locations of inventory items with the coordinates of cells and maintains data representing the inventory items matched with cells in the plurality of cells. The area of real space can include a plurality of inventory locations. The coordinates of cells in the plurality of cells can be correlated with inventory locations, or with portions of inventory locations, in the plurality of inventory locations. The system includes logic that calculates, at a scoring time, scores for inventory items having locations matched with a particular cell, using the respective counts of inventory events. The logic that calculates the scores for a cell uses sums of puts and takes of inventory items, weighted by the separation between the timestamps of the puts and takes and the scoring time. The system includes logic to store the scores in memory.
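The event fields and the time-weighted score described above can be illustrated with a minimal Python sketch. The text does not specify the weighting function or how puts and takes are combined, so the exponential half-life decay, the +1/-1 sign convention, and all names (`InventoryEvent`, `time_weight`, `item_scores`, `half_life`) are assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    """One put or take, with the four fields named in the text."""
    item_id: str       # item identifier (e.g. a SKU)
    is_put: bool       # True for a put, False for a take
    position: tuple    # (x, y, z) in the area of real space
    timestamp: float   # seconds, on the same clock as the scoring time


def time_weight(event_time, scoring_time, half_life):
    """Assumed weighting: the weight halves every `half_life` seconds of age."""
    age = max(0.0, scoring_time - event_time)
    return 0.5 ** (age / half_life)


def item_scores(events, scoring_time, half_life=86400.0):
    """Score each item seen in one cell.

    Assumption: puts add to the score and takes subtract from it, each
    weighted by the separation between its timestamp and the scoring time.
    """
    scores = defaultdict(float)
    for event in events:
        sign = 1.0 if event.is_put else -1.0
        scores[event.item_id] += sign * time_weight(
            event.timestamp, scoring_time, half_life
        )
    return dict(scores)
```

Under these assumptions, recent events dominate the score of a cell, so an item that was put on a shelf long ago and taken recently ends up with a low or negative score.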

In one embodiment, the system includes logic to render a display image representing cells in the plurality of cells and the scores for the cells. In this embodiment, the scores are represented by variations in color in the display image representing the cells. The system includes logic to select a set of inventory items for each cell based on the scores. In one embodiment, the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations. In this embodiment, the data set in memory defines a plurality of cells having coordinates in the area of real space.

The system can include, or have access to, memory storing a planogram that identifies inventory locations in the area of real space and the inventory items to be placed at those inventory locations. The planogram can also include information about the portions of inventory locations designated for particular inventory items. The planogram can be generated based on a plan for the arrangement of inventory items on the inventory locations in the area of real space.

The system includes logic to maintain data representing the inventory items matched with cells in the plurality of cells. The system can also include logic to determine misplaced items by comparing the data representing inventory items matched with cells against the planogram.
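The comparison described above can be sketched as a set difference per cell. This is only an illustration: the text does not define the stored format of either structure, so the dictionaries keyed by a hypothetical cell identifier and the function name `find_misplaced` are assumptions.

```python
def find_misplaced(realogram, planogram):
    """Return, per cell, the items observed there that the plan does not allow.

    Both arguments are assumed to map a cell identifier to a set of item
    identifiers: `realogram` holds what was matched to the cell from
    inventory events, `planogram` holds what is planned for it.
    """
    misplaced = {}
    for cell_id, observed in realogram.items():
        planned = planogram.get(cell_id, set())
        extra = set(observed) - set(planned)
        if extra:
            misplaced[cell_id] = extra
    return misplaced
```

A cell that appears in the realogram but not in the planogram is treated here as having no planned items, so everything observed in it is reported.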

The system can generate and store in memory a data structure, referred to herein as a "realogram," that identifies the locations of inventory items in the area of real space based on the accumulation of data about the items and their locations in detected inventory events, as discussed herein. Data in the realogram can be compared with data in a planogram to determine how inventory items are arranged in the area relative to the plan, for example to locate misplaced items. The realogram can also be processed to locate inventory items in three-dimensional cells and to correlate those cells with inventory locations in the store, as can be determined, for example, from a planogram or other map of inventory locations. The realogram can also be processed to track activity related to particular inventory items at different locations in the area. Other uses of the realogram are possible.

A system and method are provided for tracking inventory items in an area of real space that includes inventory display structures. The system includes a plurality of cameras disposed above the inventory display structures. The cameras produce respective sequences of images of the inventory display structures in corresponding fields of view in the real space. The field of view of each camera overlaps with the field of view of at least one other camera in the plurality of cameras. A data set defines a plurality of cells having coordinates in the area of real space, and the data set is stored in memory. The system processes the sequences of images produced by the plurality of cameras to find the locations of inventory events in three dimensions in the area of real space. The system includes logic that, in response to an inventory event, determines the nearest cell in the data set based on the location of the inventory event. The system includes logic that calculates, at a scoring time, scores for the inventory items associated with inventory events having locations matched with a particular cell, using the respective counts of inventory events.

In one embodiment, the system includes logic to select a set of inventory items for each cell based on the scores. In one embodiment, an inventory event includes an item identifier, a put or take indicator, a location represented by positions along three axes of the area of real space, and a timestamp. In one embodiment, the system includes a data set defining a plurality of cells represented as a two-dimensional grid having coordinates in the area of real space. The cells can correlate with portions of the front plan view of inventory locations. The processing system includes logic that determines the nearest cell based on the location of the inventory event. In one embodiment, the system includes a data set defining a plurality of cells represented as a three-dimensional grid having coordinates in the area of real space. The cells can correlate with portions of the volume on inventory locations. The processing system includes logic that determines the nearest cell based on the location of the inventory event. A put indicator identifies that the item was placed on an inventory location, and a take indicator identifies that the item was taken from an inventory location.
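As a rough illustration of a two-dimensional or three-dimensional grid of cells with coordinates in real space, the sketch below maps a position to a grid index and back to a cell center. The uniform cell size, the grid origin, and the function names are assumptions; the text does not state how cells are sized or laid out.

```python
import math


def cell_index(position, origin, cell_size):
    """Map a real-space position to integer grid indices.

    Works for a 2D grid (for example the front view of a shelf) or a 3D grid
    (the volume on a shelf), depending on the number of coordinates supplied.
    `origin` and `cell_size` are assumed grid parameters.
    """
    return tuple(math.floor((p - o) / cell_size) for p, o in zip(position, origin))


def cell_center(index, origin, cell_size):
    """Return the real-space coordinates of the center of a grid cell."""
    return tuple(o + (i + 0.5) * cell_size for i, o in zip(index, origin))
```

For instance, with an origin at zero and a cell size of 0.5, a call such as `cell_index((0.25, 1.1, 0.9), (0.0, 0.0, 0.0), 0.5)` falls in the cell with indices `(0, 2, 1)`.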

In one embodiment, the logic that processes the sequences of images produced by the plurality of cameras comprises image recognition engines. The image recognition engines generate data sets representing elements in the images that correspond to hands. The system includes logic that performs analysis of the data sets from the sequences of images from at least two cameras to determine the locations of inventory events in three dimensions. The image recognition engines comprise convolutional neural networks.

In one embodiment, the system includes logic that calculates the scores for a cell using sums of puts and takes of inventory items, weighted by the separation between the timestamps of the puts and takes and the scoring time. The scores are stored in memory. In one embodiment, the logic that determines the nearest cell in the data set based on the location of the inventory event calculates the distance from the location of the inventory event to cells in the data set and matches the inventory event with a cell based on the calculated distances.
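The distance-based matching described above can be sketched as follows. Euclidean distance to a per-cell reference point (here a center) is an assumption, as are the `cell_centers` mapping and the function name; the text only says that distances are calculated and the event is matched on that basis.

```python
import math


def nearest_cell(event_position, cell_centers):
    """Match an inventory event to the closest cell.

    `cell_centers` is an assumed mapping from a cell identifier to the
    real-space coordinates of that cell's center. Returns the identifier
    of the nearest cell and the distance to it.
    """
    if not cell_centers:
        raise ValueError("no cells defined")
    best_id = min(
        cell_centers,
        key=lambda cell_id: math.dist(event_position, cell_centers[cell_id]),
    )
    return best_id, math.dist(event_position, cell_centers[best_id])
```

A linear scan over all cells is adequate for a sketch; a deployment with many cells might prefer a spatial index, which is a design choice the text does not address.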

Methods and computer program products that can be executed by computer systems are also described herein.

The functions described herein, including but not limited to identifying inventory events and the items associated with them, linking the events to cells in a plurality of cells having coordinates in the area of real space, and updating a store realogram, present complex problems of computer engineering relating, for example, to the type of image data to be processed, what processing of the image data to perform, and how to determine actions from the image data with high reliability.

Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description, and the claims that follow.

An architecture-level schematic of a system in which a store inventory engine and a store realogram engine track inventory items in an area of real space that includes inventory display structures.

A side view of an aisle in a shopping store showing a subject, inventory display structures, and the camera arrangement in the shopping store.

A perspective view of an inventory display structure in the aisle of FIG. 2A, showing a subject taking an item from a shelf in the inventory display structure.

An example of 2D and 3D maps of shelves in an inventory display structure.

An exemplary data structure for storing joint information of a subject.

An exemplary data structure for storing a subject, including information on the associated joints.

A top view of the inventory display structure of a shelf unit in the aisle of FIG. 2A in a shopping store, showing selection of a shelf in the inventory display structure based on the location of an inventory event indicating an item taken from the shelf.

An example of a log data structure that can be used to store a subject's shopping cart, or the inventory items stocked on a shelf or in the shopping store.

A flowchart showing process steps for determining the inventory items on shelves and in the shopping store based on the locations of puts and takes of inventory items.

An exemplary architecture in which the technique shown in the flowchart of FIG. 8 can be used to determine the inventory items on shelves in an area of real space.

An exemplary architecture in which the technique shown in the flowchart of FIG. 8 can be used to update a store inventory data structure.

Discretization of shelves in portions of an inventory display structure using a two-dimensional (2D) grid.

An illustration of a realogram using a three-dimensional (3D) grid of shelves, showing the locations of inventory items that, after one day, have dispersed from their designated locations on portions of shelves in an inventory display structure to other locations on the same shelves and to locations on different shelves in other inventory display structures in the shopping store.

An example showing the realogram of FIG. 11A displayed on a user interface of a computing device.

A flowchart showing process steps for calculating a realogram of the inventory items stocked on the shelves of inventory display structures in a shopping store, based on the locations of puts and takes of inventory items.

A flowchart showing process steps for determining restocking of inventory items using the realogram.

An exemplary user interface displaying a restock notification for an inventory item.

A flowchart showing process steps for determining planogram compliance using the realogram.

An exemplary user interface displaying a misplaced-item notification for an inventory item.

A flowchart showing process steps for adjusting the confidence score probabilities of inventory item predictions using the realogram.

A camera and computer hardware arrangement configured to host the inventory consolidation engine and the store realogram engine of FIG. 1.

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

[System Overview]

A system and various implementations of the subject technology are described with reference to FIGS. 1-13. The system and processes are described with reference to FIG. 1, an architecture-level schematic of a system in accordance with an implementation. Because FIG. 1 is an architectural diagram, certain details are omitted to improve the clarity of the description.

The discussion of FIG. 1 is organized as follows. First, the elements of the system are described, followed by their interconnections. Then, the use of the elements in the system is described in greater detail.

FIG. 1 provides a block-diagram-level illustration of a system 100. The system 100 includes cameras 114, network nodes hosting image recognition engines 112a, 112b, and 112n, a store inventory engine 180 deployed in a network node (or nodes) 104 on the network, a store realogram engine 190 deployed in a network node (or nodes) 104 on the network, a network node 102 hosting a subject tracking engine, a maps database 140, an inventory events database 150, a planogram and inventory database 160, a realogram database 170, and one or more communication networks 181. A network node can host only one image recognition engine, or several image recognition engines as described herein. The system can also include a subject database and other supporting data.

As used herein, a network node is an addressable hardware device or virtual device that is attached to a network and is capable of sending, receiving, or forwarding information over a communications channel to or from other network nodes. Examples of electronic devices that can be deployed as hardware network nodes include all varieties of computers, workstations, laptop computers, handheld computers, and smartphones. Network nodes can be implemented in a cloud-based server system. More than one virtual device configured as a network node can be implemented using a single physical device.

For the sake of clarity, only three network nodes hosting image recognition engines are shown in the system 100. However, any number of network nodes hosting image recognition engines can be connected to the subject tracking engine 110 through the network 181. Similarly, the image recognition engines, the subject tracking engine, the store inventory engine, the store realogram engine, and other processing engines described herein can execute using more than one network node in a distributed architecture.

The interconnection of the elements of system 100 is now described. The network 181 couples the network nodes 101a, 101b, and 101n, which host the image recognition engines 112a, 112b, and 112n respectively, the network node 104 hosting the store inventory engine 180, the network node 106 hosting the store realogram engine 190, the network node 102 hosting the subject tracking engine 110, the maps database 140, the inventory events database 150, the inventory database 160, and the realogram database 170. The cameras 114 are connected to the subject tracking engine 110 through the network nodes hosting the image recognition engines 112a, 112b, and 112n. In one embodiment, the cameras 114 are installed in a shopping store (such as a supermarket) such that sets of cameras 114 (two or more) with overlapping fields of view are positioned over each aisle to capture images of the real space in the store. In FIG. 1, two cameras are arranged over aisle 116a, two cameras are arranged over aisle 116b, and three cameras are arranged over aisle 116n. The cameras 114 are installed over the aisles with overlapping fields of view. In such an embodiment, the cameras are configured with the goal that customers moving in the aisles of the shopping store are present in the field of view of two or more cameras at any moment in time.

The cameras 114 can be synchronized in time with each other, so that images are captured at the same time, or close in time, and at the same image capture rate. The cameras 114 can send respective continuous streams of images at a predetermined rate to the network nodes hosting the image recognition engines 112a-112n. Images captured at the same time, or close in time, by all of the cameras covering an area of real space are synchronized in the sense that the synchronized images can be identified in the processing engines as representing different views of subjects having fixed positions in the real space. For example, in one embodiment, the cameras send image frames at a rate of 30 frames per second (fps) to the respective network nodes hosting the image recognition engines 112a-112n. Each frame has a timestamp, the identity of the camera (abbreviated as "camera ID"), and a frame identity (abbreviated as "frame ID") along with the image data. Other embodiments of the disclosed technology can use different types of sensors to produce this data, such as infrared image sensors, radio-frequency image sensors, ultrasound sensors, thermal sensors, and lidar. Multiple types of sensors can be used, including, for example, infrared or radio-frequency image sensors in addition to the cameras 114 that produce RGB color output. The multiple sensors are synchronized in time with each other, so that frames are captured by the sensors at the same time, or close in time, and at the same frame capture rate. In all of the embodiments disclosed herein, sensors other than cameras, or sensors of multiple types, can be used to produce the sequences of images utilized.
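One way to picture the per-frame metadata and the notion of synchronized images is to bucket frames from different cameras into common capture slots. The sketch below is an assumption-laden illustration: the `Frame` class, the slot-rounding rule, and the function name are hypothetical, and only the fields (timestamp, camera ID, frame ID, image data) and the 30 fps example rate come from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A frame as described in the text: image data plus identifying metadata."""
    camera_id: str
    frame_id: int
    timestamp: float   # seconds on a clock shared by all cameras (assumed)
    image: bytes


def group_synchronized(frames, fps=30.0):
    """Group frames captured at the same time, or close in time.

    Assumed rule: a frame belongs to the capture slot nearest to its
    timestamp, with slots spaced 1/fps apart. Returns a dict mapping each
    slot number to a {camera_id: Frame} dict of the views in that slot.
    """
    groups = defaultdict(dict)
    for frame in frames:
        slot = round(frame.timestamp * fps)
        groups[slot][frame.camera_id] = frame
    return dict(groups)
```

Each resulting group holds the different camera views of the same moment, which is the input a downstream engine would need in order to treat them as views of subjects at fixed positions.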

The cameras installed over an aisle are connected to respective image recognition engines. For example, in FIG. 1, the two cameras installed over aisle 116a are connected to the network node 101a hosting the image recognition engine 112a. Likewise, the two cameras installed over aisle 116b are connected to the network node 101b hosting the image recognition engine 112b. Each image recognition engine 112a-112n hosted in a network node 101a-101n separately processes the image frames received from one camera each in the illustrated example.

In one embodiment, each image recognition engine 112a, 112b, and 112n is implemented as a deep learning algorithm such as a convolutional neural network (abbreviated CNN). In such an embodiment, the CNN is trained using a training database 150. In the embodiments described herein, image recognition of subjects in the real space is based on identifying and grouping the joints recognizable in the images, where the groups of joints can be attributed to an individual subject. For this joints-based analysis, the training database 150 has a large collection of images for each of the different types of joints for subjects. In the example embodiment of a shopping store, the subjects are the customers moving in the aisles between the shelves. In the example embodiment, during training of the CNN, the system 100 is referred to as a "training system." After the CNN is trained using the training database 150, the CNN is switched to production mode to process images of customers in the shopping store in real time.

In the example embodiment, during production, the system 100 is referred to as a runtime system (also referred to as an inference system). The CNN in each image recognition engine produces arrays of joints data structures for the images in its respective stream of images. In the embodiments described herein, an array of joints data structures is produced for each processed image, so that each image recognition engine 112a-112n produces an output stream of arrays of joints data structures. These arrays of joints data structures from cameras having overlapping fields of view are further processed to form groups of joints and to identify such groups of joints as subjects. The system can identify and track a subject using an identifier, "subject ID," while the subject is present in the area of real space.

The subject tracking engine 110, hosted on the network node 102, receives in this example continuous streams of arrays of joint data structures for the subjects from the image recognition engines 112a-112n. The subject tracking engine 110 processes the arrays of joint data structures and translates the coordinates of the elements in the arrays of joint data structures corresponding to images in different sequences into candidate joints having coordinates in the real space. For each set of synchronized images, the combination of candidate joints identified throughout the real space can be considered, for the purposes of analogy, to be like a galaxy of candidate joints. For each succeeding point in time, the movement of the candidate joints is recorded so that the galaxy changes over time. The subject tracking engine 110 identifies the subjects in the area of real space at a moment in time.

The tracking engine 110 uses logic to identify groups or sets of candidate joints having coordinates in real space as subjects in the real space. For the purposes of analogy, each set of candidate points is like a constellation of candidate joints at each point in time. The constellations of candidate joints can move over time. A time sequence analysis of the output of the subject tracking engine 110 over a period of time identifies the movements of subjects in the area of real space.

In an example embodiment, the logic to identify sets of candidate joints comprises heuristic functions based on physical relationships among the joints of subjects in real space. These heuristic functions are used to identify sets of candidate joints as subjects. A set of candidate joints comprises individual candidate joints that have relationships, according to the heuristic parameters, with other individual candidate joints, and subsets of candidate joints in a given set that have been identified, or can be identified, as an individual subject.

In the example of a shopping store, the customers (also referred to as subjects above) move in the aisles and in open spaces. The customers take items from inventory locations on shelves in inventory display structures. In one example of inventory display structures, shelves are arranged at different levels (or heights) from the floor, and inventory items are stocked on the shelves. The shelves can be fixed to a wall or placed as freestanding shelves forming aisles in the shopping store. Other examples of inventory display structures include pegboard shelves, magazine shelves, carousel shelves, warehouse shelves, and refrigerated shelving units. Inventory items can also be stocked in other types of inventory display structures such as stacking wire baskets, dump bins, etc. The customers can also put items back on the same shelves from which they were taken or on another shelf.

The system includes the store inventory engine 180 (hosted on the network node 104) to update the inventory in inventory locations in the shopping store as customers put items on the shelves and take items from the shelves. The store inventory engine updates the inventory data structure of an inventory location by indicating the identifiers (such as stock keeping units or SKUs) of the inventory items placed on that inventory location. The inventory consolidation engine also updates the inventory data structure of the shopping store by updating the quantities of items stocked in the store. The inventory location and store inventory data, along with the customers' inventory data (also referred to as a log data structure of inventory items, or a shopping cart data structure), are stored in the inventory database 160.
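
The put and take updates described above can be sketched as follows. This is a minimal illustration only: the class name `StoreInventory`, its methods, and the location and SKU strings are hypothetical and are not taken from the disclosure, and an in-memory dictionary stands in for the inventory database 160.

```python
from collections import defaultdict


class StoreInventory:
    """Hypothetical stand-in for the inventory data structures described above."""

    def __init__(self):
        # location_id -> SKU -> quantity recorded at that inventory location
        self.by_location = defaultdict(lambda: defaultdict(int))
        # SKU -> total quantity stocked in the store
        self.store_totals = defaultdict(int)

    def put(self, location_id, sku, quantity=1):
        """Record items placed on an inventory location."""
        self.by_location[location_id][sku] += quantity
        self.store_totals[sku] += quantity

    def take(self, location_id, sku, quantity=1):
        """Record items removed from an inventory location."""
        if quantity > self.by_location[location_id][sku]:
            raise ValueError("more items taken than recorded at this location")
        self.by_location[location_id][sku] -= quantity
        self.store_totals[sku] -= quantity


inventory = StoreInventory()
inventory.put("shelf-1", "ketchup-500ml", 10)  # initial stocking
inventory.take("shelf-1", "ketchup-500ml", 2)  # a customer takes two bottles
inventory.put("shelf-3", "ketchup-500ml", 1)   # one bottle is returned elsewhere
```

Keeping a per-location count and a store-wide count side by side mirrors the two updates named in the text; a real deployment would presumably persist both rather than hold them in memory.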

The store inventory engine 180 provides the status of the inventory items in inventory locations. However, it is difficult to determine at any given time which inventory item is placed on which portion of a shelf. This is important information for the managers and employees of the shopping store. Inventory items can be arranged in inventory locations according to a planogram, which identifies the shelves and the locations on the shelves where the inventory items are planned to be stocked. For example, ketchup bottles may be stocked on a predetermined left portion of all shelves in an inventory display structure, forming a column-wise arrangement. Over time, customers take ketchup bottles from the shelves and place them in their respective baskets or shopping carts. Some customers may put the ketchup bottles back on another portion of the same shelf in the same inventory display structure. Customers may also put the ketchup bottles back on shelves in other inventory display structures in the shopping store. The store realogram engine 190 (hosted on the network node 106) generates a realogram that can be used to identify the portions of shelves where ketchup bottles are positioned at a time "t". This information can be used by the system to generate notifications to employees with the locations of misplaced ketchup bottles.

This information can also be used across the inventory items in the area of real space to generate a data structure, referred to herein as a realogram, that tracks the locations of the inventory items in the area of real space over time. The realogram of the shopping store generated by the store realogram engine 190 reflects the current status of the inventory items and, in some embodiments, the status of the inventory items at a specified time "t" over an interval of time. It can be stored in the realogram database 170.
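
One way to picture a realogram that answers "where were the items at time t" is a time-ordered series of snapshots compared against the planogram. The sketch below is an assumption-laden illustration: the `Realogram` class, the `misplaced` helper, and the (shelf, portion) cell keys are hypothetical names, and the disclosure does not specify this layout.

```python
import bisect


class Realogram:
    """Hypothetical time-indexed record of which SKUs occupy which shelf cells."""

    def __init__(self):
        self._times = []      # sorted timestamps of the recorded snapshots
        self._snapshots = []  # snapshot: {(shelf_id, portion): {sku: count}}

    def record(self, timestamp, snapshot):
        """Store a snapshot, keeping the series ordered by time."""
        index = bisect.bisect_right(self._times, timestamp)
        self._times.insert(index, timestamp)
        self._snapshots.insert(index, snapshot)

    def at(self, timestamp):
        """Return the latest snapshot taken at or before `timestamp`."""
        index = bisect.bisect_right(self._times, timestamp)
        return self._snapshots[index - 1] if index else {}


def misplaced(snapshot, planogram):
    """List (cell, sku) pairs where a SKU sits in a cell not planned for it."""
    found = []
    for cell, skus in snapshot.items():
        for sku, count in skus.items():
            if count > 0 and sku not in planogram.get(cell, set()):
                found.append((cell, sku))
    return sorted(found)


planogram = {("shelf-1", "left"): {"ketchup"}, ("shelf-1", "right"): {"mustard"}}
realogram = Realogram()
realogram.record(100.0, {("shelf-1", "left"): {"ketchup": 8}})
realogram.record(200.0, {("shelf-1", "left"): {"ketchup": 7},
                         ("shelf-1", "right"): {"mustard": 5, "ketchup": 1}})
```

Querying `misplaced(realogram.at(t), planogram)` for a chosen `t` would then yield the cells that could feed an employee notification.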

The actual communication path through the network 181 to the network node 104 hosting the store inventory engine 180 and the network node 106 hosting the store realogram engine 190 can be point-to-point over public and/or private networks. The communications can occur over a variety of networks 181, e.g., private networks, VPN, MPLS circuit, or the Internet, and can use appropriate application programming interfaces (APIs) and data interchange formats, e.g., Representational State Transfer (REST), JavaScript™ Object Notation (JSON), Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Java™ Message Service (JMS), and/or Java Platform Module System. All of the communications can be encrypted. The communication is generally over a network such as a LAN (local area network), WAN (wide area network), telephone network (Public Switched Telephone Network (PSTN)), Session Initiation Protocol (SIP), wireless network, point-to-point network, star network, token ring network, hub network, or the Internet, inclusive of the mobile Internet, via protocols such as EDGE, 3G, 4G LTE, Wi-Fi, and WiMAX. Additionally, a variety of authorization and authentication techniques, such as username/password, Open Authorization (OAuth), Kerberos, SecureID, digital certificates, and more, can be used to secure the communications.

The technology disclosed herein can be implemented in the context of any computer-implemented system, including a database system, a multi-tenant environment, or a relational database implementation such as an Oracle™ compatible database implementation, an IBM DB2 Enterprise Server™ compatible relational database implementation, a MySQL™ or PostgreSQL™ compatible relational database implementation, or a Microsoft SQL Server™ compatible relational database implementation, or a NoSQL™ non-relational database implementation such as a Vampire™ compatible non-relational database implementation, an Apache Cassandra™ compatible non-relational database implementation, a BigTable™ compatible non-relational database implementation, or an HBase™ or DynamoDB™ compatible non-relational database implementation. In addition, the technology disclosed can be implemented using different programming models such as MapReduce™, bulk synchronous programming, and MPI primitives, or different scalable batch and stream management systems such as Apache Storm™, Apache Spark™, Apache Kafka™, Apache Flink™, Truviso™, Amazon Elasticsearch Service™, Amazon Web Services (AWS)™, IBM Info-Sphere™, Borealis™, and Yahoo! S4™.

[Camera layout]

The cameras 114 are arranged to track multi-joint subjects (or entities) in a three-dimensional (abbreviated 3D) real space. In the example embodiment of the shopping store, the real space can include the area of the shopping store where items for sale are stacked on shelves. A point in the real space can be represented by an (x, y, z) coordinate system. Each point in the area of real space for which the system is deployed is covered by the fields of view of two or more cameras 114.

In a shopping store, the shelves and other inventory display structures can be arranged in a variety of manners, such as along the walls of the shopping store, in rows forming aisles, or in a combination of the two arrangements. FIG. 2A shows an arrangement of shelf unit A 202 and shelf unit B 204, forming an aisle 116a, viewed from one end of the aisle 116a. Two cameras, camera A 206 and camera B 208, are positioned over the aisle 116a, above the inventory display structures such as shelf unit A 202 and shelf unit B 204, at a predetermined distance from the ceiling 230 and the floor 220 of the shopping store. The cameras 114 comprise cameras disposed over, and having fields of view encompassing, respective parts of the inventory display structures and floor area in the real space. As shown in FIG. 2A, the field of view 216 of camera A 206 and the field of view 218 of camera B 208 overlap each other. The coordinates in real space of the members of a set of candidate joints identified as a subject identify the location of the subject in the floor area.

In the example embodiment of the shopping store, the real space can include all of the floor 220 in the shopping store. The cameras 114 are placed and oriented such that areas of the floor 220 and the shelves can be seen by at least two cameras. The cameras 114 also cover the floor space in front of the shelves 202 and 204. The camera angles are selected to include steep, straight-down, and angled perspectives, which give more complete body images of the customers. In one embodiment, the cameras 114 are mounted at a height of eight feet or higher throughout the shopping store. FIG. 13 presents an illustration of such an embodiment.

In FIG. 2A, a subject 240 is standing by the inventory display structure shelf unit B 204, with one hand positioned close to a shelf (not visible) in shelf unit B 204. FIG. 2B is a perspective view of shelf unit B 204 with four shelves, shelf 1, shelf 2, shelf 3, and shelf 4, positioned at different heights from the floor. The inventory items are stocked on these shelves.

[3D scene generation]

A location in the real space is represented as an (x, y, z) point of the real space coordinate system. "x" and "y" represent positions on a two-dimensional (2D) plane, which can be the floor 220 of the shopping store. The value "z" is, in one configuration, the height of the point above the 2D plane at the floor 220. The system combines 2D images from two or more cameras to generate the three-dimensional positions of joints and inventory events (puts of items on shelves and takes of items from shelves) in the area of real space. This section describes the process for generating the 3D coordinates of joints and inventory events. The process is also referred to as 3D scene generation.

Before the system 100 is used in training or inference mode to track the inventory items, two types of camera calibration are performed: internal and external. In internal calibration, the internal parameters of the cameras 114 are calibrated. Examples of internal camera parameters include focal length, principal point, skew, and fisheye coefficients. A variety of techniques for internal camera calibration can be used. One such technique is presented by Zhang in "A flexible new technique for camera calibration," published in IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 22, No. 11, November 2000.

In external calibration, the external camera parameters are calibrated in order to generate mapping parameters for translating the 2D image data into 3D coordinates in real space. In one embodiment, one multi-joint subject, such as a person, is introduced into the real space. The multi-joint subject moves through the real space on a path that passes through the field of view of each of the cameras 114. At any given point in the real space, the multi-joint subject is present in the fields of view of at least two cameras, forming a 3D scene. The two cameras, however, have different views of the same 3D scene in their respective two-dimensional (2D) image planes. A feature in the 3D scene, such as the left wrist of the multi-joint subject, is viewed by the two cameras at different positions in their respective 2D image planes.

A point correspondence is established between every pair of cameras having overlapping fields of view for a given scene. Since each camera has a different view of the same 3D scene, a point correspondence is two pixel locations (one location from each camera with an overlapping field of view) that represent the projection of the same point in the 3D scene. Many point correspondences are identified for each 3D scene using the results of the image recognition engines 112a-112n for the purposes of the external calibration. The image recognition engines identify the position of a joint as (x, y) coordinates, such as row and column numbers, of pixels in the 2D image planes of the respective cameras 114. In one embodiment, a joint is one of 19 different types of joints of the multi-joint subject. As the multi-joint subject moves through the fields of view of different cameras, the tracking engine 110 receives, for each image from the cameras 114, the (x, y) coordinates of each of the 19 different types of joints of the multi-joint subject used for the calibration.

For example, consider an image from camera A and an image from camera B, both taken at the same moment in time and with overlapping fields of view. There are pixels in the image from camera A that correspond to pixels in the synchronized image from camera B. Consider that there is a specific point of some object or surface in view of both camera A and camera B, and that the point is captured in a pixel of both image frames. In external camera calibration, a multitude of such points are identified and referred to as corresponding points. Since there is one multi-joint subject in the fields of view of camera A and camera B during calibration, key joints of this multi-joint subject are identified, for example, the center of the left wrist. If these key joints are visible in the image frames from both camera A and camera B, then it is assumed that they represent corresponding points. This process is repeated for many image frames to build up a large collection of corresponding points for all pairs of cameras with overlapping fields of view. In one embodiment, images are streamed from all cameras at a rate of 30 FPS (frames per second) or more and at a resolution of 720 pixels in full RGB (red, green, and blue) color. These images are in the form of one-dimensional arrays (also referred to as flat arrays).
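
The collection of corresponding points from key joints can be sketched as below. The function names and the representation of a frame as a dictionary from joint name to pixel position are assumptions made for illustration; the disclosure itself describes arrays of joint data structures.

```python
def corresponding_points(joints_a, joints_b):
    """Pair pixel positions of joints visible in both synchronized frames.

    Each argument maps a joint name to its (x, y) pixel position in one camera.
    """
    shared = sorted(set(joints_a) & set(joints_b))
    return [(joints_a[name], joints_b[name]) for name in shared]


def collect_correspondences(frame_pairs):
    """Accumulate corresponding points over many synchronized frame pairs."""
    points = []
    for joints_a, joints_b in frame_pairs:
        points.extend(corresponding_points(joints_a, joints_b))
    return points


# Two synchronized frame pairs from cameras A and B (pixel values are made up).
frames = [
    ({"left_wrist": (412, 230), "neck": (400, 120)}, {"left_wrist": (155, 260)}),
    ({"left_wrist": (420, 236)}, {"left_wrist": (163, 268), "neck": (150, 140)}),
]
pairs = collect_correspondences(frames)
```

Only joints seen by both cameras in the same synchronized frame contribute a pair, which matches the assumption stated in the text that such joints represent corresponding points.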

The large number of images collected above for the multi-joint subject can be used to determine corresponding points between cameras with overlapping fields of view. Consider two cameras A and B with overlapping fields of view. The plane passing through the camera centers of cameras A and B and the joint location (also referred to as a feature point) in the 3D scene is called the "epipolar plane". The intersections of the epipolar plane with the 2D image planes of cameras A and B define the "epipolar lines". Given these corresponding points, a transformation is determined that can accurately map a corresponding point from camera A to an epipolar line in the field of view of camera B that is guaranteed to intersect the corresponding point in the image frame of camera B. The transformation is generated using the image frames collected above for the multi-joint subject. It is known in the art that this transformation is non-linear. The general form is furthermore known to require compensation for the radial distortion of each camera's lens, as well as the non-linear coordinate transformation moving to and from the projected space. In external camera calibration, an approximation to the ideal non-linear transformation is determined by solving a non-linear optimization problem. This non-linear optimization function is used by the subject tracking engine 110 to identify the same joints in the outputs (arrays of joint data structures) of the different image recognition engines 112a-112n processing images of the cameras 114 with overlapping fields of view. The results of the internal and external camera calibration are stored in the calibration database 170.
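
The mapping from a point in camera A to an epipolar line in camera B can be illustrated with a fundamental matrix F under the standard convention x_B^T F x_A = 0. This sketch assumes lens distortion has already been removed from the pixel coordinates, so it covers only the linear core of the transformation described above; the function names are hypothetical.

```python
import math


def epipolar_line(F, point_a):
    """Return line coefficients (a, b, c) in camera B for a pixel in camera A.

    The line satisfies a*x + b*y + c = 0 and is computed as F @ [x, y, 1].
    """
    homogeneous = (point_a[0], point_a[1], 1.0)
    return tuple(sum(F[r][c] * homogeneous[c] for c in range(3)) for r in range(3))


def distance_to_line(line, point_b):
    """Perpendicular pixel distance from a camera B pixel to an epipolar line."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * point_b[0] + b * point_b[1] + c) / norm


# Fundamental matrix of an idealized pair related by a pure sideways shift:
# corresponding points then share the same image row.
F = [[0.0, 0.0, 0.0],
     [0.0, 0.0, -1.0],
     [0.0, 1.0, 0.0]]
line = epipolar_line(F, (100.0, 120.0))
```

A candidate joint in camera B whose distance to the line is small is a plausible match for the joint seen in camera A; the acceptance threshold would be a tuning choice not given in the text.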

A variety of techniques for determining the relative positions in real space of points in the images of the cameras 114 can be used. For example, Longuet-Higgins published "A computer algorithm for reconstructing a scene from two projections" in Nature, Volume 293, 10 September 1981. That paper presents the computation of the three-dimensional structure of a scene from a correlated pair of perspective projections when the spatial relationship between the two projections is unknown. The Longuet-Higgins paper presents a technique to determine the position of each camera in the real space with respect to the other cameras. Additionally, the technique allows triangulation of a multi-joint subject in the real space, identifying the value of the z-coordinate (height from the floor) using images from cameras 114 with overlapping fields of view. An arbitrary point in the real space, for example, the end of a shelf unit in one corner of the real space, is designated as the (0, 0, 0) point of the (x, y, z) coordinate system of the real space.

In an embodiment of the technology, the parameters of the external calibration are stored in two data structures. The first data structure stores the intrinsic parameters. The intrinsic parameters represent a projective transformation from the 3D coordinates into 2D image coordinates. The first data structure contains the intrinsic parameters per camera, as shown below. The data values are all floating point numbers. This data structure stores a 3x3 intrinsic matrix, represented as "K", and distortion coefficients. The distortion coefficients include six radial distortion coefficients and two tangential distortion coefficients. Radial distortion occurs when light rays bend more near the edges of a lens than they do at its optical center. Tangential distortion occurs when the lens and the image plane are not parallel. The following data structure shows values for the first camera only. Similar data is stored for all of the cameras 114.

{
  1: {
    K: [[x, x, x], [x, x, x], [x, x, x]],
    distortion_coefficients: [x, x, x, x, x, x, x, x]
  },
  ......
}
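
To show how the intrinsic matrix K is applied, the following sketch projects a point given in a camera's own 3D frame onto its image plane using the pinhole model. It assumes K has the usual layout [[fx, skew, cx], [0, fy, cy], [0, 0, 1]] and deliberately ignores the eight distortion coefficients, whose ordering the text does not state; the numeric values are invented.

```python
def project_to_pixel(K, point_cam):
    """Project a 3D point in the camera frame to (u, v) pixel coordinates.

    Pinhole model only: lens distortion is not applied in this sketch.
    """
    X, Y, Z = point_cam
    if Z <= 0:
        raise ValueError("point must be in front of the camera")
    xn, yn = X / Z, Y / Z  # normalized image coordinates
    u = K[0][0] * xn + K[0][1] * yn + K[0][2]
    v = K[1][1] * yn + K[1][2]
    return (u, v)


# Hypothetical per-camera record mirroring the first data structure.
intrinsics = {
    1: {
        "K": [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]],
        "distortion_coefficients": [0.0] * 8,
    }
}
pixel = project_to_pixel(intrinsics[1]["K"], (0.5, -0.25, 2.0))
```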

The second data structure stores, per pair of cameras, a 3x3 fundamental matrix (F), a 3x3 essential matrix (E), a 3x4 projection matrix (P), a 3x3 rotation matrix (R), and a 3x1 translation vector (t). This data is used to convert points in one camera's reference frame to another camera's reference frame. For each pair of cameras, eight homography coefficients are also stored to map the plane of the floor 220 from one camera to another. A fundamental matrix is a relationship between two images of the same scene that constrains where the projections of points from the scene can occur in both images. The essential matrix is also a relationship between two images of the same scene, with the condition that the cameras are calibrated. The projection matrix gives a vector space projection from the 3D real space to a subspace. The rotation matrix is used to perform a rotation in Euclidean space. The translation vector "t" represents a geometric transformation that moves every point of a figure or a space by the same distance in a given direction. The homography floor coefficients are used to combine images of features of subjects on the floor 220 viewed by cameras with overlapping fields of view. The second data structure is shown below. Similar data is stored for all pairs of cameras. As before, x represents a floating point number.

{
  1: {
    2: {
      F: [[x, x, x], [x, x, x], [x, x, x]],
      E: [[x, x, x], [x, x, x], [x, x, x]],
      P: [[x, x, x, x], [x, x, x, x], [x, x, x, x]],
      R: [[x, x, x], [x, x, x], [x, x, x]],
      t: [x, x, x],
      homography_floor_coefficients: [x, x, x, x, x, x, x, x]
    }
  },
  .......
}
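
Two of the per-pair entries lend themselves to a short illustration: moving a 3D point between camera reference frames with R and t, and mapping a floor pixel between cameras with the eight homography coefficients. Both conventions used here are assumptions, since the text does not fix them: the point transform is taken as p_b = R p_a + t, and the eight coefficients are taken as the first eight entries of a 3x3 homography in row-major order with the ninth entry fixed to 1.

```python
def to_other_camera(R, t, point):
    """Transform a 3D point from camera 1's frame to camera 2's frame.

    Assumed convention: p2 = R @ p1 + t.
    """
    return tuple(sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))


def map_floor_pixel(coefficients, pixel):
    """Map a floor pixel from one camera image to the other.

    Assumed layout: [h11, h12, h13, h21, h22, h23, h31, h32], with h33 = 1.
    """
    h11, h12, h13, h21, h22, h23, h31, h32 = coefficients
    x, y = pixel
    w = h31 * x + h32 * y + 1.0
    return ((h11 * x + h12 * y + h13) / w, (h21 * x + h22 * y + h23) / w)


# Invented values: a 90 degree rotation about the z axis plus an offset,
# and a homography that is a pure pixel shift.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 3.0]
moved = to_other_camera(R, t, (1.0, 0.0, 0.0))
shifted = map_floor_pixel([1.0, 0.0, 5.0, 0.0, 1.0, -3.0, 0.0, 0.0], (10.0, 20.0))
```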

[Two-dimensional map and three-dimensional map]

An inventory location, such as a shelf, in a shopping store can be identified by a unique identifier (e.g., a shelf ID). Similarly, a shopping store can be identified by a unique identifier (e.g., a store ID). The two-dimensional (2D) and three-dimensional (3D) maps database 140 identifies inventory locations in the area of real space along the respective coordinates. For example, in a 2D map, the locations in the map define two-dimensional regions on the plane formed perpendicular to the floor 220, i.e., the XZ plane, as shown in FIG. 3. The map defines the area of an inventory location where inventory items are positioned. In FIG. 3, a 2D view 360 of shelf 1 in shelf unit B 204 shows the area formed by four coordinate positions (x1, z1), (x1, z2), (x2, z2), and (x2, z1), which defines the 2D region in which inventory items are positioned on shelf 1. Similar 2D regions are defined for all inventory locations in all shelf units (or other inventory display structures) in the shopping store. This information is stored in the maps database 140.
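
A 2D map entry of this kind can be sketched as a rectangle on the XZ plane with a containment test, which is what a lookup from an event position to a shelf ID would need. The class name, field names, and coordinate values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShelfRegion2D:
    """Rectangle on the XZ plane spanned by (x1, z1) and (x2, z2)."""
    shelf_id: str
    x1: float
    z1: float
    x2: float
    z2: float

    def contains(self, x, z):
        """True when (x, z) lies inside the rectangle, edges included."""
        return (min(self.x1, self.x2) <= x <= max(self.x1, self.x2)
                and min(self.z1, self.z2) <= z <= max(self.z1, self.z2))


def find_shelf(regions, x, z):
    """Return the ID of the first region containing (x, z), or None."""
    for region in regions:
        if region.contains(x, z):
            return region.shelf_id
    return None


regions = [
    ShelfRegion2D("shelf-1", 0.0, 0.2, 2.0, 0.6),
    ShelfRegion2D("shelf-2", 0.0, 0.7, 2.0, 1.1),
]
```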

In a 3D map, the locations in the map define three-dimensional regions in the 3D real space defined by X, Y, and Z coordinates. The map defines the volume of an inventory location where inventory items are positioned. In FIG. 3, a 3D view 350 of shelf 1 in shelf unit B 204 shows the volume formed by the eight coordinate positions (x1, y1, z1), (x1, y1, z2), (x1, y2, z1), (x1, y2, z2), (x2, y1, z1), (x2, y1, z2), (x2, y2, z1), and (x2, y2, z2) that define the 3D region in which inventory items are positioned on shelf 1. Similar 3D regions are defined for the inventory locations in all shelf units in the shopping store and are stored in the maps database 140 as a 3D map of the real space (the shopping store). The coordinate positions along the three axes can be used to calculate the length, depth, and height of the inventory locations, as shown in FIG. 3.
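
Because the eight corners are built from only two values per axis, the dimensions of a shelf volume follow from two opposite corners. The sketch below assumes length runs along x, depth along y, and height along z; the text fixes z as height but does not say which of x and y is the length, so that assignment is an assumption.

```python
def region_dimensions(corner_min, corner_max):
    """Return (length, depth, height) of an axis-aligned 3D region.

    The corners are (x1, y1, z1) and (x2, y2, z2) of the shelf volume.
    """
    (x1, y1, z1), (x2, y2, z2) = corner_min, corner_max
    return (abs(x2 - x1), abs(y2 - y1), abs(z2 - z1))


def region_volume(corner_min, corner_max):
    """Volume enclosed by the eight corner positions."""
    length, depth, height = region_dimensions(corner_min, corner_max)
    return length * depth * height


# Invented coordinates for one shelf.
dimensions = region_dimensions((1.0, 2.0, 0.5), (3.0, 2.5, 1.0))
volume = region_volume((1.0, 2.0, 0.5), (3.0, 2.5, 1.0))
```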

In one embodiment, the map identifies a configuration of units of volume which correlate with portions of inventory locations on the inventory display structures in the area of real space. Each portion is defined by starting and ending positions along the three axes of the real space. A similar configuration of portions of inventory locations can also be generated using a 2D map of inventory locations that divides the front view of the display structures.
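
Dividing an inventory location into portions can be sketched along a single axis as below. This is a simplification: the text defines each portion by start and end positions along all three axes, while the example splits only the x extent of a shelf into equal parts, and the function names are hypothetical.

```python
def split_into_portions(start, end, count):
    """Split the interval [start, end] along one axis into equal portions."""
    if count < 1 or end <= start:
        raise ValueError("need a positive count and end greater than start")
    width = (end - start) / count
    return [(start + i * width, start + (i + 1) * width) for i in range(count)]


def portion_index(start, end, count, position):
    """Index of the portion containing `position`, or None if outside."""
    if not start <= position <= end:
        return None
    width = (end - start) / count
    # Clamp so that the far edge belongs to the last portion.
    return min(int((position - start) / width), count - 1)


portions = split_into_portions(0.0, 2.0, 4)
```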

[Joint data structure]

The image recognition engines 112a-112n receive sequences of images from the cameras 114 and process the images to generate corresponding arrays of joint data structures. The system includes processing logic that uses the sequences of images produced by the plurality of cameras to track the locations of a plurality of subjects (or customers in the shopping store) in the area of real space. In one embodiment, the image recognition engines 112a-112n identify one of the 19 possible joints of a subject at each element of the image, which can be used to identify subjects in the area who may be taking or putting inventory items. The possible joints can be grouped into two categories: foot joints and non-foot joints. The 19th type of joint classification covers all non-joint features of the subject (i.e., elements of the image not classified as a joint). In other embodiments, the image recognition engine may be configured to identify the locations of hands specifically. Other techniques, such as a user check-in procedure or biometric identification processes, may also be deployed to identify the subjects and to link them with the detected locations of their hands as they move through the store.
Foot joints:
Ankle joint (left and right)
Non-foot joints:
Neck
Nose
Eyes (left and right)
Ears (left and right)
Shoulders (left and right)
Elbows (left and right)
Wrists (left and right)
Hips (left and right)
Knees (left and right)
Not a joint

The array of joint data structures for a particular image classifies elements of the image by joint type, time of the image, and the coordinates of the elements in the image. In one embodiment, the image recognition engines 112a-112n are convolutional neural networks (CNNs), the joint type is one of the 19 types of joints of the subjects, the time of the particular image is the timestamp of the image generated by the source camera 114 for that image, and the coordinates (x, y) identify the position of the element on a 2D image plane.

The output of the CNN is a matrix of confidence arrays for each image per camera. The matrix of confidence arrays is transformed into an array of joint data structures. A joint data structure 400, as shown in FIG. 4, is used to store the information of each joint. The joint data structure 400 identifies the x and y positions of the element in the particular image in the 2D image space of the camera from which the image is received. A joint number identifies the type of joint identified. For example, in one embodiment, the values range from 1 to 19. A value of 1 indicates that the joint is a left ankle, a value of 2 indicates that the joint is a right ankle, and so on. The type of joint is selected using the confidence array for that element in the output matrix of the CNN. For example, in one embodiment, if the value corresponding to the left ankle joint is the highest in the confidence array for that image element, then the value of the joint number is "1".
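
The conversion from confidence arrays to joint data structures can be sketched as follows. This is an illustrative assumption about the layout, not the patented implementation: the CNN output is modeled as a nested list indexed as `matrix[y][x]`, each entry being a 19-element confidence array, and the `Joint` class and `joints_from_confidences` function are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Joint:
    """Hypothetical joint record: ID, type (1..19), confidence, position, time."""
    joint_id: int
    joint_number: int
    confidence: float
    x: int
    y: int
    timestamp: float


def joints_from_confidences(matrix, timestamp, ids=None):
    """Pick, for each image element, the joint type with the highest confidence."""
    ids = count(1) if ids is None else ids
    joints = []
    for y, row in enumerate(matrix):
        for x, confidences in enumerate(row):
            # Index of the largest confidence; joint numbers are 1-based.
            best = max(range(len(confidences)), key=confidences.__getitem__)
            joints.append(Joint(next(ids), best + 1, confidences[best], x, y, timestamp))
    return joints


# Two image elements: one most likely a left ankle (1), one not a joint (19).
left_ankle = [0.0] * 19
left_ankle[0] = 0.9
not_joint = [0.0] * 19
not_joint[18] = 0.8
for joint in joints_from_confidences([[left_ankle, not_joint]], timestamp=12.5):
    print(joint)
```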

A confidence number indicates the degree of confidence of the CNN in predicting that joint. If the value of the confidence number is high, the CNN is confident in its prediction. An integer ID is assigned to the joint data structure to identify it uniquely. Following the above mapping, the output matrix 540 of confidence arrays per image is converted into an array of joint data structures for each image. In one embodiment, the joint analysis includes performing a combination of k-nearest neighbors, mixture of Gaussians, and various image morphology transformations on each input image. The result comprises arrays of joint data structures, which can be stored in the form of a bit mask in a ring buffer that maps image numbers to bit masks at each moment in time.

[Subject Tracking Engine]

The tracking engine 110 is configured to receive the arrays of joint data structures generated by the image recognition engines 112a-112n corresponding to images in sequences of images from cameras having overlapping fields of view. The arrays of joint data structures per image are sent by the image recognition engines 112a-112n to the tracking engine 110 via the network 181. The tracking engine 110 translates the coordinates of the elements in the arrays of joint data structures corresponding to images in different sequences into candidate joints having coordinates in the real space. A location in the real space is covered by the fields of view of two or more cameras. The tracking engine 110 comprises logic to identify sets of candidate joints having coordinates in real space (constellations of joints) as subjects in the real space. In one embodiment, the tracking engine 110 accumulates the arrays of joint data structures from the image recognition engines for all the cameras at a given moment in time and stores this information as a dictionary in the subject database 140, to be used for identifying constellations of candidate joints. The dictionary can be arranged in the form of key-value pairs, where the keys are camera IDs and the values are arrays of joint data structures from the camera. In such an embodiment, this dictionary is used in heuristics-based analysis to determine candidate joints and to assign joints to subjects. In such an embodiment, the high-level inputs, processing, and outputs of the tracking engine 110 are illustrated in Table 1. Details of the logic applied by the subject tracking engine 110 to create subjects by combining candidate joints and to track the movement of subjects in the area of real space are presented in U.S. Patent No. 10,055,853, issued August 21, 2018, titled "Subject Identification and Tracking Using Image Recognition Engine," which is incorporated herein by reference.

Table 1: Inputs, processing, and outputs from the subject tracking engine 110 in an example embodiment

[Subject data structure]

The subject tracking engine 110 uses heuristics to connect the joints of subjects identified by the image recognition engines 112a-112n. In doing so, the subject tracking engine 110 creates new subjects and updates the locations of existing subjects by updating their respective joint locations. The subject tracking engine 110 uses triangulation techniques to project the locations of joints from 2D space coordinates (x, y) to 3D real space coordinates (x, y, z). FIG. 5 shows the subject data structure 500 used to store a subject. The data structure 500 stores the subject-related data as a key-value dictionary. The key is a frame ID, and the value is another key-value dictionary in which the key is the camera ID and the value is a list of 18 joints (of the subject) with their locations in the real space. The subject data is stored in the subject database. Every new subject is also assigned a unique identifier that is used to access the subject's data in the subject database.
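
The nested key-value layout described above might look like the sketch below. The `SubjectDatabase` class, its method names, and the representation of a joint as a `(joint_number, (x, y, z))` tuple are assumptions made for illustration only.

```python
from itertools import count


class SubjectDatabase:
    """Hypothetical in-memory store: subject ID -> frame ID -> camera ID -> joints."""

    def __init__(self):
        self._next_id = count(1)
        self._subjects = {}

    def create_subject(self):
        # Each new subject receives a unique identifier used for later access.
        subject_id = next(self._next_id)
        self._subjects[subject_id] = {}
        return subject_id

    def update(self, subject_id, frame_id, camera_id, joints):
        # joints: list of (joint_number, (x, y, z)) in real-space coordinates.
        frames = self._subjects[subject_id]
        frames.setdefault(frame_id, {})[camera_id] = list(joints)

    def get(self, subject_id):
        return self._subjects[subject_id]


db = SubjectDatabase()
sid = db.create_subject()
db.update(sid, frame_id=101, camera_id="cam_a", joints=[(13, (1.2, 0.4, 1.1))])
print(db.get(sid))
```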

In one embodiment, the system identifies the joints of a subject and creates a skeleton of the subject. The skeleton is projected into the real space and indicates the position and orientation of the subject in the real space. This is also referred to as "pose estimation" in the field of machine vision. In one embodiment, the system displays the orientations and positions of subjects in the real space on a graphical user interface (GUI). In one embodiment, the subject identification and image analysis are anonymous, i.e., the unique identifier assigned to a subject created through joint analysis, as described above, does not identify the subject's personal identification details.

In this embodiment, the constellations of joints of an identified subject, produced by time sequence analysis of the joint data structures, can be used to locate the hand of the subject. For example, the location of a wrist joint alone, or a location based on a projection of a combination of a wrist joint with an elbow joint, can be used to identify the location of the hand of an identified subject.

[Inventory event]

FIG. 6 shows the subject 240 taking an inventory item from a shelf in the shelf unit B 204 in a top view 610 of the aisle 116a. The disclosed technology uses the sequences of images produced by at least two cameras in the plurality of cameras to find the location of an inventory event. Joints of a single subject can appear in image frames of multiple cameras in their respective image channels. In the example of a shopping store, subjects move in the area of real space, take items from inventory locations, and also put items back on inventory locations. In one embodiment, the system predicts inventory events (puts or takes, also referred to as plus or minus events) using a pipeline of convolutional neural networks referred to as WhatCNN and WhenCNN.

The data sets comprising the subjects identified by joints in the subject data structures, together with the corresponding image frames from the sequences of image frames per camera, are given as input to a bounding box generator. The bounding box generator implements logic to process the data sets and to specify bounding boxes that include images of the hands of identified subjects in images in the sequences of images. The bounding box generator identifies the locations of hands in each source image frame per camera using, for example, the locations of wrist joints (for the respective hands) and elbow joints in the multi-joint subject data structures 500 corresponding to the respective source image frames. In one embodiment, in which the coordinates of the joints in the subject data structure indicate the locations of joints in 3D real space coordinates, the bounding box generator maps the joint locations from 3D real space coordinates to 2D coordinates in the image frames of the respective source images.

The bounding box generator creates bounding boxes for hands in the image frames in a circular buffer per camera 114. In one embodiment, the bounding box is a 128 pixel (width) by 128 pixel (height) portion of the image frame, with the hand located at the center of the bounding box. In other embodiments, the size of the bounding box is 64 pixels by 64 pixels or 32 pixels by 32 pixels. For m subjects in an image frame from a camera, there can be a maximum of 2m hands and thus 2m bounding boxes. In practice, however, fewer than 2m hands are visible in an image frame because of occlusions by other subjects or other objects. In one example embodiment, the hand locations of subjects are inferred from the locations of the elbow and wrist joints. For example, the right hand location of a subject is extrapolated using the location of the right elbow (identified as p1) and the location of the right wrist (identified as p2) as extrapolation_amount × (p2 − p1) + p2, where extrapolation_amount equals 0.4. In another embodiment, the joint CNNs 112a-112n are trained using left and right hand images. In such an embodiment, the joint CNNs 112a-112n therefore directly identify the locations of hands in image frames per camera. The hand locations per image frame are used by the bounding box generator to create a bounding box per identified hand.
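
The extrapolation and the centered bounding box can be sketched in a few lines. The function names are hypothetical, and clamping the box to the frame boundaries is an added assumption: the text only states that the hand is at the center of the box.

```python
def extrapolate_hand(elbow, wrist, amount=0.4):
    """Hand position = amount * (p2 - p1) + p2, with p1 = elbow and p2 = wrist."""
    return tuple(w + amount * (w - e) for e, w in zip(elbow, wrist))


def hand_bounding_box(hand_xy, frame_width, frame_height, size=128):
    """Return (left, top, right, bottom) of a size x size box centered on the hand.

    Assumption (not from the source): the box is shifted to stay inside the frame.
    """
    half = size // 2
    left = int(round(hand_xy[0])) - half
    top = int(round(hand_xy[1])) - half
    left = max(0, min(left, frame_width - size))
    top = max(0, min(top, frame_height - size))
    return (left, top, left + size, top + size)


# Elbow at (100, 100) and wrist at (200, 150) in image coordinates.
hand = extrapolate_hand((100, 100), (200, 150))
print(hand, hand_bounding_box(hand, frame_width=1280, frame_height=720))
```

The same `extrapolate_hand` function works for 3D points, since it operates per coordinate.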

WhatCNN is a convolutional neural network trained to process the specified bounding boxes in the images to generate a classification of the hands of the identified subjects. One trained WhatCNN processes image frames from one camera. In the example embodiment of the shopping store, for each hand in each image frame, the WhatCNN identifies whether the hand is empty. The WhatCNN also identifies the SKU (stock keeping unit) number of the inventory item in the hand, a confidence value indicating that the item in the hand is a non-SKU item (i.e., it does not belong to the shopping store inventory), and the context of the hand location in the image frame.

The outputs of the WhatCNN models for all cameras 114 are processed by a single WhenCNN model for a predetermined window of time. In the example of a shopping store, the WhenCNN performs time series analysis for both hands of subjects to identify whether a subject took a store inventory item from a shelf or put a store inventory item on a shelf. The disclosed technology uses the sequences of images produced by at least two cameras in the plurality of cameras to find the location of an inventory event. The WhenCNN executes analysis of data sets from sequences of images from at least two cameras to determine the locations of inventory events in three dimensions and to identify the items associated with the inventory events. A time series analysis of the output of WhenCNN per subject over a period of time is performed to identify inventory events and their times of occurrence. A non-maximum suppression (NMS) algorithm is used for this purpose. When one inventory event (i.e., a put or take of an item by a subject) is detected by WhenCNN multiple times (both from the same camera and from multiple cameras), the NMS removes the superfluous events for the subject. NMS is a rescoring technique comprising two main tasks: a "matching loss" that penalizes superfluous detections, and "joint processing" of neighboring detections to determine whether a better detection exists close by.

The true events of takes and puts for each subject are further processed by calculating the average of the SKU logits over the 30 image frames preceding the image frame with the true event. Finally, the arguments of the maxima (abbreviated arg max or argmax) are used to determine the largest value. The inventory item classified by the argmax value is used to identify the inventory item put on the shelf or taken from the shelf. The disclosed technology attributes the inventory event to a subject by assigning the inventory event to the subject's log data structure (or shopping cart data structure). The inventory item is added to a log of SKUs (also referred to as a shopping cart or basket) of the respective subject. The image frame identifier ("frame ID") of the image frame that resulted in the detection of the inventory event is also stored with the identified SKU. The logic to attribute the inventory event to the subject matches the location of the inventory event to the location of one of the customers in the plurality of customers. For example, the image frame can be used, together with the subject data structure 500, to identify the 3D position of the inventory event, represented by the position of the subject's hand at at least one point in time in the sequence that is classified as an inventory event. That position can then be used to determine the inventory location from which the item was taken or on which it was put. The disclosed technology uses the sequences of images produced by at least two cameras in the plurality of cameras to find the location of an inventory event and to create an inventory event data structure. In one embodiment, the inventory event data structure stores an item identifier, a put or take indicator, coordinates in the three dimensions of the area of real space, and a timestamp. In one embodiment, the inventory events are stored in the inventory events database 150.
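
The averaging and argmax step can be sketched as below. The function name and the representation of the logits as one list of per-SKU values per frame are assumptions for illustration; only the 30-frame window and the use of argmax come from the text.

```python
def classify_event_item(logits_per_frame, event_index, window=30):
    """Average the SKU logits over the frames preceding the event, then take argmax.

    logits_per_frame[i] is a list of logits (one per SKU class) for frame i.
    Returns (index of the SKU class with the largest average, list of averages).
    """
    start = max(0, event_index - window)
    frames = logits_per_frame[start:event_index]
    if not frames:
        raise ValueError("no frames precede the event")
    n_classes = len(frames[0])
    averages = [sum(frame[k] for frame in frames) / len(frames) for k in range(n_classes)]
    best = max(range(n_classes), key=averages.__getitem__)
    return best, averages


# Five early frames favor class 0, but they fall outside the 30-frame window.
logits = [[10.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0]] * 30
print(classify_event_item(logits, event_index=35))
```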

The locations of inventory events (puts and takes of inventory items by subjects in an area of space) can be compared with a planogram or other map of the store to identify an inventory location, such as a shelf, from which the subject has taken the item or on which the subject has placed the item. An illustration 660 shows the determination of a shelf in a shelf unit by calculating the shortest distance from the position of the hand 640 associated with the inventory event. This determination of the shelf is then used to update the inventory data structure of the shelf. An example inventory data structure 700 (also referred to as a log data structure) is shown in FIG. 7. This inventory data structure stores the inventory of a subject, shelf, or store as a key-value dictionary. The key is the unique identifier of a subject, shelf, or store, and the value is another key-value dictionary in which the key is the item identifier, such as a stock keeping unit (SKU), and the value is a number identifying the quantity of the item along with the "frame ID" of the image frame that resulted in the inventory event prediction. The frame identifier ("frame ID") can be used to identify the image frame that resulted in the identification of an inventory event and hence in the association of the inventory item with the subject, shelf, or store. In other embodiments, a "camera ID" identifying the source camera can also be stored in combination with the frame ID in the inventory data structure 700. In one embodiment, the "frame ID" is the subject identifier, because the frame has the subject's hand in the bounding box. In other embodiments, other types of identifiers, such as a "subject ID" that explicitly identifies a subject in the area of real space, can be used to identify subjects.

When the shelf inventory data structure is consolidated with the subject's log data structure, the shelf inventory is reduced to reflect the quantity of the item taken by the customer from the shelf. If the item was put on the shelf by a customer, or by an employee stocking items on the shelf, the item is added to the inventory data structure of the respective inventory location. Over a period of time, this processing results in updates to the shelf inventory data structures for all inventory locations in the shopping store. The inventory data structures of the inventory locations in the area of real space are consolidated to update the inventory data structure of the area of real space, which indicates the total number of items of each SKU in the store at that moment in time. In one embodiment, such updates are performed after each inventory event. In another embodiment, the store inventory data structure is updated periodically.

Details of an implementation of WhatCNN and WhenCNN to detect inventory events are presented in U.S. Patent Application No. 15/907,112, filed February 27, 2018, titled "Item Put and Take Detection Using Image Recognition," which is incorporated herein by reference as if fully set forth herein.

[Real-time shelf and store inventory update]

FIG. 8 is a flowchart presenting the process steps for updating the shelf inventory data structure in an area of real space. The process starts at step 802. At step 804, the system detects a take or put event in the area of real space. The inventory event is stored in the inventory events database 150. The inventory event record includes an item identifier such as an SKU, a timestamp, and the location of the event in the three-dimensional area of real space, indicating the positions along the three dimensions x, y, and z. The inventory event also includes a put or take indicator, identifying whether the subject has put the item on a shelf (also referred to as a plus inventory event) or taken the item from a shelf (also referred to as a minus inventory event). The inventory event information is combined with the output from the subject tracking engine 110 to identify the subject associated with this inventory event. The result of this analysis is then used to update the log data structure (also referred to as a shopping cart data structure) of the subject in the inventory database 160. In one embodiment, a subject identifier (e.g., a "subject ID") is stored in the inventory event data structure.

Using the position of the subject's hand associated with the inventory event (step 806), the system can find the location of the nearest shelf in an inventory display structure (also referred to as a shelf unit above) at step 808. The store inventory engine 180 calculates the distance of the hand to the two-dimensional (2D) regions or areas on the XZ plane (perpendicular to the floor 220) of the inventory locations in the shopping store. The 2D regions of the inventory locations are stored in the map database 140 of the shopping store. Consider that the hand is represented by a point E(x_event, y_event, z_event) in the real space. The shortest distance D from the point E in the real space to the plane can be determined by projecting the vector PE, where P is any point on the plane, onto the normal vector n of the plane. Existing mathematical techniques can be used to calculate the distance of the hand to all the planes representing the 2D regions of inventory locations.
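
The projection described above corresponds to D = |(E − P) · n| / |n|. A minimal sketch, with a hypothetical function name, is shown here; it gives the distance to the infinite plane and does not check whether the projected point falls inside the bounded 2D region of the shelf.

```python
import math


def distance_to_plane(event, point_on_plane, normal):
    """Shortest distance from point E to a plane given by a point P and a normal n."""
    pe = [e - p for e, p in zip(event, point_on_plane)]   # vector from P to E
    norm = math.sqrt(sum(c * c for c in normal))
    if norm == 0:
        raise ValueError("normal vector must be non-zero")
    # Length of the projection of PE onto the normal direction.
    return abs(sum(a * b for a, b in zip(pe, normal))) / norm


# A plane parallel to the XZ plane at y = 1 has its normal along the Y axis.
print(distance_to_plane(event=(1.0, 3.0, 2.0),
                        point_on_plane=(0.0, 1.0, 0.0),
                        normal=(0.0, 1.0, 0.0)))
```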

In one embodiment, the disclosed technology matches the location of an inventory event with an inventory location by executing a procedure that includes calculating the distance from the location of the inventory event to the inventory locations on the inventory display structures and matching the inventory event with an inventory location based on the calculated distance. For example, the inventory location (such as a shelf) at the shortest distance from the location of the inventory event is selected, and the inventory data structure of this shelf is updated at step 810. In one embodiment, the location of the inventory event is determined by the position of the subject's hand along the three coordinates of the real space. If the inventory event is a take event (or minus event) indicating that a bottle of ketchup was taken by the subject, the inventory of the shelf is updated by decreasing the number of ketchup bottles by one. Similarly, if the inventory event is a put event indicating that the subject placed a bottle of ketchup on the shelf, the inventory of the shelf is updated by increasing the number of ketchup bottles by one. The inventory data structure of the store is updated accordingly. The quantity of items put on the inventory locations is incremented by the same number in the store inventory data structure. Likewise, the quantity of items taken from the inventory locations is subtracted from the inventory data structure of the store in the inventory database 160.
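
The shelf selection and the paired shelf and store updates can be sketched as follows. The function names, the `"put"`/`"take"` strings, and the plain dictionaries standing in for the inventory data structures are illustrative assumptions; frame IDs and negative-count handling are omitted for brevity.

```python
def nearest_shelf(distances):
    """Pick the shelf ID with the shortest distance to the event location."""
    return min(distances, key=distances.get)


def apply_inventory_event(shelf_inventory, store_inventory, shelf_id, sku, kind):
    """Update shelf and store counts by +1 for a put event and -1 for a take event."""
    if kind not in ("put", "take"):
        raise ValueError("kind must be 'put' or 'take'")
    delta = 1 if kind == "put" else -1
    shelf = shelf_inventory.setdefault(shelf_id, {})
    shelf[sku] = shelf.get(sku, 0) + delta
    store_inventory[sku] = store_inventory.get(sku, 0) + delta


shelves = {"shelf_2": {"ketchup": 5}}
store = {"ketchup": 12}
target = nearest_shelf({"shelf_1": 0.42, "shelf_2": 0.17})
apply_inventory_event(shelves, store, target, "ketchup", "take")
print(target, shelves, store)
```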

At step 812, the system checks whether a planogram is available for the shopping store, or whether it can be known that a planogram is available. A planogram is a data structure that maps inventory items to inventory locations in the shopping store, and it can be based on a plan for the placement of inventory items in the store. If a planogram is available for the shopping store, the item put on the shelf by the subject is compared, at step 814, with the items on the shelf in the planogram. In one embodiment, the disclosed technology includes logic to determine misplaced items when an inventory event is matched with an inventory location that does not match the planogram. For example, if the SKU of the item associated with the inventory event matches the distribution of inventory items in the inventory location, the location of the item is correct (step 816); otherwise the item is misplaced. In one embodiment, at step 818, a notification is sent to an employee to take the misplaced item from its current inventory location (such as a shelf) and move it to its correct inventory location according to the planogram. At step 820, the system checks whether the subject is exiting the shopping store by using the subject's speed, orientation, and proximity to the store exit. If the subject is not exiting the store (step 820), the process resumes at step 804. Otherwise, if it is determined that the subject is exiting the store, the subject's log data structure (or shopping cart data structure) and the store's inventory data structure are consolidated at step 822.
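
The planogram comparison of steps 812 to 818 can be sketched as a small lookup. Representing the planogram as a dictionary from shelf ID to a set of expected SKUs, and returning `None` when no planogram is available, are assumptions made for this illustration.

```python
def is_misplaced(planogram, shelf_id, sku):
    """Return True if the SKU is not planned for the shelf, False if it is,
    and None when no planogram is available for the store."""
    if planogram is None:
        return None
    return sku not in planogram.get(shelf_id, set())


planogram = {"shelf_1": {"ketchup", "mustard"}, "shelf_2": {"cereal"}}
print(is_misplaced(planogram, "shelf_2", "ketchup"))   # ketchup put on the cereal shelf
```

A caller could send the employee notification of step 818 whenever this check returns `True` for a put event.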

In one embodiment, if the items in the subject's shopping cart data structure were not subtracted from the store inventory at step 810, the consolidation includes subtracting these items from the store inventory data structure. At this step, the system can also identify items in the subject's shopping cart data structure that have low identification confidence scores and send a notification to a store employee positioned near the store exit. The employee can then confirm the items with low identification confidence scores in the customer's shopping cart. This process does not require the store employee to compare all items in the customer's shopping cart with the customer's shopping cart data structure; only the items with low confidence scores are identified by the system to the store employee, who then confirms them. The process ends at step 824.
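
The selection of low-confidence items for employee confirmation might be sketched as follows. The record fields, the 0.8 threshold, and the function name are hypothetical; this description does not specify how a "low" identification confidence score is defined.

```python
from dataclasses import dataclass


@dataclass
class CartItem:
    sku: str
    quantity: int
    confidence: float  # identification confidence, assumed to lie in [0, 1]


def items_to_verify(cart: list[CartItem], threshold: float = 0.8) -> list[str]:
    """Return the SKUs whose identification confidence is below the threshold."""
    return [item.sku for item in cart if item.confidence < threshold]


# Made-up cart contents for illustration.
cart = [
    CartItem("sku_1", 2, 0.97),
    CartItem("sku_3", 4, 0.55),
    CartItem("sku_4", 1, 0.91),
]
flagged = items_to_verify(cart)
```

Only the flagged SKUs would be sent to the employee near the exit, so the remaining items in the cart need no manual check.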

[Architecture for real-time shelf and store inventory updates]

FIG. 9A shows an example architecture of a system in which customer inventory, inventory location (e.g., shelf) inventory, and store inventory (e.g., store-wide) data structures are updated using the puts and takes of items by customers in the shopping store. Because FIG. 9A is an architectural diagram, certain details are omitted to improve the clarity of the description. The system shown in FIG. 9A receives image frames from a plurality of cameras 114. As described above, in one embodiment, the cameras 114 can be synchronized in time with each other, so that images are captured at the same time, or close in time, and at the same image capture rate. Images captured at the same time, or close in time, by all the cameras covering an area of real space are synchronized in the sense that the synchronized images can be identified in the processing engines as representing different views, at a moment in time, of subjects having fixed positions in the real space. The images are stored in a circular buffer 902 of image frames per camera.
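
A per-camera circular buffer of image frames can be sketched with a fixed-length deque. The class name, the tolerance parameter, and the nearest-timestamp lookup are assumptions for illustration; this description states only that frames are stored in a circular buffer per camera and that the cameras are synchronized.

```python
from collections import deque


class FrameBuffers:
    """One fixed-size circular buffer of (timestamp, frame) pairs per camera."""

    def __init__(self, camera_ids, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.buffers = {cid: deque(maxlen=capacity) for cid in camera_ids}

    def push(self, camera_id, timestamp, frame):
        self.buffers[camera_id].append((timestamp, frame))

    def synchronized(self, timestamp, tolerance):
        """Per camera, return the frame closest to `timestamp` within `tolerance`."""
        views = {}
        for cid, buf in self.buffers.items():
            if not buf:
                continue
            ts, frame = min(buf, key=lambda entry: abs(entry[0] - timestamp))
            if abs(ts - timestamp) <= tolerance:
                views[cid] = frame
        return views


# Frames are stand-in strings; camera b lags camera a by 10 ms in this example.
buffers = FrameBuffers(["cam_a", "cam_b"], capacity=3)
for t in range(5):
    buffers.push("cam_a", float(t), f"a{t}")
    buffers.push("cam_b", t + 0.01, f"b{t}")
views = buffers.synchronized(3.0, tolerance=0.05)
```

With a capacity of three, only the three most recent frames per camera remain available, which is the usual trade-off of a circular buffer.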

A "subject identification" subsystem 904 (also referred to as the first image processor) processes image frames received from the cameras 114 to identify and track subjects in the real space. The first image processor includes subject image recognition engines that detect joints of subjects in the real space. The joints are connected to form a subject, which is then tracked as it moves in the real space. The subjects are anonymous and are tracked using an internal identifier, "subject ID".

A "region proposal" subsystem 908 (also referred to as the third image processor) includes foreground image recognition engines, receives corresponding sequences of images from the plurality of cameras 114, and recognizes semantically significant objects in the foreground (i.e., customers, their hands, and inventory items) as they relate to puts and takes of inventory items over time in the images from each camera. The region proposal subsystem 908 also receives the output of the subject identification subsystem 904. The third image processor processes the sequences of images from the cameras 114 to identify and classify foreground changes represented in the images in the corresponding sequences of images. The third image processor processes the identified foreground changes to make a first set of detections of takes of inventory items by the identified subjects and of puts of inventory items on inventory display structures by the identified subjects. In one embodiment, the third image processor comprises a convolutional neural network (CNN) model, such as the WhatCNN described above. The first set of detections is also referred to as the foreground detection of puts and takes of inventory items. In this embodiment, the outputs of WhatCNN are processed by a second convolutional neural network (WhenCNN) to make the first set of detections, which identifies put events of inventory items on inventory locations and take events of inventory items from inventory locations in inventory display structures by customers and store employees. The details of the region proposal subsystem are presented in U.S. patent application Ser. No. 15/907,112, filed February 27, 2018, titled "Item Put and Take Detection Using Image Recognition," which is incorporated herein by reference as if fully set forth herein.

In another embodiment, the architecture includes a "semantic diffing" subsystem (also referred to as the second image processor) that can be used in parallel with the third image processor to detect puts and takes of inventory items and to associate these puts and takes with subjects in the shopping store. This semantic diffing subsystem includes background image recognition engines, which receive corresponding sequences of images from the plurality of cameras and recognize semantically significant differences in the background (i.e., inventory display structures such as shelves) as they relate to puts and takes of inventory items over time in the images from each camera. The second image processor receives as input the output of the subject identification subsystem 904 and the image frames from the cameras 114. The details of the semantic diffing subsystem are presented in U.S. Pat. No. 10,127,438, filed April 4, 2018, titled "Predicting Inventory Events Using Semantic Diffing," and U.S. patent application Ser. No. 15/945,473, filed April 4, 2018, titled "Predicting Inventory Events Using Foreground/Background Processing," both of which are incorporated herein by reference as if fully set forth herein. The second image processor processes the identified background changes to make a second set of detections of takes of inventory items by the identified subjects and of puts of inventory items on inventory display structures by the identified subjects. The second set of detections is also referred to as the background detection of puts and takes of inventory items. In the example of a shopping store, the second detections identify inventory items taken from inventory locations, or put on inventory locations, by customers or store employees. The semantic diffing subsystem includes logic to associate the identified background changes with the identified subjects.

The system described in FIG. 9A includes selection logic to process the first and second sets of detections to generate log data structures that include lists of inventory items for the identified subjects. For a put or take in the real space, the selection logic selects the output from either the semantic diffing subsystem or the region proposal subsystem 908. In one embodiment, the selection logic uses a confidence score generated by the semantic diffing subsystem for the first set of detections and a confidence score generated by the region proposal subsystem for the second set of detections to make the selection. The output of the subsystem with the higher confidence score for a particular detection is selected and used to generate a log data structure 700 (also referred to as a shopping cart data structure) that includes a list of inventory items, and their quantities, associated with the identified subject. The shelf and store inventory data structures are updated using the subjects' log data structures as described above.
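
The confidence-based selection between the two detection paths might look like the following sketch. The record layout, the tie-breaking rule (ties go to the first argument), and the signed `delta` convention are assumptions introduced here; this description states only that the output with the higher confidence score is selected and used to build the log data structure.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    subject_id: int
    sku: str
    delta: int          # assumed convention: +1 item taken, -1 item put back
    confidence: float
    source: str         # which subsystem produced the detection


def select_detection(first: Detection, second: Detection) -> Detection:
    """Keep the detection with the higher confidence (ties favor `first`)."""
    return first if first.confidence >= second.confidence else second


def apply_to_log(log: dict, detection: Detection) -> None:
    """Update a {subject_id: {sku: quantity}} log with the chosen detection."""
    items = log.setdefault(detection.subject_id, {})
    items[detection.sku] = max(0, items.get(detection.sku, 0) + detection.delta)


# Made-up detections of the same event from the two subsystems.
region = Detection(7, "sku_1", +1, 0.62, "region_proposal")
diffing = Detection(7, "sku_2", +1, 0.88, "semantic_diffing")
chosen = select_detection(region, diffing)
log = {}
apply_to_log(log, chosen)
```

Selecting per detection rather than per subsystem lets each event use whichever path was more certain for that particular put or take.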

A subject exit detection engine 910 determines whether a customer is moving toward the exit door and sends a signal to the store inventory engine 190. The store inventory engine determines whether one or more items in the customer's log data structure 700 have a low identification confidence score as determined by the second or third image processor. If so, the inventory consolidation engine sends a notification to a store employee positioned near the exit to confirm the items purchased by the customer. The inventory data structures of the subjects, the inventory locations, and the shopping store are stored in the inventory database 160.

FIG. 9B shows another architecture of a system in which customer inventory, inventory location (e.g., shelf) inventory, and store inventory (e.g., store-wide) data structures are updated using the puts and takes of items by customers in the shopping store. Because FIG. 9B is an architectural diagram, certain details are omitted to improve the clarity of the description. As described above, the system receives image frames from a plurality of synchronized cameras 114. The WhatCNN 914 uses image recognition engines to determine the items in the hands of customers in the area of real space (such as a shopping store). In one embodiment, there is one WhatCNN per camera 114, performing image processing of the sequence of image frames produced by the respective camera. The WhenCNN 912 performs a time series analysis of the outputs of the WhatCNNs to identify put or take events. The inventory events, along with the item and hand information, are stored in the database 918. This information is then combined, by the person-item attribution component 920, with the customer information generated by the customer tracking engine 110 (also referred to above as the subject tracking engine 110). The log data structures 700 of the customers in the shopping store are generated by linking the customer information stored in the database 918.

The disclosed technology uses the sequences of images produced by the plurality of cameras to detect departure of a customer from the area of real space. In response to detecting the departure of the customer, the disclosed technology updates the store inventory in memory for the items associated with the inventory events attributed to the customer. When the exit detection engine 910 detects the departure of customer "C" from the shopping store, the items purchased by customer "C", as shown in the log data structure 922, are consolidated with the store's inventory data structure 924 to generate an updated store inventory data structure 926. For example, as shown in FIG. 9B, the customer has purchased two of item 1, four of item 3, and one of item 4. The quantity of each item purchased by customer "C", as shown in the customer's log data structure 922, is subtracted from the store inventory 924 to generate the updated store inventory data structure 926, which shows that the quantity of item 1 is reduced from 48 to 46 and that, similarly, the quantities of items 3 and 4 are reduced by the respective quantities of item 3 and item 4 purchased by customer "C". Because customer "C" did not purchase item 2, the quantity of item 2 in the updated store inventory data structure 926 remains the same as in the current store inventory data structure 924.
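
The consolidation of customer "C"'s log with the store inventory can be sketched as a per-SKU subtraction. Only the starting quantity of item 1 (48) and the purchased quantities come from the example above; the starting quantities of items 2, 3, and 4 and all names are made up for illustration.

```python
def consolidate(store_inventory: dict, customer_log: dict) -> dict:
    """Return a new store inventory with the customer's quantities subtracted."""
    updated = dict(store_inventory)
    for sku, quantity in customer_log.items():
        updated[sku] = updated.get(sku, 0) - quantity
    return updated


# Item 1 starts at 48 as in the example; the other starting values are assumed.
store_924 = {"item_1": 48, "item_2": 20, "item_3": 30, "item_4": 10}
log_922 = {"item_1": 2, "item_3": 4, "item_4": 1}
store_926 = consolidate(store_924, log_922)
```

The function returns a new dictionary rather than mutating its input, mirroring the distinction between the current data structure 924 and the updated data structure 926.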

In one embodiment, the detection of the customer's departure also triggers an update of the inventory data structures of the inventory locations (such as shelves in the shopping store) from which the customer has taken items. In such an embodiment, the inventory data structures of the inventory locations are not updated immediately after a take or put inventory event as described above. In this embodiment, when the system detects the departure of the customer, the inventory events associated with the customer are traversed to link the inventory events with the respective inventory locations in the shopping store. The inventory data structures of the inventory locations determined by this process are updated. For example, if the customer has taken two of item 1 from inventory location 27, the inventory data structure of inventory location 27 is updated by reducing the quantity of item 1 by two. Note that an inventory item can be stocked on multiple inventory locations in the shopping store. The system identifies the inventory location corresponding to the inventory event, and therefore the inventory location from which the item was taken is the one that is updated.
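
The deferred, exit-triggered update of inventory locations might be sketched as a traversal of the customer's inventory events. The event tuple layout and the starting shelf quantities are assumptions; only the "two of item 1 taken from inventory location 27" case comes from the example above.

```python
def update_locations_on_exit(location_inventory: dict, customer_events) -> dict:
    """Apply a departing customer's events to per-location inventories.

    location_inventory: {location_id: {sku: quantity}}
    customer_events: iterable of (location_id, sku, kind, quantity),
                     where kind is "take" or "put" (hypothetical layout).
    """
    updated = {loc: dict(items) for loc, items in location_inventory.items()}
    for location_id, sku, kind, quantity in customer_events:
        delta = quantity if kind == "put" else -quantity
        shelf = updated.setdefault(location_id, {})
        shelf[sku] = shelf.get(sku, 0) + delta
    return updated


# Item 1 is stocked on two locations; the starting quantities are made up.
shelves = {27: {"item_1": 12}, 31: {"item_1": 6}}
events_c = [(27, "item_1", "take", 2)]
after_exit = update_locations_on_exit(shelves, events_c)
```

Because each event carries its own location, only location 27 changes even though item 1 is also stocked on location 31.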

[Store realogram]

The locations of inventory items throughout the real space in the store, including the inventory locations in the shopping store, change over a period of time as customers take items from inventory locations and put the items they do not want to buy back at the same position on the same shelf from which the item was taken, at a different position on the same shelf from which the item was taken, or on a different shelf. The disclosed technology uses the sequences of images produced by at least two cameras in the plurality of cameras to identify inventory events and, in response to the inventory events, tracks the locations of inventory items in the area of real space. In some embodiments, the items in a shopping store are arranged according to a planogram, which identifies the inventory locations (such as shelves) on which particular items are planned to be placed. For example, as shown in illustration 910 of FIG. 10, the left halves of shelf 3 and shelf 4 are designated for an item (which is stocked in the form of cans). Consider that the inventory locations are stocked according to the planogram at the beginning of the day or of another inventory tracking interval (identified by time t=0).

The disclosed technology can calculate a "realogram" of the shopping store at any time "t", which is a real-time map of the locations of inventory items in the area of real space and which, in some embodiments, can further be correlated with inventory locations in the store. A realogram can be used to create a planogram by identifying inventory items and their positions in the store and mapping them to inventory locations. In one embodiment, the system or method can create a data set defining a plurality of cells having coordinates in the area of real space. The system or method can divide the real space into the data set defining the plurality of cells by using the length of the cells along the coordinates of the real space as an input parameter. In one embodiment, the cells are represented as two-dimensional grids having coordinates in the area of real space. For example, the cells can be correlated with 2D grids (e.g., at 1-foot spacing) of the front plan of inventory locations in shelf units (also referred to as inventory display structures), as shown in illustration 960 of FIG. 10. Each grid is defined by its starting and ending positions on the coordinates of a two-dimensional plane, such as the x and z coordinates, as shown in FIG. 10. This information is stored in the map database 140.
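
Dividing the front plan of a shelf unit into 2D cells, with the cell length as the input parameter, can be sketched as follows. The function names, the indexing scheme, and the 4 by 6 foot extent are assumptions made for illustration.

```python
import math


def build_cells(x_extent: float, z_extent: float, cell_length: float) -> dict:
    """Return {(i, k): ((x_start, x_end), (z_start, z_end))} covering the plane."""
    cells = {}
    for i in range(math.ceil(x_extent / cell_length)):
        for k in range(math.ceil(z_extent / cell_length)):
            x_range = (i * cell_length, min((i + 1) * cell_length, x_extent))
            z_range = (k * cell_length, min((k + 1) * cell_length, z_extent))
            cells[(i, k)] = (x_range, z_range)
    return cells


def cell_index(x: float, z: float, cell_length: float) -> tuple:
    """Map a position on the x-z plane to the index of the cell containing it."""
    return (int(x // cell_length), int(z // cell_length))


# A shelf front 4 ft wide and 6 ft tall, divided at 1-foot spacing.
cells = build_cells(4.0, 6.0, 1.0)
index = cell_index(2.5, 0.3, 1.0)
```

Each cell is stored with its starting and ending positions along both axes, matching the way each grid is defined in the paragraph above; the last row or column is clipped when the extent is not a multiple of the cell length.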

In another embodiment, the cells are represented as three-dimensional (3D) grids having coordinates in the area of real space. In one example, the cells can be correlated with volumes on inventory locations (or portions of inventory locations) in shelf units in the shopping store, as shown in FIG. 11A. In this embodiment, the map of the real space identifies a configuration of units of volume that can be correlated with portions of inventory locations on inventory display structures in the area of real space. This information is stored in the map database 140. The store realogram engine 190 uses the inventory event database 150 to calculate a realogram of the shopping store at time "t" and stores it in the realogram database 170. The realogram of the shopping store indicates, at any time t, the inventory items associated with inventory events matched by their locations to cells, by using the timestamps of the inventory events stored in the inventory event database 150. An inventory event includes an item identifier, a put or take indicator, the location of the inventory event represented by positions along three axes of the area of real space, and a timestamp.

The illustration in FIG. 11A shows that, at the beginning of day 1 at t=0, the left portion of the inventory locations in the first shelf unit (which forms a column-wise arrangement) contains "ketchup" bottles. The column of cells (or grids) is shown in black in the graphic visualization; the cells can also be rendered in other colors, such as dark green. All other cells are left blank and are not filled with any color, indicating that they do not contain any items. In one embodiment, the visualization of the items in the cells of a realogram is generated for one item at a time, indicating its location in the store (within the cells). In another embodiment, a realogram displays the locations of a set of items on inventory locations using different colors to distinguish them. In such an embodiment, a cell can have multiple colors corresponding to the several items associated with the inventory events matched to that cell. In other embodiments, other graphical or text-based visualizations are used to indicate the inventory items in cells, such as by listing their SKUs or names in the cells.

The system calculates a SKU score (also referred to as a score) at a scoring time for the inventory items having locations matching a particular cell, using the respective counts of inventory events. The calculation of the score for a cell uses the sum of the puts and takes of the inventory item, weighted by the separation between the timestamps of the puts and takes and the scoring time. In one embodiment, the score is a weighted average of the inventory events per SKU. In other embodiments, different scoring calculations can be used, such as a count of inventory events per SKU. In one embodiment, the system displays the realogram as an image representing the cells in the plurality of cells and the scores for the cells. For example, consider the scoring time t=1 (e.g., one day later) in the illustration in FIG. 11A. The realogram at time t=1 represents the scores for the "ketchup" item by different shades of black. The store realogram at time t=1 shows that all four columns of the first shelf unit and of the second shelf unit (behind the first shelf unit) contain the "ketchup" item. Cells with higher SKU scores for "ketchup" bottles are rendered in darker gray than cells with lower scores for "ketchup" bottles. Cells with a score of zero for ketchup are left blank and are not filled with any color. The realogram therefore presents real-time information about the locations of ketchup bottles on inventory locations in the shopping store after time t=1 (e.g., after one day). The frequency at which realograms are generated can be set by the shopping store management according to its requirements. A realogram can also be generated on demand by the store management. In one embodiment, the item location information generated by the realogram is compared with the store planogram to identify misplaced items. A notification can be sent to a store employee, who can then put the misplaced inventory items back on their designated inventory locations as indicated in the store planogram.

In one embodiment, the system renders a display image representing the cells in the plurality of cells and the scores for the cells. FIG. 11B shows a computing device with the realogram of FIG. 11A rendered on a user interface display 1102. The realogram can be displayed on other types of computing devices, such as tablets and mobile computing devices. The system can use variations in color in the display image representing the cells to indicate the scores for the cells. For example, in FIG. 11A, the column of cells containing "ketchup" at t=0 can be represented by dark green cells in that column. At t=1, the "ketchup" bottles are dispersed across multiple cells beyond the first column of cells. The system can represent these cells by using different shades of green to indicate the scores of the cells. Darker shades of green indicate higher scores, and lighter green cells indicate lower scores. The user interface displays other information that has been generated and provides tools to invoke and display functions.
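
One possible way to turn a cell's score into a shade of green is a linear interpolation between a light and a dark green. The two endpoint colors, the normalization by a maximum score, and the function name are assumptions chosen for illustration; this description specifies only that darker shades indicate higher scores and that zero-score cells are left blank.

```python
LIGHT_GREEN = (204, 255, 204)  # assumed color for the lowest non-zero scores
DARK_GREEN = (0, 100, 0)       # assumed color for the highest score


def score_to_green(score: float, max_score: float):
    """Return a hex color for a cell, or None to leave the cell blank."""
    if score <= 0 or max_score <= 0:
        return None  # zero-score cells are not filled with any color
    ratio = min(score / max_score, 1.0)
    rgb = tuple(
        round(light + (dark - light) * ratio)
        for light, dark in zip(LIGHT_GREEN, DARK_GREEN)
    )
    return "#{:02x}{:02x}{:02x}".format(*rgb)


highest = score_to_green(0.75, 0.75)
blank = score_to_green(0.0, 0.75)
```

A renderer would fill each cell with the returned color and skip cells for which the function returns None.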

[Calculation of store realogram]

FIG. 12 is a flowchart presenting process steps for calculating the realogram of shelves in an area of real space at a time t, which can be adapted to other types of inventory display structures. The process starts at step 1202. At step 1204, the system retrieves inventory events in the area of real space from the inventory event database 150. An inventory event record includes an item identifier, a put or take indicator, the location of the inventory event represented by positions in three dimensions (such as x, y, and z) of the area of real space, and a timestamp. The put or take indicator identifies whether the customer (also referred to as a subject) has put the item on a shelf or taken the item from a shelf. A put event is also referred to as a plus inventory event, and a take event is also referred to as a minus inventory event. At step 1206, the inventory event is combined with the output from the subject tracking engine 110 to identify the hand of the subject associated with this inventory event.
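
An inventory event record with the four fields listed above might be represented as in the sketch below. The class, field, and function names are hypothetical, as is the use of a numeric timestamp; the retrieval of step 1204 is reduced here to filtering an in-memory list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    sku: str                                # item identifier
    is_put: bool                            # put or take indicator
    position: tuple[float, float, float]    # (x, y, z) in the area of real space
    timestamp: float

    @property
    def sign(self) -> int:
        """+1 for a put (plus inventory event), -1 for a take (minus event)."""
        return 1 if self.is_put else -1


def events_until(events, t: float) -> list:
    """Return the events with a timestamp at or before t, oldest first."""
    return sorted((e for e in events if e.timestamp <= t), key=lambda e: e.timestamp)


# Made-up events for illustration.
events = [
    InventoryEvent("sku_ketchup", False, (1.2, 0.4, 1.5), 30.0),
    InventoryEvent("sku_ketchup", True, (3.1, 0.4, 0.9), 10.0),
    InventoryEvent("sku_mustard", True, (0.5, 0.4, 1.1), 50.0),
]
selected = events_until(events, t=40.0)
```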

The system uses the location of the hand of the subject associated with the inventory event (step 1206) to determine the location of the event. In some embodiments, the inventory event can be matched, at step 1208, with the nearest shelf, or other likely inventory location, in a shelf unit or inventory display structure. Process step 808 in the flowchart of FIG. 8 presents details of a technique that can be used to determine the location on a shelf nearest to the hand position. As explained for the technique in step 808, the shortest distance D from a point E in the real space to a plane (representing the front plan region of a shelf on the xz plane), where P is any point on the plane, can be determined by projecting the vector PE onto the normal vector n to the plane. The point where the perpendicular from E meets the plane is the point on the shelf nearest to the hand. The location of this point is stored in a "point cloud" data structure (step 1210) as a tuple containing the 3D position of the point in the area of real space, the SKU of the item, and the timestamp; the latter two are obtained from the inventory event record. If there are more inventory event records in the inventory event database 150 (step 1211), process steps 1204 through 1210 are repeated. Otherwise, the process continues at step 1214.
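
The projection described for step 808 can be written out as a short sketch. With unit normal n̂ = n/|n|, the signed distance is d = (E − P) · n̂ and the nearest point on the plane is E − d n̂. The function name, the coordinates, and the SKU and timestamp values are made up for illustration.

```python
import math


def closest_point_on_plane(e, p, n):
    """Project point e onto the plane through p with normal n.

    Returns (closest_point, distance), where distance is |PE . n| / |n|.
    """
    norm = math.sqrt(sum(c * c for c in n))
    n_hat = tuple(c / norm for c in n)
    pe = tuple(ec - pc for ec, pc in zip(e, p))          # vector from P to E
    signed = sum(a * b for a, b in zip(pe, n_hat))       # projection onto normal
    closest = tuple(ec - signed * nc for ec, nc in zip(e, n_hat))
    return closest, abs(signed)


# Shelf front assumed parallel to the x-z plane at y = 0.5; hand at E.
hand = (1.25, 0.75, 1.5)
shelf_point = (0.0, 0.5, 0.0)
shelf_normal = (0.0, 2.0, 0.0)   # need not be unit length
closest, distance = closest_point_on_plane(hand, shelf_point, shelf_normal)

# Point cloud entry as a (3D position, SKU, timestamp) tuple, as in step 1210.
point_cloud = [(closest, "sku_ketchup", 1700000000.0)]
```

Normalizing the normal vector inside the function means callers can pass any non-zero normal for the shelf plane.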

The disclosed technology includes a data set, stored in memory, defining a plurality of cells having coordinates in the area of real space. The cells define regions in the area of real space bounded by starting and ending positions along the coordinate axes. The area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells can be correlated with inventory locations in the plurality of inventory locations. The disclosed technology matches the locations of the inventory items associated with inventory events with the coordinates of cells and maintains data representing the inventory items matched with cells in the plurality of cells. In one embodiment, the system determines the nearest cell in the data set based on the location of the inventory event by executing a procedure (such as described in step 808 in the flowchart of FIG. 8) that calculates the distance from the location of the inventory event to cells in the data set and matches the inventory event with a cell based on the calculated distance. This matching of the event location to the nearest cell gives the position of the point cloud data, identifying the cell in which the point cloud data resides (step 1212). In one embodiment, the cells can map to portions of inventory locations (such as shelves) in inventory display structures. Therefore, the portion of the shelf is also identified by using this mapping. As described above, the cells can be represented as 2D grids or 3D grids of the area of real space. The system includes logic that calculates scores, at a scoring time, for inventory items having locations matching particular cells. In one embodiment, the scores are based on counts of inventory events. In this embodiment, the score for a cell uses the sum of the puts and takes of the inventory item, weighted by the separation between the timestamps of the puts and takes and the scoring time. For example, the score can be a weighted moving average per SKU (also referred to as the SKU score), calculated per cell using the "point cloud" data points mapped to the cell:

The SKU score calculated by equation (1) is the sum of the scores of all point cloud data points of the SKU in the cell, with each data point weighted by point_t, the time in days since the timestamp of the put or take event. Suppose there are two point cloud data points for a "ketchup" item in the grid. The first data point has a timestamp indicating that its inventory event occurred two days before the time "t" at which the realogram is calculated, so its value of point_t is 2. The second data point corresponds to an inventory event that occurred one day before time "t", so its point_t is 1. The ketchup score for the cell (identified by a cell ID that maps to the shelf identified by a shelf ID) is calculated as follows:

As the point cloud data points corresponding to inventory events age (that is, as more days pass since the event), their contribution to the SKU score decreases. At step 1216, the top "N" SKUs with the highest SKU scores are selected for the cell. In one embodiment, the system includes logic to select a set of inventory items for each cell based on the scores. For example, "N" can be set to 10 to select the top 10 items per cell by SKU score; in this embodiment, the realogram stores the top 10 items per cell. At step 1218, the updated realogram at time t, indicating the top "N" SKUs per cell on the shelves at time t, is stored in the realogram database 170. The process ends at step 1220.
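The scoring and top-N selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: equation (1) is not reproduced in this text, so the sketch assumes a decay weight of 1 / (1 + point_t) per data point, chosen only because it satisfies the stated property that older events contribute less. The names `decay_weight`, `sku_scores`, and `top_n_skus`, and the tuple layout of the point cloud data, are hypothetical.

```python
from collections import defaultdict


def decay_weight(point_t_days):
    # Assumed weighting (not taken from the source): the contribution
    # of a data point shrinks as its age in days grows.
    return 1.0 / (1.0 + point_t_days)


def sku_scores(points, scoring_day):
    """points: iterable of (cell_id, sku, event_day) tuples.

    Returns {cell_id: {sku: score}} at the given scoring day.
    """
    scores = defaultdict(lambda: defaultdict(float))
    for cell_id, sku, event_day in points:
        point_t = scoring_day - event_day  # age of the event in days
        scores[cell_id][sku] += decay_weight(point_t)
    return scores


def top_n_skus(cell_scores, n=10):
    # Highest score first; ties are broken by SKU name so the order is deterministic.
    ranked = sorted(cell_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [sku for sku, _ in ranked[:n]]


points = [
    ("cell-7", "ketchup", 8),  # two days before scoring day 10
    ("cell-7", "ketchup", 9),  # one day before scoring day 10
    ("cell-7", "mustard", 3),  # seven days before scoring day 10
]
scores = sku_scores(points, scoring_day=10)
best = top_n_skus(scores["cell-7"], n=1)
```

Under this assumed weighting, the two ketchup events contribute 1/3 and 1/2 to the cell's ketchup score; the weighting in the source's own equation (1) may give a different value.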

In another embodiment, the disclosed technology does not use the 2D or 3D maps of shelf portions stored in the map database 140 to calculate point cloud data in the portions of shelves corresponding to inventory events. In this embodiment, the 3D real space representing the shopping store is partitioned into cells represented as 3D cubes (for example, 1-foot cubes). The 3D hand positions are mapped to the cells using their respective positions along the three axes. The SKU scores for all items are calculated per cell using equation (1) above. The resulting realogram shows the items in the cells of the real space representing the store without requiring the positions of the shelves in the store. In this embodiment, a point cloud data point may be at the same coordinates in real space as the hand position corresponding to the inventory event, or at the position of a cell in the area close to or encompassing the hand position. This is because a map of the shelves may not exist, so the hand positions are not mapped to the nearest shelf. For the same reason, the point cloud data points in this embodiment are not necessarily coplanar. All point cloud data points within a unit of volume in real space (for example, 1 cubic foot) are included in the calculation of the SKU score.
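The mapping of 3D hand positions to cubic cells can be illustrated with a short sketch. It assumes positions are expressed in feet from a common origin and that a cell is indexed by the floor of each coordinate divided by the cell size; the names `cell_index` and `group_by_cell` are hypothetical, and the source does not specify how the mapping is computed.

```python
import math

CELL_SIZE_FT = 1.0  # edge length of a cubic cell; the text gives 1 foot as an example


def cell_index(position, cell_size=CELL_SIZE_FT):
    """Map an (x, y, z) position in feet to the integer index of its enclosing cube."""
    return tuple(math.floor(c / cell_size) for c in position)


def group_by_cell(hand_positions):
    """hand_positions: iterable of (sku, (x, y, z)) pairs.

    Returns {cell_index: [sku, ...]} so that every data point inside the
    same unit of volume is collected together for scoring.
    """
    cells = {}
    for sku, position in hand_positions:
        cells.setdefault(cell_index(position), []).append(sku)
    return cells


events = [
    ("ketchup", (3.2, 0.7, 4.9)),
    ("mustard", (3.9, 0.1, 4.0)),
    ("ketchup", (10.5, 2.0, 1.25)),
]
cells = group_by_cell(events)
```

Because the index depends only on the event coordinates, no shelf map is needed, which matches the motivation given for this embodiment.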

In some embodiments, the realogram is computed iteratively and can be used for time-of-day analysis of activity in the store, or to generate animations (such as stop-motion animations) that display the movement of inventory items in the store over time.

[Example Applications of Store Realograms]

A store realogram can be used in many operations of a shopping store. Several example applications of the realogram are presented in the following paragraphs.

[Restocking of Inventory Items]

FIG. 13A presents one such application of the store realogram: determining whether an inventory item needs to be restocked at an inventory location (such as a shelf). The process starts at step 1302. At step 1304, the system retrieves the realogram at scoring time "t" from the realogram database 170. In one example, this is the most recently generated realogram. At step 1306, the SKU scores for all cells in the realogram are compared with a threshold score. If the SKU score is above the threshold (step 1308), the process repeats steps 1304 and 1306 for the next inventory item "i". In embodiments that include a planogram, or where a planogram is available, the SKU score of item "i" is compared with the threshold for the cells that match the placement of inventory item "i" in the planogram. In another embodiment, the SKU score of an inventory item is calculated after filtering out the "put" inventory events. In this embodiment, the SKU score reflects the "take" events of inventory item "i" per cell in the realogram, which can be compared with the threshold. In another embodiment, the count of "take" inventory events per cell is used as the score that is compared with a threshold to decide whether inventory item "i" should be restocked. In this embodiment, the threshold is the minimum count of the inventory item that needs to be stocked at the inventory location.
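The threshold comparison of steps 1306 and 1308 can be sketched as below. The realogram is modeled, as an assumption, as a dictionary of per-cell SKU scores, and the optional planogram as a dictionary from SKU to the set of cells where it belongs; the function name `restock_alerts` and both data layouts are hypothetical.

```python
def restock_alerts(realogram, threshold, planogram=None):
    """Return sorted (sku, cell_id) pairs whose score is below the threshold.

    realogram: {cell_id: {sku: score}}
    planogram: optional {sku: set of cell_ids}; when given, a SKU is only
    checked in the cells where the planogram places it.
    """
    alerts = []
    for cell_id, sku_scores in realogram.items():
        for sku, score in sku_scores.items():
            if planogram is not None and cell_id not in planogram.get(sku, set()):
                continue  # skip cells that are not this SKU's designated placement
            if score < threshold:
                alerts.append((sku, cell_id))
    return sorted(alerts)


realogram = {
    "cell-1": {"ketchup": 0.2, "mustard": 1.5},
    "cell-2": {"ketchup": 0.9},
}
planogram = {"ketchup": {"cell-1"}, "mustard": {"cell-1"}}
alerts = restock_alerts(realogram, threshold=0.5, planogram=planogram)
```

Each returned pair would correspond to one item and cell for which a restock notification is raised.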

If the SKU score of inventory item "i" is below the threshold, an alert notification indicating that inventory item "i" needs to be restocked is sent to the store manager or another designated employee (step 1310). The system can also identify the inventory locations at which the item needs to be restocked by matching the cells whose SKU score is below the threshold to inventory locations. In other embodiments, the system can check the stock level of inventory item "i" in the stock room of the shopping store to determine whether inventory item "i" needs to be ordered from a distributor. The process ends at step 1312. FIG. 13B presents an example user interface displaying a restock alert notification for an inventory item. The alert notifications can be displayed on the user interfaces of other types of devices, such as tablets and mobile computing devices. The alerts can also be sent to designated recipients via email, SMS (Short Message Service) on a mobile phone, or a notification to a store application installed on a mobile computing device.

[Misplaced Inventory Items]

In embodiments that include a planogram, or where a planogram of the store is otherwise available, the realogram is compared with the planogram for planogram compliance by identifying misplaced items. In such embodiments, the system includes a planogram specifying the placement of inventory items at inventory locations in the area of real space. The system includes logic to maintain data representing the inventory items matched with cells in the plurality of cells. The system determines misplaced items by comparing the data representing the inventory items matched with cells to the placement of inventory items at the inventory locations specified in the planogram. FIG. 14 presents a flowchart for determining planogram compliance using the realogram. The process starts at step 1402. At step 1404, the system retrieves the realogram for inventory item "i" at scoring time "t". The scores of inventory item "i" in all cells in the realogram are compared with the placement of inventory item "i" in the planogram (step 1406). If the realogram shows an SKU score for inventory item "i" above a threshold in cells that do not match the placement of inventory item "i" in the planogram (step 1408), the system identifies these items as misplaced. An alert or notification about items that do not match their placement in the planogram is sent to a store employee, who can take the misplaced items from their current location and put them back at their designated inventory location (step 1410). If no misplaced items are identified at step 1408, process steps 1404 and 1406 are repeated for the next inventory item "i".
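The compliance check of steps 1406 and 1408 can be sketched as follows, again modeling the realogram as a dictionary of per-cell SKU scores and the planogram as a dictionary from SKU to its designated cells. Both layouts and the name `misplaced_items` are assumptions made for illustration.

```python
def misplaced_items(realogram, planogram, threshold):
    """Return sorted (sku, cell_id) pairs that look misplaced.

    An item is reported when its score exceeds the threshold in a cell
    that the planogram does not list as a placement for that SKU.
    """
    found = []
    for cell_id, sku_scores in realogram.items():
        for sku, score in sku_scores.items():
            if score > threshold and cell_id not in planogram.get(sku, set()):
                found.append((sku, cell_id))
    return sorted(found)


realogram = {
    "cell-1": {"ketchup": 1.2},
    "cell-4": {"ketchup": 0.8, "soap": 2.0},
}
planogram = {"ketchup": {"cell-1"}, "soap": {"cell-4"}}
misplaced = misplaced_items(realogram, planogram, threshold=0.5)
```

The threshold keeps a single stray event with a low, decayed score from triggering a notification.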

In one embodiment, a store app displays the location of the items on a store map and guides the store employee to the misplaced items. Following this, the store app displays the correct location of the items on the store map and can guide the employee to the correct shelf portion to put the items at their designated location. In another embodiment, the store app can also guide a customer to an inventory item based on a shopping list entered in the store app. The store app can use the realogram to obtain real-time locations of inventory items and guide the customer to the nearest inventory location holding the item on the map. In this example, the nearest location of an inventory item can be that of a misplaced item, which is not positioned at the inventory location specified by the store planogram. FIG. 14B presents an example user interface displaying an alert notification for a misplaced inventory item "i" on the user interface display 1102. As described above for FIG. 13B, various types of computing devices and alert notification mechanisms can be used to send this information to store employees.

[Improving Inventory Item Prediction Accuracy]

Another application of the realogram is improving the prediction of inventory items by the image recognition engine. The flowchart in FIG. 15 presents example process steps for adjusting inventory item predictions using the realogram. The process starts at step 1502. At step 1504, the system receives the prediction confidence score probability for item "i" from the image recognition engine. As described above, WhatCNN is an example image recognition engine that identifies inventory items in the hands of subjects (or customers). WhatCNN outputs a confidence score (or confidence value) probability for the predicted inventory item. At step 1506, the confidence score probability is compared with a threshold. If the probability value is above the threshold, indicating higher confidence in the prediction (step 1508), the process is repeated for the next inventory item "i". Otherwise, if the confidence score probability is below the threshold, the process continues at step 1510.

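The realogram for inventory item "i" at scoring time "t" is retrieved at step 1510. In one example, this can be the most recent realogram; in another example, the realogram whose scoring time "t" matches or is closest in time to the inventory event is retrieved from the realogram database 170. At step 1512, the SKU score of inventory item "i" at the location of the inventory event is compared with a threshold. If the SKU score is above the threshold (step 1514), the prediction of inventory item "i" by the image recognition engine is accepted (step 1516), and the log data structure of the customer associated with the inventory event is updated accordingly: if the inventory event is a "take" event, inventory item "i" is added to the customer's log data structure, and if it is a "put" event, inventory item "i" is removed from the customer's log data structure. If the SKU score is below the threshold (step 1514), the prediction of the image recognition engine is rejected (step 1518). As a result, if the inventory event is a "take" event, inventory item "i" is not added to the customer's log data structure; similarly, if the inventory event is a "put" event, inventory item "i" is not removed from the customer's log data structure. The process ends at step 1520. In another embodiment, the SKU score of inventory item "i" can be used to adjust an input parameter to the image recognition engine for determining the item prediction confidence score. WhatCNN, a convolutional neural network (CNN), is an example of an image recognition engine that predicts inventory items.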
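The accept-or-reject flow of FIG. 15 can be sketched as a single function. Several points are assumptions made for illustration: the customer's log is modeled as a dictionary from SKU to quantity, a prediction whose confidence already exceeds its threshold is treated as accepted without consulting the realogram, and the name `apply_prediction` and its parameters are hypothetical.

```python
def apply_prediction(log, sku, event_type, confidence, sku_score,
                     confidence_threshold, score_threshold):
    """Update a customer's log (dict of sku -> quantity) for one inventory event.

    Returns True if the image recognition prediction was accepted.
    """
    if event_type not in ("take", "put"):
        raise ValueError("event_type must be 'take' or 'put'")

    # Steps 1506-1508: a confident prediction passes (assumed to be accepted).
    # Steps 1512-1514: otherwise the realogram's SKU score at the event
    # location decides whether the prediction is plausible.
    accepted = confidence > confidence_threshold or sku_score > score_threshold
    if not accepted:
        return False  # step 1518: rejected, log left unchanged

    # Step 1516: accepted, so the log follows the event type.
    if event_type == "take":
        log[sku] = log.get(sku, 0) + 1
    else:
        remaining = log.get(sku, 0) - 1
        if remaining > 0:
            log[sku] = remaining
        else:
            log.pop(sku, None)
    return True


log = {}
first = apply_prediction(log, "ketchup", "take", 0.4, 0.83, 0.6, 0.5)
second = apply_prediction(log, "soap", "take", 0.4, 0.1, 0.6, 0.5)
```

Here the low-confidence ketchup prediction is rescued by a high SKU score at the event location, while the soap prediction is rejected because neither value clears its threshold.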

[Network Configuration]

FIG. 16 presents the architecture of a network hosting the store realogram engine 190, which is hosted on the network node 106. In the illustrated embodiment, the system includes a plurality of network nodes 101a, 101b, 101n, and 102. In such an embodiment, the network nodes are also referred to as processing platforms. The processing platforms (network nodes) 103, 101a-101n, and 102 and the cameras 1612, 1614, 1616, ... 1618 are connected to the network 1681. A similar network hosts the store inventory engine 180, which is hosted on the network node 104.

FIG. 13 shows a plurality of cameras 1612, 1614, 1616, ... 1618 connected to the network. A large number of cameras can be deployed in a particular system. In one embodiment, the cameras 1612 to 1618 are connected to the network 1681 using Ethernet-based connectors 1622, 1624, 1626, and 1628, respectively. In such an embodiment, the Ethernet-based connectors have a data transfer speed of 1 gigabit per second, also referred to as Gigabit Ethernet. It is understood that in other embodiments, the cameras 114 are connected to the network using other types of network connections, which can have faster or slower data transfer rates than Gigabit Ethernet. Also, in alternative embodiments, a set of cameras can be connected directly to each processing platform, and the processing platforms can be coupled to a network.

The storage subsystem 1630 stores the basic programming and data constructs that provide the functionality of certain embodiments of the present invention. For example, the various modules implementing the functionality of the store realogram engine 190 may be stored in the storage subsystem 1630. The storage subsystem 1630 is an example of a computer-readable memory comprising a non-transitory data storage medium, having computer instructions stored in the memory and executable by a computer to perform all or any combination of the data processing and image processing functions described herein, including logic to calculate realograms for the area of real space by the processes described herein. In other examples, the computer instructions can be stored in other types of memory, including portable memory, that comprise a non-transitory data storage medium or media readable by a computer.

These software modules are generally executed by a processor subsystem 1650. A host memory subsystem 1632 typically includes a number of memories, including a main random access memory (RAM) 1634 for storage of instructions and data during program execution and a read-only memory (ROM) 1636 in which fixed instructions are stored. In one embodiment, the RAM 1634 is used as a buffer for storing the point cloud data structure tuples generated by the store realogram engine 190.

A file storage subsystem 1640 provides persistent storage for program and data files. In an example embodiment, the storage subsystem 1640 includes four 120-gigabyte (GB) solid state disks (SSDs) in a RAID 0 (redundant array of independent disks) arrangement identified by the numeral 1642. In the example embodiment, the data in the map database 140, the inventory event data in the inventory events database 150, the inventory data in the inventory database 160, and the realogram data in the realogram database 170 that is not in RAM are stored in RAID 0. In the example embodiment, the hard disk drive 1646 is slower in access speed than the RAID 0 1642 storage. The solid state disk (SSD) 1644 contains the operating system and related files for the store realogram engine 190.

In an example configuration, four cameras 1612, 1614, 1616, and 1618 are connected to the processing platform (network node) 103. Each camera has a dedicated graphics processing unit, GPU 1 1662, GPU 2 1664, GPU 3 1666, and GPU 4 1668 respectively, to process the images sent by the camera. It is understood that fewer or more than three cameras can be connected per processing platform. Accordingly, fewer or more GPUs are configured in the network node so that each camera has a dedicated GPU for processing the image frames received from the camera. The processor subsystem 1650, the storage subsystem 1630, and the GPUs 1662, 1664, and 1666 communicate using the bus subsystem 1654.

A network interface subsystem 1670 is connected to the bus subsystem 1654, forming part of the processing platform (network node) 104. The network interface subsystem 1670 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems. The network interface subsystem 1670 allows the processing platform to communicate over the network either by using cables (or wires) or wirelessly. A number of peripheral devices, such as user interface output devices and user interface input devices, are also connected to the bus subsystem 1654, forming part of the processing platform 104. These subsystems and devices are intentionally not shown in FIG. 13 to improve the clarity of the description. Although the bus subsystem 1654 is shown schematically as a single bus, alternative embodiments of the bus subsystem may use multiple buses.

In one embodiment, the cameras 114 can be implemented using Chameleon3 1.3 MP Color USB3 Vision (Sony ICX445) cameras, having a resolution of 1288 × 964, a frame rate of 30 FPS, and 1.3 megapixels per image, with a varifocal lens having a working distance of 300 mm to infinity and a field of view, with a 1/3-inch sensor, of 98.2° to 23.8°.

The technology described herein also includes a system for tracking inventory items in an area of real space including inventory display structures, along with corresponding methods and computer program products. The system comprises a plurality of cameras or other sensors disposed above the inventory display structures, sensors in the plurality of sensors producing respective sequences of images of the inventory display structures in corresponding fields of view in the real space, the field of view of each sensor overlapping with the field of view of at least one other sensor in the plurality of sensors; a memory storing a data set, the data set defining a plurality of cells having coordinates in the area of real space; and a processing system coupled to the plurality of sensors. The processing system includes logic that processes the sequences of images produced by the plurality of sensors to find the locations of inventory events in three dimensions in the area of real space and, in response to an inventory event, determines the nearest cell in the data set based on the location of the inventory event. The processing system also includes logic that calculates, at a scoring time, scores for the inventory items associated with inventory events whose locations match a particular cell, using respective counts of the inventory events. The system can include logic to select a set of inventory items for each cell based on the scores. An inventory event can include an item identifier, a put or take indicator, a location represented by positions along three axes of the area of real space, and a timestamp. The system can include a data set defining a plurality of cells represented as a two-dimensional grid having coordinates in the area of real space, the cells correlating with portions of the front view of inventory locations, with the processing system including logic that determines the nearest cell based on the location of the inventory event. The system can include a data set defining a plurality of cells represented as a three-dimensional grid having coordinates in the area of real space, the cells correlating with portions of the volume on inventory locations, with the processing system including logic that determines the nearest cell based on the location of the inventory event. A put indicator identifies that an item has been placed at an inventory location, and a take indicator identifies that an item has been taken from an inventory location. The logic that processes the sequences of images produced by the plurality of sensors can comprise image recognition engines that generate data sets representing elements in the images corresponding to hands, and can execute analysis of the data sets from the sequences of images from at least two sensors to determine the locations of inventory events in three dimensions. The image recognition engines can comprise convolutional neural networks. The logic that calculates the score for a cell can use a sum of the puts and takes of inventory items, weighted by the separation between the timestamps of the puts and takes and the scoring time, and can store the scores in memory. The logic that determines the nearest cell in the data set based on the location of an inventory event can execute a procedure that includes calculating the distance from the location of the inventory event to the cells in the data set and matching the inventory event with a cell based on the calculated distance.
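The inventory event fields and the nearest-cell procedure summarized above can be sketched with two small data classes. The field names, the use of seconds for the timestamp, and the choice to measure distance to each cell's center are assumptions; the source states only that a distance from the event location to the cells is calculated.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEvent:
    item_id: str
    is_put: bool      # True for a put, False for a take
    position: tuple   # (x, y, z) in real-space coordinates
    timestamp: float  # assumed representation: seconds since the epoch


@dataclass(frozen=True)
class Cell:
    cell_id: str
    start: tuple  # minimum corner (x, y, z)
    end: tuple    # maximum corner (x, y, z)

    def center(self):
        return tuple((s + e) / 2 for s, e in zip(self.start, self.end))


def nearest_cell(event, cells):
    """Return the cell whose center is closest (Euclidean) to the event position."""
    return min(cells, key=lambda c: math.dist(event.position, c.center()))


cells = [
    Cell("a", (0, 0, 0), (1, 1, 1)),
    Cell("b", (1, 0, 0), (2, 1, 1)),
]
event = InventoryEvent("ketchup", False, (1.4, 0.5, 0.5), 0.0)
matched = nearest_cell(event, cells)
```

A linear scan is enough for a sketch; a real deployment with many cells would more likely index the grid directly or use a spatial index.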

Any data structures and code described or referenced above are stored, according to many embodiments, in computer-readable memory, which includes a non-transitory computer-readable storage medium and may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, volatile memory, non-volatile memory, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), and DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.

The preceding description is presented to enable the use and practice of the disclosed technology. Various modifications to the disclosed implementations will be apparent, and the principles defined herein may be applied to other implementations and applications without departing from the spirit and scope of the disclosed technology. Thus, the disclosed technology is not intended to be limited to the implementations shown but is to be accorded the widest scope consistent with the principles and features disclosed herein. The scope of the disclosed technology is defined by the appended claims.

Claims

A system for tracking inventory items in an area of real space, comprising:
a plurality of sensors and a processing system coupled to the plurality of sensors,
wherein sensors in the plurality of sensors produce respective sequences of images of corresponding fields of view in the real space, the field of view of each sensor overlapping with the field of view of at least one other sensor in the plurality of sensors, and
wherein the processing system includes logic that uses the sequences of images produced by at least two sensors in the plurality of sensors to identify inventory events and, in response to the inventory events, tracks locations of inventory items in the area of real space; logic that matches the locations of the inventory items with coordinates of cells in a plurality of cells having coordinates in the area of real space and maintains data representing the inventory items matched with cells in the plurality of cells; and logic that calculates, at a scoring time, scores for inventory items having locations matching particular cells, based on counts of inventory events.

The system of claim 1, wherein the inventory events include an item identifier, a put or take indicator, a location represented by positions in three dimensions of the area of real space, and a timestamp.

The system of claim 1, wherein the area of real space includes a plurality of inventory locations, and the system includes logic that matches the inventory locations with coordinates of cells in the plurality of cells.

4. The system of claim 1, wherein the logic for calculating the score of a cell uses the sum of the puts and takes of the inventory item, weighted by the separation between the timestamps of the puts and takes and the scoring time.
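As an illustration only, and not part of the claims, a time-weighted score of this kind might be sketched as follows. The claim states only that puts and takes are weighted by their separation from the scoring time; the exponential half-life weighting, the sign convention (puts positive, takes negative), and the name `cell_score` are assumptions made for this sketch.

```python
def cell_score(events, scoring_time, half_life=3600.0):
    """Sum puts and takes for one item in one cell, weighted by event age.

    `events` is an iterable of (timestamp, is_put) pairs. Each event's weight
    halves every `half_life` seconds of separation from `scoring_time`
    (an assumed weighting function). Events after the scoring time are ignored.
    """
    score = 0.0
    for timestamp, is_put in events:
        age = scoring_time - timestamp
        if age < 0:
            continue
        weight = 0.5 ** (age / half_life)
        score += weight if is_put else -weight
    return score
```

With a one-hour half-life, a put made one hour before the scoring time contributes 0.5, while a put and a take made at the scoring time cancel each other out.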

5. The system of claim 1, wherein the processing system includes logic for rendering a display image representing cells in the plurality of cells and the scores of the cells.

6. The system of claim 1, wherein the processing system includes logic for selecting a set of inventory items for each cell based on the scores.

7. The system of claim 1, wherein:
the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations;
the system includes, in memory, a planogram specifying the placement of inventory items at inventory locations within the area of real space; and
the logic that maintains the data representing inventory items that match cells in the plurality of cells includes logic for determining misplaced items by comparing the data representing inventory items that match cells with the placement of inventory items at the inventory locations specified in the planogram.
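As an illustration only, and not part of the claims, the planogram comparison might be sketched as follows. The dictionary layouts (observed counts per cell and a planogram mapping each cell to its set of expected item identifiers) and the name `find_misplaced` are assumptions made for this sketch.

```python
def find_misplaced(observed, planogram):
    """Return (cell, item_id) pairs present in a cell but not planned for it.

    observed:  {cell: {item_id: count}} data maintained per cell
    planogram: {cell: set of item_ids expected at the correlated location}
    """
    misplaced = []
    for cell, counts in observed.items():
        expected = planogram.get(cell, set())
        for item_id, count in counts.items():
            # Only items actually present (positive count) can be misplaced.
            if count > 0 and item_id not in expected:
                misplaced.append((cell, item_id))
    return sorted(misplaced)
```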

8. The system of claim 1, wherein:
the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations; and
the logic that maintains the data representing inventory items that match cells in the plurality of cells includes logic for determining, for a particular inventory item matched with a particular cell, whether the count of the particular inventory item in that cell is below a threshold for restocking the inventory location correlated with the particular cell.
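As an illustration only, and not part of the claims, the restocking check might be sketched as follows. The per-item threshold table, the default threshold of 2, and the name `cells_needing_restock` are assumptions made for this sketch.

```python
def cells_needing_restock(observed, thresholds, default_threshold=2):
    """Return (cell, item_id, count) for items whose count is below threshold.

    observed:   {cell: {item_id: count}} data maintained per cell
    thresholds: {item_id: minimum count before restocking is triggered}
    """
    needs = []
    for cell, counts in observed.items():
        for item_id, count in counts.items():
            if count < thresholds.get(item_id, default_threshold):
                needs.append((cell, item_id, count))
    return sorted(needs)
```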

9. A method, executed by a computer system, for tracking inventory within an area of real space, comprising:
using a plurality of sensors to generate respective image sequences of corresponding fields of view in the real space, wherein the field of view of each sensor overlaps with the field of view of at least one other sensor;
identifying inventory events using the image sequences generated by at least two of the plurality of sensors;
tracking the locations of inventory items within the area of real space in response to the identification of the inventory events;
matching the locations of inventory items to the coordinates of cells in a plurality of cells having coordinates within the area of real space, and maintaining data representing the inventory items that match cells in the plurality of cells; and
calculating, at a scoring time, a score for an inventory item having a location matching a particular cell, using a count of inventory events.

10. The method of claim 9, wherein the inventory event includes an item identifier, a put or take indicator, a location represented by a position along three axes of the area of real space, and a timestamp.

11. The method of claim 9, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations.

12. The method of claim 9, further comprising:
calculating the score of a cell using the sum of the puts and takes of the inventory item, weighted by the separation between the timestamps of the puts and takes and the scoring time; and
storing the score in memory.

13. The method of claim 9, further comprising rendering a display image representing cells in the plurality of cells and the scores of the cells.

14. The method of claim 9, further comprising selecting a set of inventory items for each cell based on the scores.

15. The method of claim 9, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations, the method further comprising:
designating the arrangement of inventory items at inventory locations within the area of real space as a planogram, and storing the planogram in memory; and
determining misplaced items by comparing the data representing inventory items that match cells with the placement of inventory items at the inventory locations.

16. The method of claim 9, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations, the method further comprising:
determining, for a particular inventory item matched with a particular cell, whether the count of the particular inventory item in that cell is below a threshold for restocking the inventory item at the inventory location correlated with the particular cell.

17. A non-transitory computer-readable storage medium storing computer program instructions for tracking inventory items within an area of real space, wherein the instructions, when executed on a processor, implement a method comprising:
using a plurality of sensors to generate respective image sequences of corresponding fields of view in the real space, wherein the field of view of each sensor overlaps with the field of view of at least one other sensor;
identifying inventory events using the image sequences generated by at least two of the plurality of sensors;
tracking the locations of inventory items within the area of real space in response to the identification of the inventory events;
matching the locations of inventory items to the coordinates of cells in a plurality of cells having coordinates within the area of real space, and maintaining data representing the inventory items that match cells in the plurality of cells; and
calculating, at a scoring time, a score for an inventory item having a location matching a particular cell, using a count of inventory events.

18. The non-transitory computer-readable storage medium of claim 17, wherein the inventory event includes an item identifier, a put or take indicator, a location represented by a position along three axes of the area of real space, and a timestamp.

19. The non-transitory computer-readable storage medium of claim 17, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations.

20. The non-transitory computer-readable storage medium of claim 17, wherein carrying out the method further comprises:
calculating the score of a cell using the sum of the puts and takes of the inventory item, weighted by the separation between the timestamps of the puts and takes and the scoring time; and
storing the score in memory.

21. The non-transitory computer-readable storage medium of claim 17, wherein carrying out the method further comprises rendering a display image representing cells in the plurality of cells and the scores of the cells.

22. The non-transitory computer-readable storage medium of claim 17, wherein carrying out the method further comprises selecting a set of inventory items for each cell based on the scores.

23. The non-transitory computer-readable storage medium of claim 17, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations, and wherein carrying out the method further comprises:
designating the arrangement of inventory items at inventory locations within the area of real space as a planogram, and storing the planogram in memory; and
determining misplaced items by comparing the data representing inventory items that match cells with the placement of inventory items at the inventory locations.

24. The non-transitory computer-readable storage medium of claim 17, wherein the area of real space includes a plurality of inventory locations, and the coordinates of cells in the plurality of cells are correlated with inventory locations in the plurality of inventory locations, and wherein carrying out the method further comprises:
determining, for a particular inventory item matched with a particular cell, whether the count of the particular inventory item in that cell is below a threshold for restocking the inventory item at the inventory location correlated with the particular cell.