TWI787536B - Systems and methods to check-in shoppers in a cashier-less store - Google Patents


Info

Publication number
TWI787536B
TWI787536B TW108126626A
Authority
TW
Taiwan
Prior art keywords
mobile computing
computing device
user account
subject
location
Prior art date
Application number
TW108126626A
Other languages
Chinese (zh)
Other versions
TW202008249A (en)
Inventor
喬丹 費雪 (Jordan Fisher)
瓦倫 格林 (Warren Green)
丹尼爾 菲奇帝 (Daniel Fischetti)
Original Assignee
Standard Cognition, Corp. (US)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US 16/255,573 (now US Patent 10,650,545 B2)
Application filed by Standard Cognition, Corp.
Publication of TW202008249A
Application granted
Publication of TWI787536B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q 10/087 Inventory or stock management, e.g. order filling, procurement or balancing against orders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/30 Payment architectures, schemes or protocols characterised by the use of specific devices or networks
    • G06Q 20/32 Payment architectures, schemes or protocols characterised by the use of specific devices or networks using wireless devices
    • G06Q 20/322 Aspects of commerce using mobile devices [M-devices]
    • G06Q 20/3224 Transactions dependent on location of M-devices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 20/00 Payment architectures, schemes or protocols
    • G06Q 20/38 Payment protocols; Details thereof
    • G06Q 20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q 20/401 Transaction verification
    • G06Q 20/4014 Identity check for transactions
    • G06Q 20/40145 Biometric identity checks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/04 Protocols specially adapted for terminals or networks with limited capabilities; specially adapted for terminal portability
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/131 Protocols for games, networked simulations or virtual reality
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/2866 Architectures; Arrangements
    • H04L 67/30 Profiles
    • H04L 67/306 User profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/042 Knowledge-based neural networks; Logical representations of neural networks

Abstract

Systems and techniques are provided for linking subjects in an area of real space with user accounts. The user accounts are linked with client applications executable on mobile computing devices. A plurality of cameras are disposed above the area. The cameras in the plurality of cameras produce respective sequences of images in corresponding fields of view in the real space. A processing system is coupled to the plurality of cameras. The processing system includes logic to determine locations of subjects represented in the images. The processing system further includes logic to match the identified subjects with user accounts by identifying locations of the mobile computing devices executing client applications in the area of real space and matching locations of the mobile computing devices with locations of the subjects.

Description

System and method for checking in shoppers in a cashier-less store

This application claims the benefit of U.S. Provisional Patent Application No. 62/703,785 (Attorney Docket No. STCG 1006-1), filed July 26, 2018, and priority to U.S. Non-Provisional Application No. 16/255,573 (Attorney Docket No. STCG 1009-1), filed January 23, 2019, which is a continuation-in-part of U.S. Patent Application No. 15/945,473 (Attorney Docket No. STCG 1005-1), filed April 4, 2018, which is a continuation-in-part of U.S. Patent Application No. 15/907,112 (Attorney Docket No. STCG 1002-1), filed February 27, 2018 (now U.S. Patent No. 10,133,933, issued November 20, 2018), which is a continuation-in-part of U.S. Patent Application No. 15/847,796 (Attorney Docket No. STCG 1001-1), filed December 19, 2017 (now U.S. Patent No. 10,055,853, issued August 21, 2018), which claims the benefit of U.S. Provisional Patent Application No. 62/542,077 (Attorney Docket No. STCG 1000-1), filed August 7, 2017, which applications are incorporated herein by reference.

This application relates to systems that link subjects in an area of real space with user accounts, where the user accounts are linked with client applications executable on mobile computing devices.

Description of Related Art

Many technical challenges arise in identifying subjects within an area of real space, such as persons in a shopping store, in uniquely associating an identified subject with a real person, and in associating the subject with an authenticated account of a responsible party. Consider, for example, such an image processing system deployed in a shopping store, with multiple customers moving in the aisles between the shelves and in open spaces within the store. Customers take items from the shelves and put them in their respective shopping carts or baskets. Customers may also put items back on a shelf if they do not want them. Although the system can identify the subjects in the images and the items taken by a subject, the system must accurately identify the authentic user account responsible for the items taken by that subject.

In some systems, facial recognition or other biometric recognition techniques might be used to identify subjects in the images and link them to accounts. However, this approach requires that the image processing system have access to a database storing personal identifying biometric information that is linked to the accounts. In many settings this is undesirable from a security and privacy standpoint.

It is desirable to provide a system that can more efficiently and automatically link subjects in an area of real space to users known to a system deployed to provide services to those subjects. It is also desirable to provide an image processing system that identifies subjects over a large space without requiring personal identifying biometric information about the subjects.

A system, and a method for operating a system, are provided for linking subjects, such as persons in an area of real space, with user accounts. The system can use image processing to identify subjects in the area of real space without personal identifying biometric information. The user accounts are linked with client applications executable on mobile computing devices. This function of linking identified subjects to user accounts by image and signal processing presents complex problems of computer engineering, relating to the types of image and signal data to be processed, what processing of the image and signal data to perform, and how to determine actions from the image and signal data with high reliability.

A system and method are provided for linking subjects in an area of real space with user accounts. The user accounts are linked with client applications executable on mobile computing devices. A plurality of cameras or other sensors produce respective sequences of images in corresponding fields of view in the real space. Using these sequences of images, systems and methods are described for determining locations of identified subjects represented in the images, and for matching the identified subjects with user accounts by identifying locations of mobile devices executing client applications in the area of real space and matching the locations of the mobile devices with the locations of the subjects.

In one embodiment described herein, the mobile devices emit signals usable to indicate the locations of the mobile devices in the area of real space. The system matches identified subjects with user accounts by using the emitted signals to identify the locations of the mobile devices.

In one embodiment, the signals emitted by the mobile devices comprise images. In the described embodiment, a client application on the mobile device causes a semaphore image, which can be as simple as a particular color, to be displayed on the mobile device in the area of real space. The system matches identified subjects with user accounts by identifying the location of the mobile device, using an image recognition engine that determines the location of the mobile device displaying the semaphore image. The system includes a set of semaphore images. Before matching a user account to a subject identified in the area of real space, the system accepts a login communication from the client application on a mobile device identifying the user account. After accepting the login communication, the system sends a selected semaphore image from the set of semaphore images to the client application on the mobile device, and sets the status of the selected semaphore image to "assigned". The system receives an image of the displayed semaphore image, recognizes the displayed image, and matches the recognized image with the assigned image from the set of semaphore images. The system then matches the location of the mobile device displaying the recognized semaphore image in the area of real space with a not-yet-linked identified subject. After matching the user account to the identified subject, the system sets the status of the recognized semaphore image back to "available".
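The semaphore lifecycle described above (login, assignment, recognition, linking, release) can be sketched as follows. All names (`SemaphoreService`, the color strings, the account and subject identifiers) are illustrative assumptions, not details from the patent:

```python
# Minimal sketch of the semaphore check-in flow; names are illustrative.

class SemaphoreService:
    def __init__(self, colors):
        # Each semaphore image can be as simple as a particular color.
        self.status = {c: "available" for c in colors}
        self.assigned_to = {}            # semaphore color -> user account id

    def login(self, account_id):
        """Assign an available semaphore image to a logged-in client app."""
        color = next(c for c, s in self.status.items() if s == "available")
        self.status[color] = "assigned"
        self.assigned_to[color] = account_id
        return color                     # sent to the client app for display

    def match(self, recognized_color, subject_id, links):
        """Image recognition saw this semaphore at the subject's location."""
        account_id = self.assigned_to.pop(recognized_color)
        links[subject_id] = account_id   # link tracked subject to account
        self.status[recognized_color] = "available"   # recycle the image
        return account_id

links = {}
svc = SemaphoreService(["#FF0000", "#00FF00", "#0000FF"])
color = svc.login(account_id="user-42")
svc.match(color, subject_id="subject-7", links=links)
```

Recycling the image back to "available" after the match is what lets a small, fixed set of semaphore images serve an arbitrary number of shoppers over time.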

In one embodiment, the signals emitted by the mobile devices comprise radio frequency signals indicating the location of the mobile device. The system receives location data transmitted by the client application on the mobile device, and uses the location data transmitted from the mobile device to match identified subjects with user accounts. The system uses location data transmitted from a plurality of locations over an interval of time in the area of real space to match identified subjects with user accounts. Matching an identified, not-yet-matched subject with the user account of a client application executing on a mobile device includes determining that all other mobile devices transmitting location information for unmatched user accounts are separated from the mobile device by a predetermined distance, and determining the nearest unmatched identified subject to the mobile device.
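The two conditions above, isolation of the device from all other unmatched devices, then selection of the nearest unmatched subject, can be sketched as below. The 2-D coordinates, the 3.0-unit separation threshold, and all names are illustrative assumptions:

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points in the area of real space."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def match_device(device_id, device_positions, subject_positions,
                 min_separation=3.0):
    """Match one unmatched device to the nearest unmatched subject,
    but only when every other unmatched device is far enough away."""
    pos = device_positions[device_id]
    # Condition 1: all other unmatched devices are at least min_separation away.
    for other_id, other_pos in device_positions.items():
        if other_id != device_id and dist(pos, other_pos) < min_separation:
            return None                  # ambiguous; wait for more location data
    # Condition 2: pick the nearest unmatched identified subject.
    return min(subject_positions, key=lambda s: dist(pos, subject_positions[s]))

devices = {"dev-1": (2.0, 2.0), "dev-2": (9.0, 9.0)}
subjects = {"subject-7": (2.5, 1.8), "subject-9": (8.7, 9.2)}
matched = match_device("dev-1", devices, subjects)
```

Returning `None` when another unmatched device is nearby mirrors the patent's use of location samples over an interval of time: the match is simply deferred until the devices have moved apart.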

In one embodiment, the signals emitted by the mobile devices comprise radio frequency signals indicating the acceleration or orientation of the mobile device. In one embodiment, such acceleration data is produced by an accelerometer of the mobile computing device. In another embodiment, direction data from a compass on the mobile device is received by the processing system in addition to the accelerometer data. The system receives accelerometer data from the client application on the mobile device, and uses the accelerometer data transmitted from the mobile device to match identified subjects with user accounts. In this embodiment, the system matches identified subjects with user accounts using accelerometer data transmitted from the mobile device from a plurality of locations over an interval of time in the area of real space, and using derivatives of data indicating the locations of identified subjects over that interval of time in the area of real space.
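The idea above, deriving acceleration from each tracked subject's positions and comparing it with the accelerometer samples a device reports, can be sketched with a second-difference derivative and a squared-error score. The 1-D tracks, the sampling step `dt`, and the scoring are simplifying assumptions for illustration:

```python
# Sketch: match a device to the subject whose derived acceleration best
# fits the device's reported accelerometer data. Data is illustrative.

def second_difference(positions, dt=1.0):
    """Derive acceleration from a 1-D position track (second difference)."""
    vel = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    return [(b - a) / dt for a, b in zip(vel, vel[1:])]

def best_match(device_accel, subject_tracks, dt=1.0):
    """Return the subject id whose derived acceleration is closest
    (in squared error) to the device's reported acceleration."""
    def score(track):
        derived = second_difference(track, dt)
        return sum((d - a) ** 2 for d, a in zip(derived, device_accel))
    return min(subject_tracks, key=lambda s: score(subject_tracks[s]))

tracks = {
    "subject-7": [0.0, 1.0, 3.0, 6.0, 10.0],   # accelerating: a = 1, 1, 1
    "subject-9": [0.0, 1.0, 2.0, 3.0, 4.0],    # constant speed: a = 0, 0, 0
}
device_accel = [1.0, 1.0, 1.0]                 # samples reported by the device
matched = best_match(device_accel, tracks)
```

In a real deployment the tracks would be 3-D positions sampled from the camera system and the comparison would have to tolerate noise and clock skew; the second-difference derivative is the essential step.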

In one embodiment, the system matches identified subjects with user accounts using a trained network that identifies the locations of mobile devices in the area of real space based on the signals emitted by the mobile devices. In such an embodiment, the signals emitted by the mobile devices include location data and accelerometer data.

In one embodiment, the system includes log data structures, each including a list of inventory items for an identified subject. The system associates the log data structure for a matched identified subject with the user account for that subject.
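A minimal sketch of such a log data structure, a per-subject list of inventory items that gets associated with a user account once the subject is matched, might look like this. The field names and SKU string are assumptions, not specified by the patent:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SubjectLog:
    """Log data structure: inventory items held by one tracked subject."""
    subject_id: str
    items: List[str] = field(default_factory=list)   # the subject's "cart"
    account_id: Optional[str] = None                 # set after matching

    def link_account(self, account_id: str) -> None:
        """Associate this log with the matched user account."""
        self.account_id = account_id

log = SubjectLog(subject_id="subject-7")
log.items.append("sku-0042")        # subject takes an item from a shelf
log.link_account("user-42")         # subject matched to a user account
```

Keeping `account_id` optional reflects the flow in the description: the log accumulates items while the subject is still anonymous, and is only tied to an account after the matching step succeeds.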

In one embodiment, the system processes payment for the list of inventory items for an identified subject using a payment method identified in the user account linked to that subject.

In one embodiment, the system matches identified subjects with user accounts without using personal identifying biometric information associated with the user accounts.

Methods and computer program products which can be executed by computer systems are also described herein.

Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.

System Overview

A system and various implementations of the subject technology are described with reference to FIGS. 1-11. The system and processes are described with reference to FIG. 1, an architectural level schematic of a system in accordance with an implementation. Because FIG. 1 is an architectural diagram, certain details are omitted to improve the clarity of the description.

The discussion of FIG. 1 is organized as follows. First, the elements of the system are described, followed by their interconnections. Then, the use of the elements in the system is described in more detail.

FIG. 1 provides a block diagram level illustration of a system 100. The system 100 includes cameras 114, network nodes 101a, 101b and 101n hosting image recognition engines 112a, 112b and 112n, a subject tracking engine 110 deployed in a network node 102 (or nodes) on the network, mobile computing devices 118a, 118b and 118m (collectively referred to as mobile computing devices 120), a training database 130, a subject database 140, a user account database 150, an image database 160, a matching engine 170 deployed in a network node 103 (or nodes, also known as a processing platform), and a communication network or networks 181. A network node can host only one image recognition engine, or several image recognition engines. The system can also include an inventory database and other supporting data.

As used herein, a network node is an addressable hardware device or virtual device that is attached to a network, and is capable of sending, receiving, or forwarding information over a communications channel to or from other network nodes. Examples of electronic devices which can be deployed as hardware network nodes include all varieties of computers, workstations, laptop computers, handheld computers and smart phones. Network nodes can be implemented in cloud-based server systems. More than one virtual device configured as a network node can be implemented using a single physical device.

For the sake of clarity, only three network nodes hosting image recognition engines are shown in the system 100. However, any number of network nodes hosting image recognition engines can be connected to the subject tracking engine 110 through the network(s) 181. Similarly, three mobile computing devices are shown in the system 100; again, any number of mobile computing devices can be connected to the network node 103 hosting the matching engine 170 through the network(s) 181. Also, the image recognition engines, the subject tracking engine, the matching engine and other processing engines described herein can execute using more than one network node in a distributed architecture.

The interconnection of the elements of system 100 will now be described. The network(s) 181 couple the network nodes 101a, 101b and 101n, respectively hosting image recognition engines 112a, 112b and 112n, the network node 102 hosting the subject tracking engine 110, the mobile computing devices 118a, 118b and 118m, the training database 130, the subject database 140, the user account database 150, the image database 160, and the network node 103 hosting the matching engine 170. The cameras 114 are connected to the subject tracking engine 110 through the network nodes hosting the image recognition engines 112a, 112b and 112n. In one embodiment, the cameras 114 are installed in a shopping store such that sets of cameras 114 (two or more) with overlapping fields of view are positioned over each aisle to capture images of the real space in the store. In FIG. 1, two cameras are arranged over aisle 116a, two cameras are arranged over aisle 116b, and three cameras are arranged over aisle 116n. The cameras 114 are installed over the aisles with overlapping fields of view. In such an embodiment, the cameras are configured with the goal that customers moving in the aisles of the shopping store are present in the fields of view of two or more cameras at any moment in time.

The cameras 114 can be synchronized in time with each other, so that images are captured at the same time, or close in time, and at the same image capture rate. The cameras 114 can send respective continuous streams of images at a predetermined rate to the network nodes hosting the image recognition engines 112a-112n. Images captured in all the cameras covering an area of real space at the same time, or close in time, are synchronized in the sense that the synchronized images can be identified in the processing engines as representing different views of subjects having fixed positions in the real space. For example, in one embodiment, the cameras send image frames at the rate of 30 frames per second (fps) to the respective network nodes hosting the image recognition engines 112a-112n. Each frame has a timestamp, an identity of the camera (abbreviated as "camera_id"), and a frame identity (abbreviated as "frame_id"), along with the image data. Other embodiments of the technology disclosed can use different types of sensors, such as infrared or RF image sensors, ultrasound sensors, thermal sensors, Lidar, etc., to produce this data. Multiple types of sensors can be used, including for example ultrasound or RF sensors in addition to the cameras 114 that produce RGB color output. Multiple sensors can be synchronized in time with each other, so that frames are captured by the sensors at the same time, or close in time, and at the same frame capture rate. In all of the embodiments described herein, sensors other than cameras, or sensors of multiple types, can be used to produce the sequences of images utilized.
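The per-frame metadata described above (timestamp, camera_id, frame_id at 30 fps) suggests a simple way to group frames captured "at the same time or close in time" across cameras: bucket them by capture interval. The bucketing scheme and data below are illustrative assumptions, not the patent's mechanism:

```python
from collections import defaultdict

FPS = 30
FRAME_PERIOD = 1.0 / FPS          # ~33 ms between frames at 30 fps

def group_synchronized(frames):
    """Bucket frames whose timestamps fall in the same capture interval,
    so each bucket holds one frame per camera for that moment in time."""
    buckets = defaultdict(list)
    for frame in frames:
        slot = round(frame["timestamp"] / FRAME_PERIOD)
        buckets[slot].append((frame["camera_id"], frame["frame_id"]))
    return dict(buckets)

frames = [
    {"timestamp": 0.000, "camera_id": "cam-1", "frame_id": 1},
    {"timestamp": 0.002, "camera_id": "cam-2", "frame_id": 1},  # close in time
    {"timestamp": 0.033, "camera_id": "cam-1", "frame_id": 2},  # next interval
]
groups = group_synchronized(frames)
```

Each bucket then represents one synchronized set of views of the scene, which is what allows the downstream engines to treat the frames as different viewpoints of subjects at fixed positions.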

The cameras installed over an aisle are connected to respective image recognition engines. For example, in FIG. 1, the two cameras installed over aisle 116a are connected to the network node 101a hosting the image recognition engine 112a. Likewise, the two cameras installed over aisle 116b are connected to the network node 101b hosting the image recognition engine 112b. Each image recognition engine 112a-112n hosted in a network node or nodes 101a-101n separately processes the image frames received from one camera each in the illustrated example.

In one embodiment, each image recognition engine 112a, 112b and 112n is implemented as a deep learning algorithm, such as a convolutional neural network (abbreviated CNN). In such an embodiment, the CNN is trained using the training database 130. In an embodiment described herein, image recognition of subjects in the real space is based on identifying and grouping joints recognizable in the images, where the groups of joints can be attributed to individual subjects. For this joints-based analysis, the training database 130 has a large collection of images for each of the different types of joints for the subjects. In the example embodiment of a shopping store, the subjects are the customers moving in the aisles between the shelves. In the example embodiment, during training of the CNN, the system 100 is referred to as a "training system". After training the CNN using the training database 130, the CNN is switched to production mode to process images of customers in the shopping store in real time.

In an example embodiment, during production, the system 100 is referred to as a runtime system (also referred to as an inference system). The CNN in each image recognition engine produces arrays of joints data structures for images in its respective stream of images. In an embodiment as described herein, an array of joints data structures is produced for each processed image, so that each image recognition engine 112a-112n produces an output stream of arrays of joints data structures. These arrays of joints data structures from cameras having overlapping fields of view are further processed to form groups of joints, and to identify such groups of joints as subjects. The groups of joints may not uniquely identify the individuals in the images, or authentic user accounts for the individuals in the images, but they can be used to track subjects in the area. The subjects can be identified and tracked by the system using an identifier "subject_id" during their presence in the area of real space.

For example, when a customer enters the shopping store, the system identifies the customer using the joints analysis as described above, and the customer is assigned a "subject_id". However, this identifier is not linked to the real world identity of the subject, such as a user account, name, driver's license, email address, mailing address, credit card number, bank account number, driver's license number, etc., nor to biometric identification such as fingerprints, facial recognition, hand geometry, retina scans, iris scans, voice recognition, etc. Therefore, the identified subject is anonymous. Details of an example technology for subject identification and tracking are presented in U.S. Patent No. 10,055,853, issued August 21, 2018, titled "Subject Identification and Tracking Using Image Recognition Engine", which is incorporated herein by reference as if fully set forth herein.

In this example, the subject tracking engine 110, hosted on the network node 102, receives a continuous stream of arrays of joint data structures for the subjects from the image recognition engines 112a-112n. The subject tracking engine 110 processes the arrays of joint data structures and translates the coordinates of the elements in the arrays of joint data structures, corresponding to images in different sequences, into candidate joints having coordinates in the real space. For each set of synchronized images, the combination of candidate joints identified throughout the real space can, for the purposes of analogy, be considered as a galaxy of candidate joints. For each subsequent point in time, the movements of the candidate joints are recorded, so that the galaxy changes over time. The output of the subject tracking engine 110 is stored in the subject database 140.

The subject tracking engine 110 uses logic to identify groups or sets of candidate joints having coordinates in real space as subjects in the real space. For the purposes of analogy, each set of candidate joints is like a constellation of candidate joints at each point in time. The constellations of candidate joints can move over time.

In an example embodiment, the logic to identify sets of candidate joints comprises heuristic functions based on physical relationships between joints of subjects in real space. These heuristic functions are used to identify sets of candidate joints as subjects. A set of candidate joints comprises individual candidate joints that have relationships, according to heuristic parameters, with other individual candidate joints and subsets of candidate joints in a given set that have been identified, or can be identified, as individual subjects.
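As an illustration, this kind of heuristic grouping can be sketched in simplified form as follows. This is a minimal sketch, not the patented implementation: the single `max_span` distance parameter and the greedy assignment stand in for the actual heuristic parameters and functions.

```python
from dataclasses import dataclass
import math

@dataclass
class CandidateJoint:
    joint_type: int   # joint class of this candidate joint
    x: float          # real-space coordinates, in meters
    y: float
    z: float

def within_reach(a, b, max_span=2.0):
    """Heuristic parameter: two joints can belong to the same subject
    only if they lie within a plausible body span (here 2.0 meters)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z)) <= max_span

def group_candidate_joints(joints, max_span=2.0):
    """Greedily assign each candidate joint to the first group whose
    members are all within reach; otherwise start a new group, which
    represents a newly identified subject."""
    groups = []
    for j in joints:
        for g in groups:
            if all(within_reach(j, m, max_span) for m in g):
                g.append(j)
                break
        else:
            groups.append([j])
    return groups
```

Each resulting group plays the role of one set of candidate joints identified as an individual subject.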

In the example of the shopping store, the system processes payment for the items bought by the customer when the customer completes shopping and moves out of the store. In a cashier-less store, the system must link the customer with a "user account" that includes a preferred payment method provided by the customer.

As described above, the "identified subjects" are anonymous, because the information about the joints and the relationships between the joints is not stored as biometric identifying information linked to an individual or a user account.

The system includes a matching engine 170 (hosted on the network node 103) to process signals received from the mobile computing devices 120 (carried by the subjects) to match the identified subjects with user accounts. The matching can be performed by identifying the locations of mobile devices executing client applications in the area of real space (e.g., the shopping store) and matching the locations of the mobile devices with the locations of the subjects, without using personally identifying biometric information from the images.

The actual communication path through the network 181 to the network node 103 hosting the matching engine 170 can be point-to-point over public and/or private networks. The communications can occur over a variety of networks 181, e.g., private networks, VPNs, MPLS circuits, or the Internet, and can use appropriate application programming interfaces (APIs) and data interchange formats, e.g., Representational State Transfer (REST), JavaScript™ Object Notation (JSON), Extensible Markup Language (XML), Simple Object Access Protocol (SOAP), Java™ Message Service (JMS), and/or Java Platform Module System. All of the communications can be encrypted. The communication is generally over a network such as a LAN (local area network), a WAN (wide area network), a telephone network (Public Switched Telephone Network (PSTN)), Session Initiation Protocol (SIP), a wireless network, a point-to-point network, a star network, a token ring network, a hub network, or the Internet, including the mobile Internet, via protocols such as EDGE, 3G, 4G LTE, Wi-Fi, and WiMAX. Additionally, a variety of authorization and authentication techniques, such as username/password, Open Authorization (OAuth), Kerberos, SecureID, digital certificates, and more, can be used to secure the communications.

The technology disclosed herein can be implemented in the context of any computer-implemented system, including a database system, a multi-tenant environment, or a relational database implementation (such as an Oracle™ compatible database implementation, an IBM DB2 Enterprise Server™ compatible relational database implementation, a MySQL™ or PostgreSQL™ compatible relational database implementation, or a Microsoft SQL Server™ compatible relational database implementation), or a NoSQL™ non-relational database implementation (such as a Vampire™ compatible non-relational database implementation, an Apache Cassandra™ compatible non-relational database implementation, a BigTable™ compatible non-relational database implementation, or an HBase™ or DynamoDB™ compatible non-relational database implementation). Moreover, the technology disclosed can be implemented using different programming models, such as MapReduce™, bulk synchronous programming, MPI primitives, etc., or different scalable batch and stream management systems such as Apache Storm™, Apache Spark™, Apache Kafka™, Apache Flink™, Truviso™, Amazon Elasticsearch Service™, Amazon Web Services™ (AWS), IBM Info-Sphere™, Borealis™, and Yahoo! S4™.

Camera Layout

The cameras 114 are arranged to track multi-joint subjects (or entities) in a three-dimensional (abbreviated as 3D) real space. In the example embodiment of the shopping store, the real space can include the areas of the shopping store where items for sale are stacked in shelves. A point in the real space can be represented by an (x, y, z) coordinate system. Each point in the area of real space for which the system is deployed is covered by the fields of view of two or more cameras 114.

In a shopping store, the shelves and other inventory display structures can be arranged in a variety of manners, such as along the walls of the shopping store, or in rows forming aisles, or a combination of the two arrangements. Figure 2 shows an arrangement of shelves, forming an aisle 116a, viewed from one end of the aisle 116a. Two cameras, camera A 206 and camera B 208, are positioned over the aisle 116a at predetermined distances from a roof 230 and a floor 220 of the shopping store, above the inventory display structures such as the shelves. The cameras 114 comprise cameras disposed over and having fields of view encompassing respective parts of the inventory display structures and the floor area in the real space. The coordinates in real space of members of a set of candidate joints, identified as a subject, identify the location of the subject in the floor area. In Figure 2, a subject 240 is holding the mobile computing device 118a and standing on the floor 220 in the aisle 116a. The mobile computing device can send and receive signals through the wireless network(s) 181. In one embodiment, the mobile computing devices 120 communicate through wireless access points (WAPs) 250 and 252 using a wireless network employing, for example, a Wi-Fi protocol or other wireless protocols such as Bluetooth, ultra-wideband, and ZigBee.

In the example embodiment of the shopping store, the real space can include the entire floor 220 in the shopping store from which inventory can be accessed. The cameras 114 are placed and oriented such that the areas of the floor 220 and the shelves can be seen by at least two cameras. The cameras 114 also cover the shelves 202 and 204 and at least part of the floor space in front of the shelves 202 and 204. The camera angles are selected to have both steep perspectives, straight down, and angled perspectives that give more full body images of the customers. In one example embodiment, the cameras 114 are configured at an eight (8) foot height or higher throughout the shopping store.

In Figure 2, the cameras 206 and 208 have overlapping fields of view, covering the space between the shelf A 202 and the shelf B 204 with overlapping fields of view 216 and 218, respectively. A location in the real space is represented as an (x, y, z) point of the real space coordinate system. "x" and "y" represent positions on a two-dimensional (2D) plane, which can be the floor 220 of the shopping store. The value "z" is the height of the point above the 2D plane at the floor 220 in one configuration.

Figure 3 illustrates the aisle 116a viewed from the top of Figure 2, further showing an example arrangement of the positions of the cameras 206 and 208 over the aisle 116a. The cameras 206 and 208 are positioned closer to opposite ends of the aisle 116a. The camera A 206 is positioned at a predetermined distance from the shelf A 202, and the camera B 208 is positioned at a predetermined distance from the shelf B 204. In another embodiment, in which more than two cameras are positioned over an aisle, the cameras are positioned at equal distances from each other. In such an embodiment, two cameras are positioned close to the opposite ends and a third camera is positioned in the middle of the aisle. It is to be understood that a number of different camera arrangements are possible.

Joint Data Structure

The image recognition engines 112a-112n receive the sequences of images from the cameras 114 and process the images to generate corresponding arrays of joint data structures. In one embodiment, the image recognition engines 112a-112n identify one of the 19 possible joints of each subject at each element of the image. The possible joints can be grouped into two categories: foot joints and non-foot joints. The 19th type of joint classification is for all non-joint features of the subject (i.e., elements of the image not classified as a joint).

Foot joints:
  Ankle joint (left and right)
Non-foot joints:
  Neck
  Nose
  Eyes (left and right)
  Ears (left and right)
  Shoulders (left and right)
  Elbows (left and right)
  Wrists (left and right)
  Hips (left and right)
  Knees (left and right)
Not a joint

An array of joint data structures for a particular image classifies elements of the particular image by joint type, time of the particular image, and the coordinates of the elements in the particular image. In one embodiment, the image recognition engines 112a-112n are convolutional neural networks (CNNs), the joint type is one of the 19 types of joints of the subjects, the time of the particular image is the timestamp of the image generated by the source camera 114 for the particular image, and the coordinates (x, y) identify the position of the element on a 2D image plane.
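By way of illustration, one element of such an array can be sketched as follows. The concrete field names are illustrative assumptions; the description above specifies the content (joint type, time, 2D coordinates, confidence, unique id), not a particular layout.

```python
from dataclasses import dataclass

# Joint types 1..18 are body joints (1 = left ankle, 2 = right ankle, ...);
# type 19 marks image elements that are not joints.
LEFT_ANKLE, RIGHT_ANKLE, NOT_A_JOINT = 1, 2, 19

@dataclass
class Joint:
    """One element of the per-image array of joint data structures."""
    joint_type: int    # one of the 19 joint classes
    timestamp: float   # time of the particular image (camera timestamp)
    x: int             # column of the element in the 2D image plane
    y: int             # row of the element in the 2D image plane
    confidence: float  # CNN confidence for the predicted class
    integer_id: int    # unique identifier of this joint data structure
```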

The output of the CNN is a matrix of confidence arrays per camera for each image. The matrix of confidence arrays is transformed into an array of joint data structures. A joint data structure 400 as shown in Figure 4 is used to store the information of each joint. The joint data structure 400 identifies the x and y positions of the element in the particular image in the 2D image space of the camera from which the image is received. A joint number identifies the type of joint identified. For example, in one embodiment, the values range from 1 to 19. A value of 1 indicates that the joint is a left ankle, a value of 2 indicates that the joint is a right ankle, and so on. The type of joint is selected using the confidence array for that element in the output matrix of the CNN. For example, in one embodiment, if the value corresponding to the left ankle is the highest in the confidence array for that image element, then the value of the joint number is "1".
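The selection of a joint number from a confidence array can be sketched as follows. This is a minimal illustration; the 19-element array and the scores are invented for the example, and the actual CNN output format is implementation-specific.

```python
def classify_element(confidence_array):
    """Pick the joint class with the highest confidence for one image
    element. confidence_array[i] holds the confidence for class i + 1,
    where classes 1..18 are joint types and class 19 means 'not a joint'."""
    best_index = max(range(len(confidence_array)),
                     key=lambda i: confidence_array[i])
    joint_number = best_index + 1
    return joint_number, confidence_array[best_index]

# If the left-ankle entry (class 1) is highest, the joint number is 1.
scores = [0.90, 0.03, 0.01] + [0.0] * 16   # 19 classes in total
```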

A confidence number indicates the degree of confidence of the CNN in predicting that joint. If the value of the confidence number is high, it means the CNN is confident in its prediction. An integer-id is assigned to the joint data structure to uniquely identify it. Following the above mapping, the output matrix of confidence arrays per image is converted into an array of joint data structures for each image. In one embodiment, the joints analysis includes performing a combination of k-nearest neighbors, mixture of Gaussians, and various image morphology transformations on each input image. The result comprises arrays of joint data structures, which can be stored in the form of a bit mask in a ring buffer that maps image numbers to bit masks at each moment in time.

Subject Tracking Engine

The tracking engine 110 is configured to receive arrays of joint data structures generated by the image recognition engines 112a-112n corresponding to images in the sequences of images from cameras having overlapping fields of view. The arrays of joint data structures per image are sent by the image recognition engines 112a-112n to the tracking engine 110 via the network(s) 181. The tracking engine 110 translates the coordinates of the elements in the arrays of joint data structures, corresponding to images in different sequences, into candidate joints having coordinates in the real space. The tracking engine 110 comprises logic to identify sets of candidate joints having coordinates in real space (constellations of joints) as subjects in the real space. In one embodiment, the tracking engine 110 accumulates the arrays of joint data structures from the image recognition engines for all the cameras at a given moment in time and stores this information as a dictionary in the subject database 140, to be used for identifying constellations of candidate joints. The dictionary can be arranged in the form of key-value pairs, where the keys are camera ids and the values are arrays of joint data structures from the cameras. In such an embodiment, this dictionary is used in a heuristics-based analysis to determine candidate joints and for the assignment of joints to subjects. In such an embodiment, the high-level input, processing, and output of the tracking engine 110 are illustrated in Table 1. Details of the logic applied by the subject tracking engine 110 to create subjects by combining candidate joints and to track the movement of subjects in the area of real space are presented in U.S. Patent No. 10,055,853, issued August 21, 2018, entitled "Subject Identification and Tracking Using Image Recognition Engine," which is incorporated herein by reference.

Table 1: Inputs, processing, and outputs of the subject tracking engine 110 in an example embodiment. [Table 1 appears as an image in the original publication and is not reproduced here.]

Subject Data Structure

The subject tracking engine 110 uses heuristics to connect the joints of the subjects identified by the image recognition engines 112a-112n. In doing so, the subject tracking engine 110 creates new subjects and updates the locations of existing subjects by updating their respective joint locations. The subject tracking engine 110 uses triangulation techniques to project the locations of the joints from the 2D image space coordinates (x, y) to the 3D real space coordinates (x, y, z). Figure 5 shows the subject data structure 500 used to store the subjects. The subject data structure 500 stores the subject-related data as a key-value dictionary. The key is a frame_number and the value is another key-value dictionary, where the key is a camera_id and the value is a list of 18 joints (of the subject) with their locations in the real space. The subject data is stored in the subject database 140. Every new subject is also assigned a unique identifier, which is used to access the subject's data in the subject database 140.
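By way of illustration, the nested key-value organization described for the subject data structure 500 can be sketched as follows. The frame number, camera id, and joint locations are invented for the example; only the shape of the mapping (frame_number, then camera_id, then 18 joints with real-space locations) follows the description above.

```python
# A minimal sketch of the subject data structure of Figure 5,
# with illustrative values throughout.
subject_data = {
    "subject_id": 12,       # unique identifier assigned to this subject
    "frames": {
        147: {              # key: frame_number
            "cam_01": [     # key: camera_id; value: the subject's joints
                            # with (x, y, z) locations in real space
                {"joint": 1, "location": (3.2, 4.1, 0.1)},  # left ankle
                {"joint": 2, "location": (3.5, 4.1, 0.1)},  # right ankle
                # ... the remaining joints of the subject
            ],
        },
    },
}
```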

In one embodiment, the system identifies the joints of a subject and creates a skeleton of the subject. The skeleton is projected into the real space, indicating the position and orientation of the subject in the real space. This is referred to as "pose estimation" in the field of machine vision. In one embodiment, the system displays the orientations and positions of subjects in the real space on a graphical user interface (GUI). In one embodiment, the image analysis is anonymous, i.e., the unique identifier assigned through the joints analysis does not identify the personal identity of the subject, as described above.

Matching Engine

The matching engine 170 includes logic to match the identified subjects with their respective user accounts by identifying the locations of mobile devices (carried by the identified subjects) executing client applications in the area of real space. In one embodiment, the matching engine uses multiple techniques, independently or in combination, to match the identified subjects with the user accounts. The system can be implemented without maintaining biometric identifying information about users, so that biometric information about account holders is not exposed to the security and privacy concerns raised by the distribution of such information.

In one embodiment, a customer logs in to the system upon entering the shopping store using a client application executing on a personal mobile computing device, identifying an authentic user account to be associated with the client application on the mobile device. The system then sends a "semaphore" image, selected from the set of unassigned semaphore images in the image database 160, to the client application executing on the mobile device. The semaphore image is unique to the client application in the shopping store, in that the same image is not released for use with another client application in the store until the system has matched the user account to an identified subject. After the matching, the semaphore image becomes available for use again. The client application causes the mobile device to display the semaphore image, and the display of the semaphore image is the signal emitted by the mobile device to be detected by the system. The matching engine 170 uses the image recognition engines 112a-n, or a separate image recognition engine (not shown in Figure 1), to recognize the semaphore image and determine the location of the mobile computing device displaying the semaphore in the shopping store. The matching engine 170 matches the location of the mobile computing device with the location of an identified subject. The matching engine 170 then links the identified subject (stored in the subject database 140) to the user account (stored in the user account database 150) linked to the client application, for the duration of the subject's presence in the shopping store. No biometric identifying information is used for matching the identified subject with the user account, and none is stored in support of this process. That is, no information in the sequences of images is compared with stored biometric information for the purpose of matching the identified subjects with user accounts in support of this process.

In other embodiments, the matching engine 170 uses other signals from the mobile computing devices 120, in the alternative or in combination, to link the identified subjects to user accounts. Examples of such signals include a service location signal identifying the position of the mobile computing device in the area of real space, and the speed and orientation of the mobile computing device obtained from an accelerometer and compass of the mobile computing device, etc.

In some embodiments, though embodiments are provided that do not maintain any biometric information about account holders, the system can use biometric information to assist in matching not-yet-linked identified subjects to user accounts. For example, in one embodiment, the system stores the "hair color" of a customer in his or her user account record. During the matching process, the system might use, for example, the hair color of a subject as an additional input to disambiguate and match the subject to a user account. If the user has red-colored hair, and only one subject with red-colored hair is in the area of real space, or in close proximity to the mobile computing device, then the system might select the subject having red hair color to match the user account.
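A minimal sketch of this kind of disambiguation follows, assuming hypothetical subject and account records with a stored "hair_color" field; it selects a subject only when the trait resolves the ambiguity to exactly one candidate.

```python
def disambiguate_by_hair_color(candidate_subjects, account_record):
    """Use a stored trait (here, hair color) as an additional input only
    when it selects exactly one of the candidate subjects; otherwise
    report that the trait does not resolve the ambiguity."""
    matches = [s for s in candidate_subjects
               if s["hair_color"] == account_record["hair_color"]]
    return matches[0] if len(matches) == 1 else None

# Illustrative records (not part of the patented system's schema).
subjects = [
    {"subject_id": 7, "hair_color": "red"},
    {"subject_id": 9, "hair_color": "brown"},
]
account = {"user_account": "u-123", "hair_color": "red"}
```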

The flowcharts in Figures 6 to 9C present the process steps of four techniques usable by the matching engine 170, independently or in combination.

Semaphore Images

Figure 6 is a flowchart 600 presenting the process steps of a first technique for matching subjects identified in the area of real space with their respective user accounts. In the example of a shopping store, the subjects are customers (or shoppers) moving in the store, in the aisles between shelves and in other open spaces. The process starts at step 602. As a subject enters the area of real space, the subject opens a client application on a mobile computing device and attempts to log in. The system verifies the user credentials at step 604 (e.g., by querying the user account database 150) and accepts a login communication from the client application to associate the authenticated user account with the mobile computing device. The system determines that the user account of the client application is not yet linked to an identified subject. At step 606, the system sends a semaphore image to the client application for display on the mobile computing device. Examples of semaphore images include solid colors in various shapes, such as a red rectangle, a pink elephant, etc. A variety of images can be used as semaphores, preferably ones suited for high-confidence recognition by the image recognition engine. Each semaphore image can have a unique identifier. The processing system includes logic to accept login communications from a client application on a mobile device identifying a user account before matching the user account to an identified subject in the area of real space, and to send a selected semaphore image from the set of semaphore images to the client application on the mobile device after accepting the login communication.

In one embodiment, the system selects an available semaphore image from the image database 160 for sending to the client application. After sending the semaphore image to the client application, the system changes the status of that semaphore image in the image database 160 to "assigned," so that the image is not assigned to any other client application. The status of the image remains "assigned" until the process to match the identified subject to the mobile computing device is complete. After the matching is completed, the status can be changed to "available." This allows for the rotating use of a small set of semaphores in a given system, simplifying the image recognition problem.
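The "available"/"assigned" rotation described above can be sketched as follows, assuming an in-memory stand-in for the image database 160 and illustrative image identifiers.

```python
class SemaphoreImages:
    """Minimal sketch of rotating a small set of semaphore images."""

    def __init__(self, image_ids):
        # Every semaphore image starts out available for assignment.
        self.status = {image_id: "available" for image_id in image_ids}

    def assign(self):
        """Pick any available image and mark it 'assigned' so it cannot
        be given to another client application concurrently."""
        for image_id, state in self.status.items():
            if state == "available":
                self.status[image_id] = "assigned"
                return image_id
        return None  # no semaphore image free right now

    def release(self, image_id):
        """After the subject-to-account match completes, the image
        becomes available for reuse."""
        self.status[image_id] = "available"
```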

The client application receives the semaphore image and displays it on the mobile computing device. In one embodiment, the client application also increases the brightness of the display to increase the visibility of the image. The image is captured by one or more cameras 114 and sent to an image processing engine, referred to as a WhatCNN. At step 608, the system uses the WhatCNN to recognize the semaphore image displayed on the mobile computing device. In one embodiment, the WhatCNN is a convolutional neural network trained to process bounding boxes specified in the images to generate a classification of the hands of the identified subjects. One trained WhatCNN processes the image frames from one camera. In the example embodiment of the shopping store, for each hand joint in each image frame, the WhatCNN identifies whether the hand joint is empty. The WhatCNN also identifies a semaphore image identifier (from the image database 160) or an SKU (stock keeping unit) number of an inventory item in the hand joint, a confidence value indicating whether the item in the hand joint is a non-SKU item (i.e., it does not belong to the shopping store inventory), and a context of the hand joint location in the image frame.

As mentioned above, two or more cameras with overlapping fields of view capture images of the subjects in the real space. The joints of a single subject can appear in the image frames of multiple cameras in respective image channels. A per-camera WhatCNN model identifies semaphore images (displayed on the mobile computing devices) in the hands (represented by hand joints) of the subjects. A coordination logic combines the outputs of the WhatCNN models into a consolidated data structure listing the identifiers of semaphore images in the left hand (referred to as left_hand_classid) and right hand (right_hand_classid) of the identified subjects (step 610). The system stores this information in a dictionary mapping subject_id to left_hand_classid and right_hand_classid, along with timestamps, including the locations of the joints in the real space. Details of the WhatCNN are presented in U.S. Patent Application No. 15/907,112, filed February 27, 2018, entitled "Item Put and Take Detection Using Image Recognition," which is incorporated herein by reference as if fully set forth herein.
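The consolidated per-subject hand classification, and the check against it that the next step performs, can be sketched as follows. The subject_id, class identifiers, and timestamp are illustrative assumptions; only the shape of the mapping follows the description above.

```python
# Consolidated output of the coordination logic, keyed by subject_id.
hand_classes = {
    23: {
        "timestamp": 1564483200.5,
        "left_hand_classid": 0,     # nothing recognized in the left hand
        "right_hand_classid": 118,  # semaphore image identifier
    },
}

def match_semaphore(hand_classes, semaphore_id, user_account):
    """Return a subject_id-to-user_account mapping when some identified
    subject is holding the semaphore image in either hand; return None
    when the semaphore has not been recognized yet."""
    for subject_id, hands in hand_classes.items():
        if semaphore_id in (hands["left_hand_classid"],
                            hands["right_hand_classid"]):
            return {subject_id: user_account}
    return None  # remind the client application to display the image
```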

At step 612, the system checks whether the semaphore image sent to the client application has been recognized by WhatCNN, by iterating over the outputs of the WhatCNN models for the hands of all identified subjects. If the semaphore image is not recognized, at step 614 the system sends a reminder to the client application to display the semaphore image on the mobile computing device, and repeats process steps 608 to 612. Otherwise, if the semaphore image is recognized by WhatCNN, the system matches the user_account (from the user account database 150) associated with the client application to the subject_id (from the subject database 140) of the identified subject holding the mobile computing device (step 616). In one embodiment, the system maintains this mapping (subject_id-user_account) while the subject is present in the area of real space. The process ends at step 618.

Service Location

Flow chart 700 in FIG. 7 presents process steps for a second technique for matching identified subjects with user accounts. This technique uses radio frequency signals emitted by the mobile devices, which indicate the locations of the mobile devices. The process starts at step 702. As described above in step 604, the system accepts login communications from the client application on a mobile computing device to link an authenticated user account to the mobile computing device. At step 706, the system receives service location information at regular intervals from the mobile devices in the area of real space. In one embodiment, the system uses the latitude and longitude coordinates of the mobile computing device, emitted by a global positioning system (GPS) receiver of the mobile computing device, to determine its location. In one embodiment, the service location of the mobile computing device obtained from the GPS coordinates is accurate to between 1 and 3 meters. In another embodiment, the service location of the mobile computing device obtained from the GPS coordinates is accurate to between 1 and 5 meters.

Other techniques can be used, independently or in combination with the techniques above, to determine the service location of the mobile computing device. Examples of such techniques include using the signal strengths received from different wireless access points (WAPs), such as 250 and 252 shown in FIGS. 2 and 3, as indications of how far the mobile computing device is from the respective access points. The system then uses the known locations of the wireless access points (WAPs) 250 and 252 to triangulate and determine the position of the mobile computing device in the area of real space. Other types of signals emitted by the mobile computing devices, such as Bluetooth, ultra-wideband, and ZigBee, can also be used to determine the service location of the mobile computing device.
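One way to realize the WAP-based positioning described above is to convert each received signal strength into a range estimate and then solve for the position from three known access-point locations. The sketch below assumes a log-distance path-loss model and an exact three-circle solve; the constants, function names, and model choice are illustrative assumptions, not details from the patent:

```python
import math

def rssi_to_distance(rssi_dbm, rssi_at_1m=-40.0, path_loss_exp=2.0):
    """Estimate distance (meters) from a received signal strength using
    the log-distance path-loss model. The reference power at 1 m and the
    path-loss exponent would be calibrated for a real deployment."""
    return 10 ** ((rssi_at_1m - rssi_dbm) / (10.0 * path_loss_exp))

def trilaterate(p1, r1, p2, r2, p3, r3):
    """Solve for (x, y) given three access-point positions and ranges by
    subtracting the circle equations pairwise, which yields a 2x2 linear
    system in x and y."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x2), 2 * (y3 - y2)
    c2 = r2**2 - r3**2 + x3**2 - x2**2 + y3**2 - y2**2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det
```

Real RSSI readings are noisy, so a deployment would typically feed ranges from more than three access points into a least-squares fit rather than an exact solve.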

At step 708, the system monitors, at regular intervals (such as every second), the service locations of mobile devices with client applications that are not yet linked to an identified subject. At step 708, the system determines the distance of a mobile computing device with an unmatched user account from all other mobile computing devices with unmatched user accounts. The system compares this distance with a predetermined threshold distance "d", such as 3 meters. If the mobile computing device is at least the distance "d" away from all other mobile devices with unmatched user accounts (step 710), the system determines the nearest not-yet-linked subject to the mobile computing device (step 714). At step 712, the location of the identified subject is obtained from the output of the JointsCNN. In one embodiment, the location of the subject obtained from the JointsCNN is more accurate than the service location of the mobile computing device. The system then performs the same process as described above at step 616 of flow chart 600 to match the subject_id of the identified subject with the user_account of the client application. The process ends at step 718.
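The isolation test at steps 708-710 can be sketched as a pairwise distance check over the unmatched devices. The function name and the dictionary layout are illustrative assumptions:

```python
import math

THRESHOLD_D = 3.0  # meters; the example threshold "d" from the text

def isolated_devices(service_locations, d=THRESHOLD_D):
    """Return ids of devices whose service location is at least `d` meters
    from every other unmatched device. `service_locations` maps a device
    id to its (x, y) position in the area of real space."""
    isolated = []
    for dev, (x, y) in service_locations.items():
        if all(math.hypot(x - ox, y - oy) >= d
               for other, (ox, oy) in service_locations.items()
               if other != dev):
            isolated.append(dev)
    return isolated
```

Only a device that passes this test is paired with its nearest not-yet-linked subject; devices clustered within "d" of each other stay unmatched until a later interval.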

No biometric information is used to match identified subjects with user accounts, and none is stored in support of this process. That is, no information in the sequences of images is compared with stored biometric information for the purpose of matching an identified subject with a user account in support of this process. Thus, the logic to match the identified subject with the user account operates without use of personal identifying biometric information associated with the user account.

Speed and Orientation

Flow chart 800 in FIG. 8 presents process steps for a third technique for matching identified subjects with user accounts. This technique uses signals emitted by an accelerometer of the mobile computing device to match an identified subject with a client application. The process starts at step 802. Login communications from the client application are accepted as described above in step 604 of the first and second techniques. At step 806, the system receives signals emitted from the mobile computing devices carrying data from the accelerometers on the mobile computing devices in the area of real space, which can be sent at regular intervals. At step 808, the system calculates an average velocity of every mobile computing device with an unmatched user account.

An accelerometer provides the acceleration of the mobile computing device along three axes (x, y, z). In one embodiment, the velocity is calculated by taking the acceleration values over small time intervals (e.g., every 10 milliseconds) to calculate the current velocity at time "t", i.e., v_t = v_0 + a*t, where v_0 is the initial velocity. In one embodiment, v_0 is initialized as "0", and subsequently, at each time t+1, v_t becomes v_0. The velocities along the three axes are then combined to determine an overall velocity of the mobile computing device at time "t". Finally, at step 808, the system calculates moving averages of the velocities of all the computing devices over a larger time period, such as 3 seconds, which is long enough for the walking gait of an average person, or over longer periods of time.
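The integration described above can be sketched as follows. The function names are illustrative, and the sketch omits the sensor noise and gravity compensation a real implementation would need:

```python
import math

DT = 0.01  # 10 ms sampling interval, per the example in the text

def integrate_velocity(samples, dt=DT):
    """Integrate (ax, ay, az) accelerometer samples into a per-sample
    speed. Velocity along each axis accumulates as v_t = v_0 + a*dt,
    with v_0 replaced by the previous step's result; the overall speed
    is the magnitude of the three-axis velocity vector."""
    vx = vy = vz = 0.0
    speeds = []
    for ax, ay, az in samples:
        vx += ax * dt
        vy += ay * dt
        vz += az * dt
        speeds.append(math.sqrt(vx * vx + vy * vy + vz * vz))
    return speeds

def average_speed(speeds, window):
    """Average the most recent `window` speed samples, e.g. 300 samples
    for the 3-second period mentioned in the text."""
    recent = speeds[-window:]
    return sum(recent) / len(recent)
```

With a constant 1 m/s^2 acceleration along x, the speed rises linearly and the windowed average smooths out the per-sample fluctuations a walking gait produces.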

At step 810, the system calculates Euclidean distances (also referred to as the L2 norm) between the velocities of all pairs of mobile computing devices with unmatched user accounts and not-yet-linked identified subjects. The velocities of the subjects are derived from the changes in the positions of their joints with respect to time, obtained from the joints analysis and stored with timestamps in the respective subject data structures 500. In one embodiment, a location of the center of mass of each subject is determined using the joints analysis. The velocity, or other derivative, of the center of mass location data of a subject is used for comparison with the velocities of the mobile computing devices. For each subject_id-user_account pair, if the value of the Euclidean distance between their respective velocities is less than a threshold_0, a score_counter for the subject_id-user_account pair is incremented. The above process is performed at regular time intervals, thus updating the score_counter for each subject_id-user_account pair.
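The score_counter update can be sketched as below. The threshold value and the dictionary layout are illustrative assumptions; the text states only that the thresholds are tuned on labeled training data:

```python
import math

THRESHOLD_0 = 0.2  # m/s; illustrative value, tuned on labeled data

def update_scores(subject_vels, device_vels, scores, threshold=THRESHOLD_0):
    """Increment score_counter for every subject_id-user_account pair
    whose velocity vectors lie within `threshold` of each other (L2 norm).
    `scores` maps (subject_id, user_account) tuples to integer counters
    and is updated in place at each time interval."""
    for sid, sv in subject_vels.items():
        for acct, dv in device_vels.items():
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(sv, dv)))
            if dist < threshold:
                key = (sid, acct)
                scores[key] = scores.get(key, 0) + 1
    return scores
```

Calling this once per interval accumulates evidence: a subject walking in step with a particular phone keeps incrementing only that pair's counter.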

At regular time intervals (e.g., every second), the system compares the score_counter values of pairs of unmatched user accounts with not-yet-linked identified subjects (step 812). If the highest score is greater than a threshold_1 (step 814), at step 816 the system calculates the difference between the highest score and the second highest score (for pairs of the same user account with different subjects). If the difference is greater than a threshold_2, at step 818 the system selects the mapping of the user_account to the identified subject, and follows the same process as described above at step 616. The process ends at step 820.
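The two-threshold decision at steps 814-818 can be sketched as follows. The threshold values are illustrative assumptions; the text says they are chosen from training data:

```python
THRESHOLD_1 = 10  # minimum winning score; illustrative value
THRESHOLD_2 = 5   # minimum margin over the runner-up; illustrative value

def select_match(scores, account, t1=THRESHOLD_1, t2=THRESHOLD_2):
    """For one user_account, pick the subject with the highest
    score_counter, but only when that score exceeds t1 and beats the
    second-best subject by more than t2. Returns the subject_id, or
    None when no confident match exists yet."""
    candidates = sorted(
        ((sid, n) for (sid, acct), n in scores.items() if acct == account),
        key=lambda pair: pair[1], reverse=True)
    if not candidates or candidates[0][1] <= t1:
        return None
    runner_up = candidates[1][1] if len(candidates) > 1 else 0
    if candidates[0][1] - runner_up <= t2:
        return None
    return candidates[0][0]
```

The margin test (threshold_2) is what prevents a premature link when two subjects move similarly; the account simply stays unmatched until their scores diverge.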

In another embodiment, when the JointsCNN recognizes a hand holding a mobile computing device, the velocity of the hand (of the identified subject) holding the mobile computing device is used in the above process instead of the velocity of the center of mass of the subject. This improves the performance of the matching algorithm. To determine the values of the thresholds (threshold_0, threshold_1, threshold_2), the system uses training data with labels assigned to the images. During training, various combinations of the threshold values are used, and the output of the algorithm is matched against the ground truth labels of the images. The combination of threshold values resulting in the best overall assignment accuracy is used in production (or inference).

No biometric information is used to match identified subjects with user accounts, and none is stored in support of this process. That is, no information in the sequences of images is compared with stored biometric information for the purpose of matching an identified subject with a user account in support of this process. Thus, the logic to match the identified subject with the user account operates without use of personal identifying biometric information associated with the user account.

Network Ensemble

A network ensemble is a learning paradigm in which many networks are jointly used to solve a problem. Ensembles typically improve the prediction accuracy obtained from a single classifier by a factor that justifies the effort and cost of learning multiple models. In a fourth technique for matching user accounts with not-yet-linked identified subjects, the second and third techniques presented above are used jointly in an ensemble (or network ensemble). To use the two techniques in an ensemble, relevant features are extracted from the applications of the two techniques. FIGS. 9A-9C present process steps (in flow chart 900) for extracting the features, training the ensemble, and using the trained ensemble to predict matches of user accounts with not-yet-linked identified subjects.

FIG. 9A presents process steps for generating features using the second technique, which uses the service locations of mobile computing devices. The process starts at step 902. At step 904, a Count_X is calculated for the second technique, indicating the number of times the service location of a mobile computing device with an unmatched user account is X meters away from all other mobile computing devices with unmatched user accounts. At step 906, the Count_X values for all tuples of subject_id-user_account pairs are stored by the system for use by the ensemble. In one embodiment, multiple values of X are used, e.g., 1 m, 2 m, 3 m, 4 m, and 5 m (steps 908 and 910). For each value of X, the count is stored as a dictionary that maps subject_id-user_account tuples to count scores, which are integers. In the example where five values of X are used, five such dictionaries are created at step 912. The process ends at step 914.
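The Count_X feature extraction can be sketched as below. How a count is credited to a particular subject_id-user_account tuple is not spelled out in this passage; the sketch assumes the tuple pairs each sufficiently isolated device with its nearest subject, which is an interpretation, not a detail from the patent:

```python
import math

X_VALUES = (1.0, 2.0, 3.0, 4.0, 5.0)  # meters, per the example in the text

def update_count_x(device_locs, subject_locs, count_dicts, xs=X_VALUES):
    """For every separation threshold X, increment Count_X for each
    (subject_id, user_account) tuple whose device is at least X meters
    from every other unmatched device. One dictionary per X value, as at
    step 912. Crediting the nearest subject is an assumption."""
    for x, counts in zip(xs, count_dicts):
        for acct, (dx, dy) in device_locs.items():
            separated = all(
                math.hypot(dx - ox, dy - oy) >= x
                for other, (ox, oy) in device_locs.items() if other != acct)
            if not separated:
                continue
            # Credit the subject nearest to this isolated device.
            sid = min(subject_locs,
                      key=lambda s: math.hypot(dx - subject_locs[s][0],
                                               dy - subject_locs[s][1]))
            counts[(sid, acct)] = counts.get((sid, acct), 0) + 1
    return count_dicts
```

Run at regular intervals, this yields five dictionaries of integer counts per tuple, which become the second technique's contribution to the ensemble's feature vector.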

FIG. 9B presents process steps for generating features using the third technique, which uses the velocities of mobile computing devices. The process starts at step 920. At step 922, a Count_Y is determined for the third technique, equal to the score_counter value indicating the number of times the Euclidean distance between a particular subject_id-user_account pair was less than threshold_0. At step 924, the Count_Y values for all tuples of subject_id-user_account pairs are stored by the system for use by the ensemble. In one embodiment, multiple values of threshold_0 are used, e.g., five different values (steps 926 and 928). For each value of threshold_0, the Count_Y is stored as a dictionary that maps subject_id-user_account tuples to count scores, which are integers. In the example where five threshold values are used, five such dictionaries are created at step 930. The process ends at step 932.

The features from the second and third techniques are then used to create a labeled training dataset, which is used to train the network ensemble. To collect such a dataset, multiple subjects (shoppers) walk in an area of real space, such as a shopping store. Images of these subjects are collected at regular time intervals using the cameras 114. Human labelers review the images and assign correct identifiers (subject_id and user_account) to the images in the training data. This process is described in flow chart 900, presented in FIG. 9C. The process starts at step 940. At step 942, the features, in the form of the Count_X and Count_Y dictionaries obtained from the second and third techniques, are compared with the corresponding ground truth labels assigned by the human labelers on the images to identify correct (true) and incorrect (false) matches of subject_id and user_account.

As there are only two categories of outcome for a subject_id and user_account pair (true or false), a binary classifier is trained using this training dataset (step 944). Commonly used methods for binary classification include decision trees, random forests, neural networks, gradient boosting, support vector machines, etc. The trained binary classifier is used to categorize new probabilistic observations as true or false. The trained binary classifier is used in production (or inference) by giving it the Count_X and Count_Y dictionaries for a subject_id-user_account tuple as input. At step 946, the trained binary classifier classifies each tuple as true or false. The process ends at step 948.
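As a toy stand-in for the heavier classifiers named above (random forests, gradient boosting, and so on), a one-feature decision stump shows the shape of the training and inference steps. The feature layout (one Count_X or Count_Y score per position in a tuple's feature vector) and all names are illustrative assumptions:

```python
def stump_train(samples):
    """Train a one-feature decision stump on (feature_vector, label)
    pairs, where each feature vector holds the Count_X/Count_Y scores
    for one subject_id-user_account tuple. Returns the (feature index,
    threshold) pair with the best training accuracy."""
    n_features = len(samples[0][0])
    best_f, best_thr, best_acc = 0, 0.0, -1.0
    for f in range(n_features):
        for thr in {v[f] for v, _ in samples}:
            acc = sum((v[f] >= thr) == label
                      for v, label in samples) / len(samples)
            if acc > best_acc:
                best_f, best_thr, best_acc = f, thr, acc
    return best_f, best_thr

def stump_predict(model, features):
    """Classify a tuple's feature vector as True (match) or False."""
    f, thr = model
    return features[f] >= thr
```

A production system would substitute any of the classifiers listed in the text; the input and output contract (feature vector in, true/false out) stays the same.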

If there are unmatched mobile computing devices in the area of real space after applying the above four techniques, the system sends a notification to the mobile computing device to open the client application. If the user accepts the notification, the client application displays a semaphore image as described in the first technique. The system then follows the steps in the first technique to check the shopper in (matching the subject_id to the user_account). If the customer does not respond to the notification, the system sends a notification to an employee in the shopping store, indicating the location of the unmatched customer. The employee can then walk to the customer and ask him to open the client application on his mobile computing device to check in to the system using a semaphore image.

No biometric information is used to match identified subjects with user accounts, and none is stored in support of this process. That is, no information in the sequences of images is compared with stored biometric information for the purpose of matching an identified subject with a user account in support of this process. Thus, the logic to match the identified subject with the user account operates without use of personal identifying biometric information associated with the user account.

Architecture

An example architecture of a system in which the four techniques presented above are applied to match a user_account to a not-yet-linked subject in an area of real space is presented in FIG. 10. Because FIG. 10 is an architectural diagram, certain details are omitted to improve the clarity of the description. The system presented in FIG. 10 receives image frames from a plurality of cameras 114. As described above, in one embodiment, the cameras 114 can be synchronized in time with each other, so that images are captured at the same time, or close in time, and at the same image capture rate. Images captured in all the cameras covering an area of real space at the same time, or close in time, are synchronized in the sense that the synchronized images can be identified in the processing engines as representing different views, at a moment in time, of subjects having fixed positions in the real space. The images are stored in a circular buffer of image frames per camera 1002.

A "subject identification" subsystem 1004 (also referred to as the first image processors) processes the image frames received from the cameras 114 to identify and track subjects in the real space. The first image processors include a subject image recognition engine, such as the JointsCNN described above.

A "semantic diffing" subsystem 1006 (also referred to as the second image processors) includes background image recognition engines that receive corresponding sequences of images from the plurality of cameras and semantically identify significant differences in the background (i.e., inventory display structures such as shelves), for example, as they relate to puts and takes of inventory items over time in the images from each camera. The second image processors receive the output of the subject identification subsystem 1004 and the image frames from the cameras 114 as input. Details of the "semantic diffing" subsystem are presented in US Patent Application No. 15/945,466, filed April 4, 2018, entitled "Predicting Inventory Events using Semantic Diffing," and US Patent Application No. 15/945,473, filed April 4, 2018, entitled "Predicting Inventory Events using Foreground/Background Processing," both of which are incorporated herein by reference as if fully set forth herein. The second image processors process the identified background changes to make a first set of detections of takes of inventory items by identified subjects and of puts of inventory items on inventory display structures by identified subjects. The first set of detections is also referred to as background detections of puts and takes of inventory items. In the example of a shopping store, the first set of detections identifies inventory items taken from the shelves, or put on the shelves, by customers or employees of the store. The semantic diffing subsystem includes logic to associate the identified background changes with identified subjects.

A "region proposals" subsystem 1008 (also referred to as the third image processors) includes foreground image recognition engines that receive corresponding sequences of images from the plurality of cameras 114 and semantically identify significant objects in the foreground (i.e., shoppers, their hands, and inventory items), for example, as they relate to puts and takes of inventory items over time in the images from each camera. The region proposals subsystem 1008 also receives the output of the subject identification subsystem 1004. The third image processors process sequences of images from the cameras 114 to identify and classify foreground changes represented in the images in the corresponding sequences of images. The third image processors process the identified foreground changes to make a second set of detections of takes of inventory items by identified subjects and of puts of inventory items on inventory display structures by identified subjects. The second set of detections is also referred to as foreground detections of puts and takes of inventory items. In the example of a shopping store, the second set of detections identifies takes of inventory items and puts of inventory items on inventory display structures by customers and employees of the store. Details of the region proposals subsystem are presented in US Patent Application No. 15/907,112, filed February 27, 2018, entitled "Item Put and Take Detection Using Image Recognition," which is incorporated herein by reference as if fully set forth herein.

The system described in FIG. 10 includes selection logic 1010 to process the first and second sets of detections to generate log data structures including lists of inventory items for identified subjects. For a take or put in the real space, the selection logic 1010 selects the output from either the semantic diffing subsystem 1006 or the region proposals subsystem 1008. In one embodiment, the selection logic 1010 uses a confidence score generated by the semantic diffing subsystem for the first set of detections and a confidence score generated by the region proposals subsystem for the second set of detections to make the selection. The output of the subsystem with the higher confidence score for a particular detection is selected and used to generate a log data structure 1012 (also referred to as a shopping cart data structure), which includes a list of inventory items (and their quantities) associated with identified subjects.
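The confidence-based selection can be sketched as follows. The field names ("sku", "quantity", "confidence") and the pairing of one background detection with one foreground detection per event are illustrative assumptions, not details from the patent:

```python
def select_detection(background_det, foreground_det):
    """Pick whichever subsystem's detection carries the higher
    confidence score for the same put-or-take event."""
    return max(background_det, foreground_det, key=lambda d: d["confidence"])

def build_log(subject_id, detection_pairs):
    """Build a per-subject log (shopping cart) data structure from
    paired background/foreground detections, summing quantities per SKU."""
    items = {}
    for bg, fg in detection_pairs:
        det = select_detection(bg, fg)
        items[det["sku"]] = items.get(det["sku"], 0) + det["quantity"]
    return {"subject_id": subject_id, "items": items}
```

Each event thus contributes exactly one entry, drawn from whichever of the two subsystems was more certain about it.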

To process payments for the items in the log data structure 1012, the system in FIG. 10 applies the four techniques for matching an identified subject (associated with the log data) to a user_account, which includes a payment method such as credit card or bank account information. In one embodiment, the four techniques are applied sequentially, as shown in the figure. If the process steps in flow chart 600 for the first technique produce a match between the subject and a user account, this information is used by a payment processor 1036 to charge the customer for the inventory items in the log data structure. Otherwise (step 1028), the process steps presented in flow chart 700 for the second technique are followed, and the user account is used by the payment processor 1036. If the second technique is unable to match the user account with a subject (1030), the process steps presented in flow chart 800 for the third technique are followed. If the third technique is unable to match the user account with a subject (1032), the process steps presented in flow chart 900 for the fourth technique are followed to match the user account with a subject.
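The sequential fallback described above reduces to a simple first-success chain. The callable-per-technique interface is an illustrative assumption:

```python
def match_user_account(subject_id, techniques):
    """Apply matching techniques in order (semaphore image, service
    location, velocity, network ensemble) and return the first
    user_account any of them produces, or None if all fail. Each
    technique is a callable taking a subject_id and returning a
    user_account or None."""
    for technique in techniques:
        account = technique(subject_id)
        if account is not None:
            return account
    return None
```

A None result corresponds to the final fallback in the text: notify the device to display a semaphore image, and escalate to a store employee if the customer does not respond.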

If the fourth technique is unable to match the user account with a subject (1034), the system sends a notification to the mobile computing device to open the client application, and follows the steps presented in flow chart 600 for the first technique. If the customer does not respond to the notification, the system sends a notification to an employee in the shopping store, indicating the location of the unmatched customer. The employee can then walk to the customer and ask him to open the client application on his mobile computing device to check in to the system using a semaphore image (step 1040). It is understood that in other embodiments of the architecture presented in FIG. 10, fewer than four techniques can be used to match a user account to a not-yet-linked identified subject.

Network Configuration

FIG. 11 presents an architecture of a network hosting the matching engine 170, which is hosted on the network node 103. In the illustrated embodiment, the system includes a plurality of network nodes 103, 101a-101n, and 102. In such an embodiment, the network nodes are also referred to as processing platforms. The processing platforms (network nodes) 103, 101a-101n, and 102, and the cameras 1112, 1114, 1116, ..., 1118 are connected to the network 1181.

FIG. 11 shows a plurality of cameras 1112, 1114, 1116, ..., 1118 connected to the network. A large number of cameras can be deployed in a particular system. In one embodiment, the cameras 1112 to 1118 are connected to the network 1181 using Ethernet-based connectors 1122, 1124, 1126, and 1128, respectively. In such an embodiment, the Ethernet-based connectors have a data transfer speed of 1 gigabit per second, also referred to as Gigabit Ethernet. It is understood that in other embodiments, the cameras 114 are connected to the network using other types of network connections, which can have faster or slower data transfer rates than Gigabit Ethernet. Also, in alternative embodiments, sets of cameras can be connected directly to each processing platform, and the processing platforms can be coupled to a network.

A storage subsystem 1130 stores the basic programming and data constructs that provide the functionality of certain embodiments of the present invention. For example, the various modules implementing the functionality of the matching engine 170 may be stored in the storage subsystem 1130. The storage subsystem 1130 is an example of computer-readable memory comprising a non-transitory data storage medium, having computer instructions stored in the memory executable by a computer to perform all or any combination of the data processing and image processing functions described herein. The instructions include logic to link subjects in an area of real space with user accounts, logic to determine the locations of identified subjects represented in the images, and logic to match the identified subjects with user accounts by identifying the locations of mobile computing devices executing client applications in the area of real space, by processes as described herein. In other embodiments, the computer instructions can be stored in other types of memory, including portable memory, comprising a non-transitory data storage medium or media, readable by a computer.

These software modules are generally executed by the processor subsystem 1150. The host memory subsystem 1132 typically includes a number of memories, including a main random access memory (RAM) 1134 for storage of instructions and data during program execution, and a read-only memory (ROM) 1136 in which fixed instructions are stored. In one embodiment, the RAM 1134 is used as a buffer for storing subject_id-user_account tuples matched by the matching engine 170.
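As a rough illustration, the subject_id-user_account tuples buffered in RAM might be managed by a structure like the following minimal Python sketch. The class and names are illustrative assumptions, not part of the patent:

```python
class MatchBuffer:
    """Holds (subject_id, user_account) pairs produced by the matching engine.

    Illustrative stand-in for the RAM buffer of subject_id-user_account tuples.
    """

    def __init__(self):
        self._matches = {}  # subject_id -> user_account identifier

    def add(self, subject_id, user_account):
        # Record that a tracked subject has been matched to a user account.
        self._matches[subject_id] = user_account

    def lookup(self, subject_id):
        # Return the linked user account, or None if the subject is unmatched.
        return self._matches.get(subject_id)


buffer = MatchBuffer()
buffer.add(240, "user_account_42")   # subject 240 matched to an account
print(buffer.lookup(240))            # -> user_account_42
print(buffer.lookup(999))            # -> None (not yet matched)
```

In a deployed system this buffer would be consulted when associating a shopper's log of inventory items with the account to be charged.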

The file storage subsystem 1140 provides persistent storage for program and data files. In an example embodiment, the storage subsystem 1140 includes four 120 gigabyte (GB) solid state disks (SSDs) in a RAID 0 (redundant array of independent disks) arrangement, identified by the numeral 1142. In the example embodiment, user account data in the user account database 150 and image data in the image database 160 which are not in RAM are stored in the RAID 0. In the example embodiment, the hard disk drive (HDD) 1146 is slower in access speed than the RAID 0 1142 storage. The solid state disk (SSD) 1144 contains the operating system and related files for the matching engine 170.

In the example configuration, three cameras 1112, 1114 and 1116 are connected to the processing platform (network node) 103. Each camera has a dedicated graphics processing unit, GPU 1 1162, GPU 2 1164 and GPU 3 1166, to process images sent by the camera. It is understood that fewer or more than three cameras can be connected per processing platform. Accordingly, fewer or more GPUs are configured in the network node, so that each camera has a dedicated GPU for processing the image frames received from the camera. The processor subsystem 1150, the storage subsystem 1130 and the GPUs 1162, 1164 and 1166 communicate using the bus subsystem 1154.

A network interface subsystem 1170 is connected to the bus subsystem 1154, forming part of the processing platform (network node) 103. The network interface subsystem 1170 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems. The network interface subsystem 1170 allows the processing platform to communicate over the network either by using cables (or wires) or wirelessly. The wireless radio signals 1175 emitted by the mobile computing devices 120 in the area of real space are received by the network interface subsystem 1170 (via wireless access points) for processing by the matching engine 170. A number of peripheral devices, such as user interface output devices and user interface input devices, are also connected to the bus subsystem 1154 forming part of the processing platform (network node) 103. These subsystems and devices are intentionally not shown in FIG. 11 to improve the clarity of the description. Although the bus subsystem 1154 is shown schematically as a single bus, alternative embodiments of the bus subsystem may use multiple buses.

In one embodiment, the cameras 114 can be implemented using Chameleon3 1.3 MP Color USB3 Vision cameras (Sony ICX445), having a resolution of 1288 x 964, a frame rate of 30 FPS, and 1.3 megapixels per image, with varifocal lenses having a working distance (mm) of 300 - ∞ and a field of view for a 1/3" sensor of 98.2° - 23.8°.

Particular Implementations

In various embodiments, the system described above for linking subjects in an area of real space with user accounts also includes one or more of the following features.

The system includes a plurality of cameras, cameras in the plurality of cameras producing respective sequences of images of corresponding fields of view in the real space. A processing system is coupled to the plurality of cameras. The processing system includes logic to determine locations of identified subjects represented in the images. The system matches the identified subjects with user accounts by identifying locations of mobile devices executing client applications in the area of real space, and matching the locations of the mobile devices with the locations of the subjects.
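One simple way to realize the final step, matching a device location to a subject location, is a nearest-neighbor rule over 2D coordinates in the store's real space. This sketch is an illustrative assumption standing in for the patent's matching logic, not its exact computation:

```python
import math

def match_device_to_subject(device_pos, subject_positions):
    """Return the id of the tracked subject closest to the mobile device.

    device_pos: (x, y) estimate of the mobile device in real-space coordinates.
    subject_positions: dict mapping subject_id -> (x, y) tracked position.
    """
    best_id, best_dist = None, float("inf")
    for subject_id, pos in subject_positions.items():
        dist = math.dist(device_pos, pos)  # Euclidean distance
        if dist < best_dist:
            best_id, best_dist = subject_id, dist
    return best_id

# Two tracked subjects; the device reports a position near subject 240.
subjects = {240: (3.0, 4.5), 241: (10.2, 1.1)}
print(match_device_to_subject((3.2, 4.4), subjects))  # -> 240
```

A production system would add thresholds and tie-breaking so that an ambiguous device position (roughly equidistant from two subjects) is deferred rather than matched incorrectly.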

In one embodiment of the system, the signals emitted by the mobile computing devices comprise images.

In one embodiment, the signals emitted by the mobile devices comprise radio signals.

In one embodiment, the system includes a set of semaphore images accessible to the processing system. The processing system includes logic to accept login communications from a client application on a mobile computing device identifying a user account before matching the user account to an identified subject in the area of real space, and, after accepting the login communication, logic to send a selected semaphore image from the set of semaphore images to the client application on the mobile device.

In one embodiment, the processing system sets the status of the selected semaphore image as assigned. The processing system receives displayed images of the selected semaphore image. The processing system recognizes the displayed image and matches the recognized semaphore image with the assigned semaphore image from the set of semaphore images. The processing system matches the location of the mobile device displaying the recognized semaphore image, located in the area of real space, with a not-yet-linked identified subject. After matching the user account to the identified subject, the processing system sets the status of the recognized semaphore image as available.
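The assigned/available lifecycle of semaphore images described above can be sketched as a small state tracker. The class and names are illustrative; a real system would add persistence and concurrency control:

```python
class SemaphoreImagePool:
    """Tracks the status of a set of semaphore images.

    Each image is either 'available' (free to hand out) or 'assigned'
    (currently shown on some shopper's mobile device). Illustrative only.
    """

    def __init__(self, image_ids):
        self.status = {img: "available" for img in image_ids}

    def assign(self):
        # Pick any available image, mark it assigned, and return its id;
        # the matching engine sends this image to the client application.
        for img, state in self.status.items():
            if state == "available":
                self.status[img] = "assigned"
                return img
        raise RuntimeError("no semaphore images available")

    def release(self, img):
        # Called after the user account has been matched to the subject,
        # returning the image to the pool for reuse.
        self.status[img] = "available"


pool = SemaphoreImagePool(["img_a", "img_b"])
chosen = pool.assign()
assert pool.status[chosen] == "assigned"
pool.release(chosen)
assert pool.status[chosen] == "available"
```

The pool size bounds how many shoppers can be checking in simultaneously, which is why releasing the image promptly after a successful match matters.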

In one embodiment, the client application on the mobile computing device transmits accelerometer data to the processing system, and the system uses the accelerometer data transmitted from the mobile computing device to match the identified subject with a user account.

In one such embodiment, the logic to match the identified subject with a user account includes logic that uses the accelerometer data transmitted from the mobile computing device from a plurality of locations over a time interval in the area of real space, and uses derivatives of the data indicating the locations of the identified subject over the time interval in the area of real space.
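A hedged sketch of this idea: differentiate the tracked subject positions to get a velocity series, integrate the device's accelerometer samples to get a comparable series, and score candidate pairings by how closely the two agree. The function names and the scoring rule are illustrative assumptions, not the patent's exact computation:

```python
def velocity_from_positions(positions, dt):
    """Discrete derivative of a subject's tracked (x, y) positions,
    sampled once every dt seconds, yielding a velocity series."""
    return [((x2 - x1) / dt, (y2 - y1) / dt)
            for (x1, y1), (x2, y2) in zip(positions, positions[1:])]

def velocity_from_accelerometer(accels, dt, v0=(0.0, 0.0)):
    """Integrate (ax, ay) accelerometer samples into a velocity series
    for comparison with the camera-derived velocities."""
    vx, vy = v0
    out = []
    for ax, ay in accels:
        vx, vy = vx + ax * dt, vy + ay * dt
        out.append((vx, vy))
    return out

def similarity(v_subject, v_device):
    """Negative mean squared difference between the two velocity series;
    the best-matching (subject, device) pair maximizes this score."""
    n = min(len(v_subject), len(v_device))
    err = sum((a - c) ** 2 + (b - d) ** 2
              for (a, b), (c, d) in zip(v_subject[:n], v_device[:n]))
    return -err / n

# A subject walking at 1 m/s in x agrees with a device that accelerated
# to 1 m/s and then held steady.
v_s = velocity_from_positions([(0, 0), (1, 0), (2, 0)], dt=1.0)
v_d = velocity_from_accelerometer([(1, 0), (0, 0)], dt=1.0)
print(similarity(v_s, v_d))  # -> 0.0 (perfect agreement)
```

In practice the accelerometer series would be filtered for drift and the score compared only among unmatched subjects and devices.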

In one embodiment, the signals emitted by the mobile computing device include location data and accelerometer data.

In one embodiment, the signals emitted by the mobile computing device comprise images.

In one embodiment, the signals emitted by the mobile device comprise radio signals.

A method for linking subjects in an area of real space with user accounts is disclosed. The user accounts are linked with client applications executable on mobile computing devices. The method includes using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space. Next, the method includes determining locations of identified subjects represented in the images. The method includes matching the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space. Finally, the method includes matching the locations of the mobile computing devices with the locations of the subjects.

In one embodiment, the method also includes setting the status of a selected semaphore image as assigned, receiving displayed images of the selected semaphore image, recognizing the displayed semaphore image, and matching the recognized image with the assigned image from the set of semaphore images. The method includes matching the location of the mobile computing device displaying the recognized semaphore image, located in the area of real space, with a not-yet-linked identified subject. Finally, the method includes setting the status of the recognized semaphore image as available after matching the user account to the identified subject.

In one embodiment, the step of matching the identified subject with a user account further includes using accelerometer data transmitted from the mobile computing device from a plurality of locations over a time interval in the area of real space. Derivatives of the data indicate the locations of the identified subject over the time interval in the real space.

In one embodiment, the signals emitted by the mobile computing device include location data and accelerometer data.

In one embodiment, the signals emitted by the mobile computing device comprise images.

In one embodiment, the signals emitted by the mobile device comprise radio signals.

A non-transitory computer-readable storage medium imprinted with computer program instructions to link subjects in an area of real space with user accounts is disclosed. The user accounts are linked with client applications executable on mobile computing devices; the instructions, when executed on a processor, implement a method. The method includes using a plurality of cameras to produce respective sequences of images of corresponding fields of view in the real space. The method includes determining locations of identified subjects represented in the images. The method includes matching the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space. Finally, the method includes matching the locations of the mobile computing devices with the locations of the subjects.

In one embodiment, the non-transitory computer-readable storage medium implements the method further comprising the following steps. The method includes setting the status of a selected semaphore image as assigned, receiving displayed images of the selected semaphore image, recognizing the displayed semaphore image, and matching the recognized image with the assigned image from the set of semaphore images. The method includes matching the location of the mobile computing device displaying the recognized semaphore image, located in the area of real space, with a not-yet-linked identified subject. After matching the user account to the identified subject, the status of the recognized semaphore image is set as available.

In one embodiment, the non-transitory computer-readable storage medium implements the method including matching the identified subject with a user account by using accelerometer data transmitted from the mobile computing device from a plurality of locations over a time interval in the area of real space, and using derivatives of the data indicating the locations of the identified subject over the time interval in the area of real space.

In one embodiment, the signals emitted by the mobile computing device include location data and accelerometer data.

Any data structures and code described or referenced above are stored according to many implementations in computer-readable memory, which comprises a non-transitory computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, volatile memory, non-volatile memory, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), and DVDs (digital versatile discs or digital video discs), or other media, now known or later developed, capable of storing computer-readable data.

The preceding description is presented to enable the making and use of the technology disclosed. Various modifications to the disclosed implementations will be apparent, and the general principles defined herein may be applied to other implementations and applications without departing from the spirit and scope of the technology disclosed. Thus, the technology disclosed is not intended to be limited to the implementations shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein. The scope of the technology disclosed is defined by the appended claims.

100 system; 102 network node; 103 network node; 110 subject tracking engine; 112a, 112b, 112n image recognition engines; 114 cameras; 116a, 116b, 116n aisles; 118a, 118b, 118m mobile computing devices; 120 mobile computing device; 130 training database; 140 subject database; 150 user account database; 160 image database; 170 matching engine; 181 network; 202 shelf; 204 shelf; 206 camera; 208 camera; 216 field of view; 218 field of view; 220 floor; 230 roof; 240 subject; 250 wireless access point; 252 wireless access point; 101a, 101b, 101n network nodes; 1112 camera; 1114 camera; 1116 camera; 1118 camera; 1122 Ethernet-based connector; 1124 Ethernet-based connector; 1126 Ethernet-based connector; 1128 Ethernet-based connector; 1130 storage subsystem; 1132 host memory subsystem; 1134 main random access memory; 1136 read-only memory; 1140 file storage subsystem; 1142 redundant array of independent disks (RAID 0); 1144 solid state disk; 1146 hard disk drive; 1150 processor subsystem; 1154 bus subsystem; 1162 graphics processing unit; 1164 graphics processing unit; 1166 graphics processing unit; 1170 network interface subsystem; 1175 wireless radio signals; 1181 network

FIG. 1 illustrates an architectural level schematic of a system in which a matching engine links subjects identified by a subject tracking engine to user accounts linked with client applications executing on mobile devices.

FIG. 2 is a side view of an aisle in a shopping store, illustrating a camera arrangement and a subject with a mobile computing device.

FIG. 3 is a top view of the aisle of FIG. 2 in the shopping store, illustrating the camera arrangement and the subject with the mobile computing device.

FIG. 4 illustrates an example data structure for storing joint information of subjects.

FIG. 5 illustrates an example data structure for storing a subject, including information about its associated joints.

FIG. 6 is a flowchart showing process steps for matching an identified subject to a user account using a semaphore image displayed on a mobile computing device.

FIG. 7 is a flowchart showing process steps for matching an identified subject to a user account using the service location of a mobile computing device.

FIG. 8 is a flowchart showing process steps for matching an identified subject to a user account using velocities of subjects and mobile computing devices.

FIG. 9A is a flowchart showing a first part of the process steps for matching an identified subject to a user account using a network ensemble.

FIG. 9B is a flowchart showing a second part of the process steps for matching an identified subject to a user account using a network ensemble.

FIG. 9C is a flowchart showing a third part of the process steps for matching an identified subject to a user account using a network ensemble.

FIG. 10 is an example architecture in which the four techniques presented in FIGS. 6 to 9C are applied in an area of real space to reliably match identified subjects to user accounts.

FIG. 11 is a camera and computer hardware arrangement configured to host the matching engine of FIG. 1.


Claims (24)

1. A system for linking subjects in an area of real space with user accounts, the user accounts being linked with client applications executable on mobile computing devices, comprising: a processing system configured to receive sequences of images of corresponding fields of view in the real space, the processing system including logic to determine locations of identified subjects represented in the images, and including logic to match the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space and matching the locations of the mobile computing devices with the locations of the subjects, wherein the client application on the mobile computing device causes a semaphore image to be displayed on the mobile computing device in the area of real space, and the logic to match the identified subject with a user account identifies the location of the mobile computing device using an image recognition engine that determines the location of the mobile computing device displaying the semaphore image.

2. The system of claim 1, wherein the mobile computing device emits signals usable to indicate the location of the mobile computing device in the area of real space, and the logic to match the identified subject with a user account uses the emitted signals to identify the location of the mobile computing device.

3. The system of claim 1, wherein the logic to match the identified subject with a user account operates without using personally identifying biometric information associated with the user account.
4. The system of claim 1, including a set of semaphore images accessible to the processing system, and wherein the processing system includes logic to accept a login communication from a client application on a mobile computing device identifying a user account before matching the user account to an identified subject in the area of real space, and, after accepting the login communication, to send a selected semaphore image from the set of semaphore images to the client application on the mobile computing device.

5. The system of claim 1, wherein the client application on the mobile computing device transmits accelerometer data to the processing system, and the logic to match the identified subject with a user account uses the accelerometer data transmitted from the mobile computing device.

6. The system of claim 1, wherein the mobile computing device emits signals usable to indicate the location of the mobile computing device in the area of real space, and the logic to match the identified subject with a user account includes a trained network to identify the location of the mobile computing device in the area of real space based on the signals emitted by the mobile computing device.

7. The system of claim 1, further including a log data structure including a list of inventory items for the identified subject, the processing system including logic to associate the log data structure for the matched identified subject with the user account for the identified subject.

8. A system for linking subjects in an area of real space with user accounts, the user accounts being linked with client applications executable on mobile computing devices, comprising: a processing system configured to receive sequences of images of corresponding fields of view in the real space, the processing system including logic to determine locations of identified subjects represented in the images, and including logic to match the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space and matching the locations of the mobile computing devices with the locations of the subjects, wherein the client application on the mobile computing device transmits location data to the processing system, and the logic to match the identified subject with a user account uses the location data transmitted from the mobile computing device, and wherein the logic to match the identified subject with a user account includes logic that uses the location data of the mobile computing device transmitted from a plurality of locations over a time interval in the area of real space, the logic further including: determining that all other mobile computing devices transmitting location information that are not matched with user accounts are separated from the mobile computing device by a predetermined distance; determining the unmatched identified subject closest to the mobile computing device; and matching the unmatched identified subject with the user account of the client application executing on the mobile computing device.

9. A method for linking subjects in an area of real space with user accounts, the user accounts being linked with client applications executable on mobile computing devices, the method including: receiving sequences of images of corresponding fields of view in the real space; determining locations of identified subjects represented in the sequences of images; matching the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space; and matching the locations of the mobile computing devices with the locations of the subjects, wherein the client application on the mobile computing device causes a semaphore image to be displayed on the mobile computing device in the area of real space, and the matching of the identified subject with a user account further includes identifying the location of the mobile computing device using an image recognition engine that determines the location of the mobile computing device displaying the semaphore image.
10. The method of claim 9, wherein: the mobile computing device emits signals usable to indicate the location of the mobile computing device in the area of real space; and the matching of the identified subject with a user account further includes using the emitted signals to identify the location of the mobile computing device.

11. The method of claim 9, wherein the matching of the identified subject with the user account is performed without using personally identifying biometric information associated with the user account.

12. The method of claim 9, further including: accepting a login communication from a client application on a mobile computing device identifying a user account before matching the user account to an identified subject in the area of real space, and, after accepting the login communication, sending a selected semaphore image from a set of semaphore images to the client application on the mobile computing device.

13. The method of claim 9, wherein the matching of the identified subject with a user account further includes using accelerometer data transmitted from the mobile computing device.

14. The method of claim 9, wherein the matching of the identified subject with a user account further includes training a network to identify the location of the mobile computing device in the area of real space based on signals emitted by the mobile computing device.
15. The method of claim 9, further including associating a log data structure including a list of inventory items for the matched identified subject with the user account for the identified subject.

16. A method for linking subjects in an area of real space with user accounts, the user accounts being linked with client applications executable on mobile computing devices, the method including: receiving sequences of images of corresponding fields of view in the real space; determining locations of identified subjects represented in the sequences of images; matching the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space; and matching the locations of the mobile computing devices with the locations of the subjects, wherein the matching of the identified subject with a user account further includes using location data emitted from the mobile computing device, and wherein the method further includes matching the identified subject with the user account using the location data of the mobile computing device emitted from a plurality of locations over a time interval in the area of real space, including: determining that all other mobile computing devices transmitting location information that are not matched with user accounts are separated from the mobile computing device by a predetermined distance; determining the unmatched identified subject closest to the mobile computing device; and matching the unmatched identified subject with the user account of the client application executing on the mobile computing device.

17. A non-transitory computer-readable storage medium imprinted with computer program instructions to link subjects in an area of real space with user accounts, the user accounts being linked with client applications executable on mobile computing devices, the instructions, when executed on a processor, implementing a method including: receiving sequences of images of corresponding fields of view in the real space; determining locations of identified subjects represented in the sequences of images; matching the identified subjects with user accounts by identifying locations of mobile computing devices executing client applications in the area of real space; and matching the locations of the mobile computing devices with the locations of the subjects, wherein the client application on the mobile computing device causes a semaphore image to be displayed on the mobile computing device in the area of real space, and the matching of the identified subject with a user account further includes identifying the location of the mobile computing device using an image recognition engine that determines the location of the mobile computing device displaying the semaphore image.
The non-transitory computer-readable storage medium of claim 17, wherein the mobile computing device emits a signal usable to indicate the location of the mobile computing device in the area of real space, and the matching of the identified subject with the user account further includes using the emitted signal to identify the location of the mobile computing device.

The non-transitory computer-readable storage medium of claim 17, wherein the matching of the identified subject with the user account is performed without using personal identifying biometric information associated with the user account.

The non-transitory computer-readable storage medium of claim 17, the method further including: before matching the user account to the identified subject in the area of real space, accepting a login communication from the client application on the mobile computing device identifying the user account, and after accepting the login communication, sending a semaphore image selected from a set of semaphore images to the client application on the mobile computing device.

The non-transitory computer-readable storage medium of claim 17, wherein the matching of the identified subject with the user account further includes using accelerometer data transmitted from the mobile computing device.
The non-transitory computer-readable storage medium of claim 17, wherein the matching of the identified subject with the user account further includes training a network to identify the location of the mobile computing device in the area of real space based on signals emitted by the mobile computing device.

The non-transitory computer-readable storage medium of claim 17, further including associating a log data structure comprising a list of inventory items for the matched identified subject with the user account for the identified subject.

A non-transitory computer-readable storage medium impressed with computer program instructions to link a subject in an area of real space with a user account, the user account being linked with a client application executable on a mobile computing device, the instructions, when executed on a processor, implementing a method comprising: receiving sequences of images of corresponding fields of view in the real space; determining locations of identified subjects represented in the sequences of images; matching the identified subject with the user account by identifying a location of a mobile computing device executing the client application in the area of real space; and matching the location of the mobile computing device with the location of the subject; wherein the matching of the identified subject with the user account further includes using location data transmitted from the mobile computing device, and wherein the method further includes matching the identified subject with the user account using the location data of the mobile computing device transmitted from a plurality of locations over a time interval in the area of real space, including: determining that all other mobile computing devices transmitting location information for unmatched user accounts are separated from the mobile computing device by a predetermined distance; determining the unmatched identified subject closest to the mobile computing device; and matching the unmatched identified subject with the user account of the client application executing on the mobile computing device.
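The "nearest unmatched subject" procedure recited in the claims above can be sketched in code. This is a minimal illustration only, assuming 2-D device and subject locations; the class names, function name, and `min_separation` parameter are hypothetical and not part of the patent.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Tuple

@dataclass
class Device:
    """A mobile computing device running the client application."""
    device_id: str
    user_account: str
    location: Tuple[float, float]  # position in the area of real space
    matched: bool = False

@dataclass
class Subject:
    """A subject identified and tracked in the sequences of images."""
    subject_id: int
    location: Tuple[float, float]
    matched_account: Optional[str] = None

def try_match(device: Device, all_devices: list, subjects: list,
              min_separation: float) -> Optional[Subject]:
    """Attempt to link an unmatched tracked subject to the device's account.

    Mirrors the claimed steps: (1) every other device transmitting location
    data for a still-unmatched account must be at least `min_separation`
    away; (2) the unmatched subject closest to this device is then matched
    with the user account of the client application on the device."""
    # Step 1: all other unmatched devices must be a predetermined distance away.
    for other in all_devices:
        if other is not device and not other.matched:
            if dist(other.location, device.location) < min_separation:
                return None  # ambiguous: another candidate device is too close
    # Step 2: pick the unmatched identified subject closest to this device.
    candidates = [s for s in subjects if s.matched_account is None]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda s: dist(s.location, device.location))
    # Step 3: link that subject to the user account of the client application.
    nearest.matched_account = device.user_account
    device.matched = True
    return nearest
```

In practice the location data would be accumulated from a plurality of positions over a time interval, with the match attempted only once the separation condition holds; this sketch shows a single attempt.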
TW108126626A 2018-07-26 2019-07-26 Systems and methods to check-in shoppers in a cashier-less store TWI787536B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862703785P 2018-07-26 2018-07-26
US62/703,785 2018-07-26
US16/255,573 US10650545B2 (en) 2017-08-07 2019-01-23 Systems and methods to check-in shoppers in a cashier-less store
US16/255,573 2019-01-23

Publications (2)

Publication Number Publication Date
TW202008249A TW202008249A (en) 2020-02-16
TWI787536B true TWI787536B (en) 2022-12-21

Family

ID=69181233

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108126626A TWI787536B (en) 2018-07-26 2019-07-26 Systems and methods to check-in shoppers in a cashier-less store

Country Status (4)

Country Link
EP (1) EP3827408A4 (en)
CA (1) CA3112512A1 (en)
TW (1) TWI787536B (en)
WO (1) WO2020023801A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11232687B2 (en) 2017-08-07 2022-01-25 Standard Cognition, Corp Deep learning-based shopper statuses in a cashier-less store
US10474988B2 (en) 2017-08-07 2019-11-12 Standard Cognition, Corp. Predicting inventory events using foreground/background processing
US10853965B2 (en) 2017-08-07 2020-12-01 Standard Cognition, Corp Directional impression analysis using deep learning
US10650545B2 (en) 2017-08-07 2020-05-12 Standard Cognition, Corp. Systems and methods to check-in shoppers in a cashier-less store
US11200692B2 (en) 2017-08-07 2021-12-14 Standard Cognition, Corp Systems and methods to check-in shoppers in a cashier-less store
US11250376B2 (en) 2017-08-07 2022-02-15 Standard Cognition, Corp Product correlation analysis using deep learning
US11232575B2 (en) 2019-04-18 2022-01-25 Standard Cognition, Corp Systems and methods for deep learning-based subject persistence
US11361468B2 (en) 2020-06-26 2022-06-14 Standard Cognition, Corp. Systems and methods for automated recalibration of sensors for autonomous checkout
US11303853B2 (en) 2020-06-26 2022-04-12 Standard Cognition, Corp. Systems and methods for automated design of camera placement and cameras arrangements for autonomous checkout
EP4116872A1 (en) * 2021-07-08 2023-01-11 Spiideo AB A data processing method, system and computer program product in video production of a live event

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100283860A1 (en) * 2007-11-30 2010-11-11 Ali Nader Portable Electronic Apparatus Having More Than One Display Area, And A Method of Controlling a User Interface Thereof
TW201445474A (en) * 2013-05-17 2014-12-01 jun-cheng Qiu Mobile shopping method without arranging authentication device
TW201504964A (en) * 2013-07-23 2015-02-01 Yi-Li Huang Secure mobile device shopping system and method
US20160358145A1 (en) * 2015-06-05 2016-12-08 Yummy Foods, Llc Systems and methods for frictionless self-checkout merchandise purchasing
US20160381328A1 (en) * 2015-06-23 2016-12-29 Cleveland State University Systems and methods for privacy-aware motion tracking with notification feedback
TW201725545A (en) * 2016-01-15 2017-07-16 T Wallet Co Ltd Mobile payment method that effectively overcomes the potential risk of financial information of the user being misappropriated

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9014658B2 (en) * 2008-03-14 2015-04-21 William J. Johnson System and method for application context location based configuration suggestions
US8438484B2 (en) * 2009-11-06 2013-05-07 Sony Corporation Video preview module to enhance online video experience
US8306256B2 (en) * 2010-09-16 2012-11-06 Facebook, Inc. Using camera signatures from uploaded images to authenticate users of an online system
US8635556B2 (en) * 2010-11-30 2014-01-21 Alcatel Lucent Human readable iconic display server
US20130142384A1 (en) * 2011-12-06 2013-06-06 Microsoft Corporation Enhanced navigation through multi-sensor positioning
US9473747B2 (en) * 2013-07-25 2016-10-18 Ncr Corporation Whole store scanner
US20150085111A1 (en) * 2013-09-25 2015-03-26 Symbol Technologies, Inc. Identification using video analytics together with inertial sensor data
WO2015133699A1 (en) * 2014-03-06 2015-09-11 에스케이플래닛 주식회사 Object recognition apparatus, and recording medium in which method and computer program therefor are recorded
US20160217157A1 (en) * 2015-01-23 2016-07-28 Ebay Inc. Recognition of items depicted in images
CN105987694B (en) * 2015-02-09 2019-06-07 株式会社理光 The method and apparatus for identifying the user of mobile device
JP7009389B2 (en) * 2016-05-09 2022-01-25 グラバンゴ コーポレイション Systems and methods for computer vision driven applications in the environment
CN108055390B (en) * 2017-11-22 2021-11-02 厦门瞳景智能科技有限公司 AR method and system for determining corresponding id of client based on mobile phone screen color


Also Published As

Publication number Publication date
EP3827408A1 (en) 2021-06-02
WO2020023801A1 (en) 2020-01-30
EP3827408A4 (en) 2022-04-06
CA3112512A1 (en) 2020-01-30
TW202008249A (en) 2020-02-16

Similar Documents

Publication Publication Date Title
US11538186B2 (en) Systems and methods to check-in shoppers in a cashier-less store
TWI787536B (en) Systems and methods to check-in shoppers in a cashier-less store
US11810317B2 (en) Systems and methods to check-in shoppers in a cashier-less store
US11232687B2 (en) Deep learning-based shopper statuses in a cashier-less store
US11948313B2 (en) Systems and methods of implementing multiple trained inference engines to identify and track subjects over multiple identification intervals
US11544866B2 (en) Directional impression analysis using deep learning
US11270260B2 (en) Systems and methods for deep learning-based shopper tracking
US10127438B1 (en) Predicting inventory events using semantic diffing
JP7228569B2 (en) Object identification and tracking using image recognition
WO2020023930A1 (en) Deep learning-based shopper statuses in a cashier-less store
EP3844704A1 (en) Deep learning-based actionable digital receipts for cashier-less checkout
WO2020023926A1 (en) Directional impression analysis using deep learning
JP2021531595A (en) Real-time inventory tracking using deep learning
US20230088414A1 (en) Machine learning-based re-identification of shoppers in a cashier-less store for autonomous checkout