WO2022176774A1 - Analysis device, analysis system, analysis method, and program - Google Patents

Analysis device, analysis system, analysis method, and program

Info

Publication number
WO2022176774A1
WO2022176774A1
Authority
WO
WIPO (PCT)
Prior art keywords
sales floor
product
analysis
behavior
processors
Prior art date
Application number
PCT/JP2022/005374
Other languages
French (fr)
Japanese (ja)
Inventor
叡一 松元
俊太 齋藤
大輔 西野
良博 山田
義文 丸山
優一 野々目
Original Assignee
株式会社Preferred Networks
株式会社イトーヨーカ堂
Priority date
Filing date
Publication date
Application filed by 株式会社Preferred Networks, 株式会社イトーヨーカ堂
Publication of WO2022176774A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion

Definitions

  • The present disclosure relates to analysis devices, analysis systems, analysis methods, and programs.
  • An object of the present disclosure is to provide a novel technique for analyzing the degree of attention that a product receives.
  • One aspect of the present disclosure relates to an analysis device that includes one or more memories and one or more processors, wherein the one or more processors estimate the attention level of a sales floor based on detection results of human behavior related to the sales floor.
  • FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure.
  • FIG. 2 is a block diagram showing the functional configuration of an analysis device according to one embodiment of the present disclosure.
  • FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure.
  • FIG. 4 is a block diagram showing the hardware configuration of an analysis device according to one embodiment of the present disclosure.
  • In the following embodiments, an analysis system is disclosed that captures video of a store's sales floor and uses a machine learning model to estimate the attention level of the sales floor based on the sales floor video.
  • FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure.
  • As shown in FIG. 1, the analysis system 10 of the present embodiment includes, for example, an imaging device 20, a user terminal 30, and an analysis device 100. When the analysis device 100 acquires sales floor video from the imaging device 20, it analyzes the acquired video and notifies the user terminal 30 of the attention level of the sales floor and of the products displayed there.
  • Note that the attention level of a product is one example of the attention level of a sales floor.
  • The degree of attention refers to how attractive a sales floor or product is.
  • The imaging device 20 may be, for example, a video camera installed in a store or the like; it captures video of the target sales floor and transmits the sales floor video to the analysis device 100.
  • Typically, the imaging device 20 is installed near the sales floor to be imaged and is used to observe that sales floor.
  • The imaging device 20 may be fixed at a certain location in the store, or may be movable, for example mounted on a robot or cart. A movable device makes it possible to obtain a wider variety of information and to reduce the number of imaging devices 20 that must be installed. A plurality of imaging devices 20 may also be provided, so that appropriate sales floor video can be obtained even when blind spots or the like occur.
  • The user terminal 30 may be, for example, an information processing device such as a personal computer, tablet, or smartphone provided in the store or the like; it acquires information on the products on the sales floor and the attention level of the sales floor estimated from the sales floor video, either from the analysis device 100 or from a server or the like on which the analysis results of the analysis device 100 have been saved.
  • For example, the user terminal 30 may be provided with software related to store management and business improvement, such as software for supporting decisions on optimal product placement and for evaluating store clerks' work, and may also be provided with software for browsing the analysis results.
  • Using data obtained from the analysis device 100, such as the attention levels of products, the placement of products with high attention, and data on POP advertisements, and based on the data analyzed by such software, store clerks and others may decide on product placement in the sales floor, or work evaluations may be performed for clerks whose product placement achieved a high degree of attention.
  • The analysis device 100 may be, for example, a personal computer provided in the store, or an information processing device such as a server provided at a location different from the store, for example at a headquarters that manages the store or on the cloud. From the acquired sales floor video, it estimates the attention level for each sales floor and for each product type displayed there. Note that the analysis device 100 may acquire the sales floor video captured by the imaging device 20 directly, or may acquire data obtained by subjecting the sales floor video to predetermined processing. In the latter case, the sales floor video captured by the imaging device 20 is output to a predetermined processing device, and the data processed by that device is output to the analysis device 100. This facilitates transmission of the sales floor video information over the network and the subsequent processing in the analysis device 100. When a plurality of imaging devices 20 are installed, one processing device may be provided for the plurality of imaging devices 20.
  • The analysis device 100 according to the present embodiment uses a machine learning model such as a neural network to detect the behavior of store clerks, customers, and others related to the sales floor based on the sales floor video, and can estimate the attention level of the products and/or the sales floor from the behavior detection results.
  • The estimated attention level can be used for subsequent sales promotion activities, or for store management and business improvement, such as evaluating the work of the clerk who performed the display work.
  • For example, for sales floor video such as that shown in FIG. 1, the analysis device 100 detects interactions by customers with products and/or the sales floor, in other words customers' reactions to products and/or the sales floor, such as how many customers walked in front of the sales floor, how many stopped in front of it, how many picked up products displayed there, and how many returned the products they picked up to the sales floor, and estimates the attention level of the products and/or the sales floor based on the detection results.
  • Here, the analysis device 100 may process the sales floor video acquired from the imaging device 20 in real time or in batches.
  • FIG. 2 is a block diagram showing the functional configuration of the analysis device 100 according to one embodiment of the present disclosure.
  • As shown in FIG. 2, the analysis device 100 of this embodiment has an interaction detection unit 110 and an attention level estimation unit 120.
  • The interaction detection unit 110 and the attention level estimation unit 120 are installed in the analysis device 100 and are implemented by one or more processors executing one or more programs stored in one or more memories.
  • The interaction detection unit 110 detects human behavior related to the sales floor based on the sales floor video. Specifically, when the sales floor video is acquired from the imaging device 20, the interaction detection unit 110 removes moving objects such as people and shopping carts from the sales floor video as preprocessing. It then detects changes in the sales floor from the preprocessed video and detects interactions of the extracted persons with the sales floor and products. Although in the following description the interaction detection unit 110 detects human behavior related to the sales floor based on the sales floor video, it may also detect such behavior based on information other than the sales floor video. For example, the interaction detection unit 110 may detect a clerk's interactions with the sales floor and products by tracking the computer terminal the clerk carries around the store, or may detect customers' interactions with the sales floor and products by tracking computer terminals attached to equipment, such as shopping carts, that customers use in the store.
  • The interaction detection unit 110 detects human behavior related to the sales floor, such as interactions with products on the sales floor.
  • Here, the interaction detection unit 110 may use changes in the sales floor detected from the sales floor video for behavior detection.
  • For example, as one example of preprocessing, the interaction detection unit 110 uses a known object detector such as Mask-RCNN (Region-based Convolutional Neural Network) to detect moving objects such as people and shopping carts in the sales floor video, removes the detected moving objects from the video using a known moving-object removal technique, and derives both the sales floor video with moving objects removed and video of the extracted moving objects.
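As a rough illustration of this preprocessing step, the following is a minimal sketch assuming a pretrained Mask R-CNN from torchvision; the patent names Mask-RCNN as one known detector but does not prescribe an implementation, and the class set and score threshold here are assumptions.

```python
# Hedged sketch: detecting moving objects (people) in one sales floor frame
# with torchvision's pretrained Mask R-CNN.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# COCO label 1 is "person"; shopping carts are not a COCO class, so a real
# system would fine-tune the detector with an added cart class (assumption).
MOVING_CLASSES = {1}

def detect_moving_objects(frame_rgb, score_thresh=0.7):
    """Return binary masks of detected moving objects in one RGB frame."""
    with torch.no_grad():
        output = model([to_tensor(frame_rgb)])[0]
    masks = []
    for label, score, mask in zip(output["labels"], output["scores"], output["masks"]):
        if label.item() in MOVING_CLASSES and score.item() >= score_thresh:
            masks.append(mask[0] > 0.5)  # threshold soft mask to binary
    return masks
```

The returned masks would then feed the moving-object removal step, for example background reconstruction or inpainting, which the patent leaves to known techniques.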
  • The interaction detection unit 110 calculates the difference between frames of the preprocessed sales floor video and determines whether the sales floor has changed based on the calculated difference. For example, the interaction detection unit 110 intermittently extracts frames from the sales floor video at predetermined time intervals and calculates the difference between adjacent extracted frames. Specifically, the interaction detection unit 110 may use the difference between the image data of adjacent frames as the inter-frame difference. Alternatively, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network, input the adjacent frames to the model, and use the difference between the output feature maps as the inter-frame difference. By comparing feature maps, it is expected that the effects of lighting changes and vibration on the sales floor can be effectively reduced.
  • Alternatively, the interaction detection unit 110 may use any suitable machine learning model, such as a convolutional neural network trained to detect the portion of two input frames having a difference equal to or greater than a predetermined threshold, inputting each pair of adjacent frames to the model and using the detected difference portion as the inter-frame difference. By detecting the difference between adjacent frames in this way, the interaction detection unit 110 can identify the position and/or time at which a change occurred in the sales floor.
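As an illustration of the feature-map comparison just described, here is a minimal sketch; the ResNet-18 backbone and the change threshold are illustrative assumptions, not choices made by the patent.

```python
# Sketch: comparing intermittently sampled frames via CNN feature maps rather
# than raw pixels, to reduce sensitivity to lighting changes and vibration.
import torch
import torchvision

backbone = torchvision.models.resnet18(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

CHANGE_THRESHOLD = 0.05  # assumed; would be tuned per camera in practice

def frame_difference(frame_a, frame_b):
    """Mean absolute difference between feature maps of two normalized
    [3, H, W] frame tensors sampled at the predetermined interval."""
    with torch.no_grad():
        fa = feature_extractor(frame_a.unsqueeze(0))
        fb = feature_extractor(frame_b.unsqueeze(0))
    return (fa - fb).abs().mean().item()

def sales_floor_changed(frame_a, frame_b):
    return frame_difference(frame_a, frame_b) > CHANGE_THRESHOLD
```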
  • The interaction detection unit 110 may estimate the placement area for each product type from the sales floor video. Specifically, the interaction detection unit 110 performs area division by product type on a frame of the sales floor video and estimates the placement area for each product type. For example, the interaction detection unit 110 may use a trained machine learning model to perform region segmentation by product type on a frame of the sales floor video from which moving objects have been removed, and estimate the placement area for each product type.
  • The machine learning model may be trained so that, upon input of a frame of sales floor video from which moving objects have been removed, it divides the frame into regions and outputs a product region map indicating the placement area for each product type.
  • The interaction detection unit 110 inputs a frame of the sales floor video from which moving objects have been removed to the trained machine learning model, acquires a product area map indicating the display area for each product type in the sales floor, and may generate a frame in which the acquired product area map is superimposed on the input frame, divided into regions for each product type.
  • The machine learning model for region estimation may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the placement area for each product type. Specifically, the machine learning model may be an instance segmentation model such as Mask-RCNN, and may be trained to predict, for the multiple products or product types in a frame, the bounding boxes to be detected and the corresponding segmentation masks.
  • Alternatively, the machine learning model may be a convolutional neural network trained to segment regions by clustering the feature vectors in its feature map; areas with similar feature vectors can be regarded as areas in which products of the same type are displayed.
  • Such a convolutional neural network may be obtained by fine-tuning a convolutional neural network pretrained on a large image dataset such as ImageNet, or may be trained by assigning temporary labels to product regions and training it to predict those label numbers.
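One possible reading of the clustering approach above, sketched minimally; the per-pixel features are assumed to come from the kind of pretrained backbone just mentioned, and the fixed cluster count is an assumption since the patent does not specify how regions are delimited.

```python
# Sketch: grouping a frame's feature-map pixels into product-type regions by
# clustering their feature vectors.
import numpy as np
from sklearn.cluster import KMeans

def segment_by_feature_clustering(feature_map, n_regions=8):
    """feature_map: numpy array of shape [C, H, W] from a CNN backbone.
    Returns an [H, W] array of region labels."""
    c, h, w = feature_map.shape
    vectors = feature_map.reshape(c, h * w).T        # one C-dim vector per pixel
    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(vectors)
    return labels.reshape(h, w)                      # similar vectors share a label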
  • The interaction detection unit 110 estimates the product names and/or the number (quantity) of products displayed in each placement area.
  • For example, the interaction detection unit 110 uses a trained machine learning model to estimate the product names and/or the number of products in the product group included in the placement area for each product type.
  • Specifically, the machine learning model is trained to output the product names and/or the center positions of the products contained in a frame when a frame of sales floor video with moving objects removed is input.
  • A product name may be indicated by product identification information, such as a product number assigned to the product in advance.
  • Each product may be indicated by a symbol (such as a circle) marking its center in the frame, or by a product center heat map or the like.
  • That is, a machine learning model is trained to receive a frame of sales floor video with moving objects removed and to output the product names and/or product centers of the products captured in that frame.
  • Such a machine learning model may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the product name for each product type and/or the center of each product.
  • When the interaction detection unit 110 uses a trained machine learning model to estimate the product names of a product group displayed in a placement area, the machine learning model may identify, from the input frame, product identification information such as the product numbers of the products captured in the frame. That is, the machine learning model may be realized as a neural network and trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the product identification information of each product in the frame. After acquiring a machine learning model trained in this way, the interaction detection unit 110 can use it to estimate the product name of each product displayed in a frame of the sales floor video.
  • Note that the input frame may be a frame that has been divided into regions, or a frame that has not.
  • Alternatively, the machine learning model may be a neural network that determines product feature values for each product type from frames of the sales floor video. After estimating the feature values of each product arranged in the frame using the machine learning model, the interaction detection unit 110 may identify the product name corresponding to the estimated feature values as that product.
  • If no corresponding product name can be identified, the product may be determined to be unknown.
  • External information, such as store layout information and POS data, may also be used for this estimation.
  • When the interaction detection unit 110 uses a trained machine learning model to estimate the number of products in the product group in a placement area, the machine learning model may, for example, identify from the input frame the center of each product captured in the frame. That is, the machine learning model may be realized as a neural network and trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the center of each product in the frame.
  • In this way, the interaction detection unit 110 can use the machine learning model to estimate the center of each product displayed in a frame of the sales floor video and, for a frame divided into regions, estimate the number of products displayed in each placement area based on the number of estimated centers in that area. For example, using both a machine learning model for identifying product names and a machine learning model for estimating product centers, the interaction detection unit 110 can generate a frame indicating the product names of the products arranged in each placement region of the segmented frame and the center of each product. Based on this frame, the interaction detection unit 110 can estimate the product name and the number of products for each product type by counting the product centers included in each placement area.
  • However, the estimation of the number of products is not limited to this.
  • For example, a machine learning model that detects the position of each product within a frame with a bounding box may be used.
  • In this case, the interaction detection unit 110 may estimate the number of products by counting the number of bounding boxes included in each placement area.
  • Alternatively, the number of products may be estimated by treating the product center heat map as a product density and integrating the heat map over each placement region.
  • Alternatively, the interaction detection unit 110 may estimate the number of products in each placement region of a frame using a machine learning model trained to regress the number of products from the feature values of the placement region.
  • If the machine learning model is properly trained, estimating the number of products by product density or by regression as described above may also predict the number of hidden products that are not captured in the frame.
  • The size of the area in which the products are placed may also be calculated back from such estimates.
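Two of the counting strategies above can be sketched as follows; the inputs are assumed to come from the segmentation and product-center models described earlier, and all names here are illustrative.

```python
# Sketch: counting estimated product centers per placement area, and
# integrating a product-center heat map as a density.
import numpy as np

def count_by_centers(centers, region_map):
    """centers: iterable of (row, col) product centers;
    region_map: [H, W] array of placement-area labels."""
    counts = {}
    for r, c in centers:
        label = int(region_map[r, c])
        counts[label] = counts.get(label, 0) + 1
    return counts

def count_by_density(center_heatmap, region_map):
    """Treat the product-center heat map as a density and integrate it
    over each placement area."""
    return {int(label): float(center_heatmap[region_map == label].sum())
            for label in np.unique(region_map)}
```

The density variant is what allows partially hidden products to contribute fractional counts, consistent with the remark above about predicting hidden products.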
  • The interaction detection unit 110 also identifies the person and their movement in the video of persons extracted from the preprocessed sales floor video.
  • For example, the interaction detection unit 110 may distinguish between store clerks and customers captured in the sales floor video. That is, upon receiving video of a person extracted from the sales floor video, the interaction detection unit 110 may use a machine learning model trained to determine whether the person is a customer or a clerk.
  • The machine learning model may be implemented, for example, as a convolutional neural network, and trained using annotated image data of store clerks and annotated image data of customers as training data.
  • In this way, the interaction detection unit 110 may discriminate between clerks and customers captured in the sales floor video and use the discrimination results for behavior estimation.
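A clerk-vs-customer classifier of the kind just described might look like the following sketch, implemented here (as an assumption) by putting a two-class head on a pretrained ResNet-18; the patent only says "a convolutional neural network".

```python
# Sketch: binary person classifier over extracted person crops.
import torch
import torchvision

def build_person_classifier():
    model = torchvision.models.resnet18(pretrained=True)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)  # 0: customer, 1: clerk
    return model

# Training would use annotated person crops, e.g.:
#   logits = model(person_crops)                       # [N, 2]
#   loss = torch.nn.functional.cross_entropy(logits, labels)
```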
  • The interaction detection unit 110 may also detect a person's movement from the extracted video of that person. For example, the interaction detection unit 110 may detect predetermined behaviors related to customer interactions with products and the sales floor, such as a customer walking in front of the product sales floor, stopping in front of it, picking up a product, or returning a product.
  • For example, the interaction detection unit 110 may generate trajectory data by tracking a person's position using known tracking techniques, and detect walking or staying from the generated trajectory data.
  • The trajectory data may, for example, associate each position with the time at which the person was at that position.
  • In known tracking techniques, a person's bounding box is detected in each frame, and detections in temporally adjacent frames are linked as the same person when the difference between the feature values of the detection areas is small or the overlap of the bounding boxes is large.
  • When a customer's trajectory data indicates that the customer passed in front of the sales floor to be analyzed, the interaction detection unit 110 may determine that the customer walked in front of that sales floor; when the trajectory data indicates that the customer stayed in front of it for a predetermined threshold time or longer, the interaction detection unit 110 may determine that the customer stopped in front of that sales floor.
  • Note that the trajectory data may be configured as time-series trajectory data.
  • In this case, the interaction detection unit 110 may determine whether a customer stopped in front of the sales floor to be analyzed using any appropriate machine learning model, such as a neural network trained with pairs of time-series trajectory data and annotations indicating whether the customer passed the sales floor or stopped at it as training data.
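A simple rule-based version of the walk/stop decision above can be sketched as follows; the trajectory layout, the zone representation, and the dwell threshold are assumptions for illustration.

```python
# Sketch: classifying one trajectory as stopped / walked past / absent.

STOP_THRESHOLD_S = 5.0  # assumed dwell threshold

def in_front_of_sales_floor(x, y, zone):
    x0, y0, x1, y1 = zone  # axis-aligned zone in floor coordinates
    return x0 <= x <= x1 and y0 <= y <= y1

def classify_visit(trajectory, zone):
    """trajectory: list of (timestamp_s, x, y) samples for one person.
    Returns 'stopped', 'walked_past', or 'absent'."""
    times = [t for t, x, y in trajectory if in_front_of_sales_floor(x, y, zone)]
    if not times:
        return "absent"
    dwell = max(times) - min(times)
    return "stopped" if dwell >= STOP_THRESHOLD_S else "walked_past"
```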
  • Also, the interaction detection unit 110 may detect the movement of a person's body parts, such as the hands, using a known pose estimation technique, and detect the person's interaction with products on the sales floor based on the detection results.
  • OpenPose, AlphaPose, or the like may be used as the known pose estimation technique.
  • For example, the interaction detection unit 110 may detect the position of a customer's hand from the video of the customer, and if the detected hand remains in the area of the sales floor video where a product is placed for a predetermined threshold time or longer, it may determine that the customer interacted with that product.
  • Alternatively, the interaction detection unit 110 may determine whether a customer picked up or returned a product at the sales floor to be analyzed using any suitable machine learning model, such as a neural network trained using as training data pairs of hand images extracted by pose estimation and annotations indicating whether the hand is picking up a product, returning a product, or doing something else.
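The hand-dwell rule above can be sketched as follows; the wrist keypoints are assumed to come from an external pose estimator such as OpenPose, and the data layout and threshold are assumptions.

```python
# Sketch: flag a product interaction when a wrist keypoint stays inside a
# product placement area past a time threshold.

HAND_DWELL_THRESHOLD_S = 1.0  # assumed

def hand_in_area(wrist_xy, area_bbox):
    x, y = wrist_xy
    x0, y0, x1, y1 = area_bbox
    return x0 <= x <= x1 and y0 <= y <= y1

def detect_hand_interaction(wrist_track, area_bbox, fps):
    """wrist_track: per-frame (x, y) wrist positions of one person.
    Returns True if the hand stayed in the placement area long enough."""
    run = longest = 0
    for xy in wrist_track:
        run = run + 1 if hand_in_area(xy, area_bbox) else 0
        longest = max(longest, run)
    return longest / fps >= HAND_DWELL_THRESHOLD_S
```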
  • The interaction detection unit 110 may also use an action estimator to estimate the actions of an extracted person.
  • The action estimator takes extracted video of a person as input, and may be implemented by any suitable machine learning model, such as a neural network, trained to determine which of several predetermined actions the person is performing.
  • The machine learning model may be trained using as training data pairs of video of a person and predetermined behaviors related to interactions with products and the sales floor, such as the person walking in front of the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
  • The interaction detection unit 110 may use an action estimator to detect not only customers' behavior but also store clerks' behavior regarding products and the sales floor.
  • That is, the action estimator may use a machine learning model trained to detect not only the predetermined customer interactions with products described above but also clerks' interactions with products and the sales floor.
  • For example, the interaction detection unit 110 may use a machine learning model, such as a neural network trained to detect interactions such as a clerk's display work (arranging, replenishing, and replacing products in a placement area) and sales promotion work (such as presenting POP advertisements), to detect a clerk's interactions with products from video of the clerk.
  • The interaction detection unit 110 further detects interactions with products by customers and clerks based on these detection results. Specifically, the interaction detection unit 110 can identify the position and time at which a change occurred in the sales floor from the detection results for sales floor changes, identify the product names and numbers of products displayed in the sales floor to be analyzed from the detection results for product areas, and identify interactions with products by customers or clerks from the detection results for persons.
  • For example, from the sales floor video before and after a change at the identified position and time, the interaction detection unit 110 can identify increases or decreases in the number of products of each product type and the interactions with products of the persons who were at the sales floor at that time.
  • For example, the interaction detection unit 110 may be able to detect, from the sales floor video before and after a change, that a customer picked up two units of product A and the number of products displayed on the sales floor decreased by two.
  • Also, the interaction detection unit 110 may be able to detect, from the sales floor video before and after a change, that a clerk replenished product B and the number of products displayed in the sales floor increased.
  • Furthermore, the interaction detection unit 110 may be able to detect from the sales floor video that a customer passed or stopped in front of the sales floor even when there is no change in the sales floor.
  • In order to determine the interaction corresponding to a combination of detection results for sales floor changes, product areas, and persons' movements, the interaction detection unit 110 may store in advance a table showing the correspondence between such combinations and interactions.
  • The interaction detection unit 110 may then refer to the table and determine, on a rule basis, the interaction corresponding to the combination of detection results for sales floor changes, product areas, and persons' movements.
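The correspondence table above might be realized as a simple lookup like the following sketch; the keys and interaction labels are illustrative assumptions, since the patent does not list the table's contents.

```python
# Sketch: rule-based mapping from detection-result combinations to interactions.
INTERACTION_RULES = {
    # (sales floor change, person kind, movement) -> interaction
    ("stock_decreased", "customer", "hand_in_area"): "picked_up_product",
    ("stock_increased", "customer", "hand_in_area"): "returned_product",
    ("stock_increased", "clerk", "hand_in_area"): "replenished_product",
    ("no_change", "customer", "stopped"): "stopped_at_sales_floor",
    ("no_change", "customer", "walked_past"): "walked_past_sales_floor",
}

def determine_interaction(floor_change, person_kind, movement):
    return INTERACTION_RULES.get((floor_change, person_kind, movement), "unknown")
```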
  • In this way, the interaction detection unit 110 may use the quantity of products in the sales floor estimated from the sales floor video for behavior detection. When an interaction with a product on the sales floor to be analyzed is detected in this way, the interaction detection unit 110 passes the interaction detection results to the attention level estimation unit 120.
  • The attention level estimation unit 120 estimates the attention level of products based on the interaction detection results. Specifically, for example, the attention level estimation unit 120 performs statistical processing such as normalization on the interactions detected within a predetermined period, and calculates the attention level of the products and/or sales floors targeted by the customers' interactions.
  • For example, the attention level estimation unit 120 may count the number of interactions in which customers stopped at the sales floor, normalize the count by the total number of visitors to the store during the period or the total number of customers who passed the sales floor, and determine the attention level of the sales floor and/or the products displayed there.
  • Also, the attention level estimation unit 120 may count the number of interactions in which customers picked up products displayed in each placement area of the sales floor, and determine the attention level of the products displayed in each placement area based on the relative number of interactions in each area. For example, a placement area with a relatively large number of interactions can be considered to attract a high degree of attention, not only for the displayed products but also for the placement area itself. For this reason, the estimated attention level may be used to display products to be sold actively in a placement area with a high attention level.
  • Also, the attention level estimation unit 120 may estimate the relationship between a clerk's interaction and the attention level. For example, for an interaction in which a clerk performs display work such as replenishing, replacing, or tidying up products, the detection results of customer interactions with those products after the clerk's interaction may be aggregated, and how much the clerk's interaction with the products affected their attention level may be estimated based on the aggregated detection results. For example, the attention level estimation unit 120 may calculate the increase or decrease in customer interactions with a certain product before and after the clerk tidies the sales floor, and may judge that the clerk's work contributed to sales of the product.
  • The attention level estimation unit 120 may notify the clerk or the department in charge of the estimated attention level so that it can be used in subsequent display and sales promotion strategies. For example, when the attention level of a product is equal to or higher than a predetermined threshold, the attention level estimation unit 120 may notify the clerk or the department in charge so that the placement area of the product is expanded, the order quantity of the product is increased to avoid shortages, or the evaluation of the clerk who displayed the product is raised. Similarly, when the attention level of a certain sales floor is equal to or higher than a predetermined threshold, the attention level estimation unit 120 may notify the clerk or the department in charge so that a product to be promoted is displayed in that sales floor or the evaluation of the clerk who arranged the sales floor display is raised.
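One normalization consistent with the description above, sketched minimally: the attention level of a sales floor as stop interactions divided by passers-by. The record layout is an assumption.

```python
# Sketch: attention score per sales floor over one aggregation period.
from collections import Counter

def attention_scores(interactions, passersby_by_floor):
    """interactions: iterable of (sales_floor_id, interaction_label) pairs
    detected within the period; passersby_by_floor: {sales_floor_id: count}."""
    stops = Counter(floor for floor, label in interactions
                    if label == "stopped_at_sales_floor")
    return {floor: (stops[floor] / total if total else 0.0)
            for floor, total in passersby_by_floor.items()}
```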
  • FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure.
  • In step S101, the analysis device 100 acquires sales floor video. Specifically, the analysis device 100 acquires the sales floor video from the imaging device 20 installed at the sales floor.
  • Note that the analysis device 100 may execute the following steps on the acquired sales floor video in real time, or may temporarily store the acquired video and execute the following steps on the stored video at an appropriate time.
  • In step S102, the analysis device 100 preprocesses the sales floor video. Specifically, the analysis device 100 uses any known object detector to detect moving objects such as people and shopping carts in the sales floor video, and uses any known moving-object removal technique to remove the detected moving objects from the video. In addition, the analysis device 100 performs region segmentation by product type on the sales floor video from which moving objects have been removed, and estimates the placement area for each product type.
  • Next, the analysis device 100 detects the behavior of clerks and customers with respect to the sales floor and the products on it. Specifically, the analysis device 100 may calculate the difference between frames of the sales floor video from which moving objects have been removed, and determine whether a change occurred in the sales floor based on the calculated difference. The analysis device 100 may also estimate the product names and/or the quantity (number) of products displayed in each placement area. The analysis device 100 may also recognize the person and their movement in video of a person detected as a moving object. For example, the analysis device 100 may discriminate between clerks and customers captured in the sales floor video.
  • For example, from the video of a person, the analysis device 100 may detect predetermined behaviors related to customer interactions with products or the sales floor, such as a customer walking in front of the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
  • The analysis device 100 may also detect interactions by clerks with products or the sales floor (for example, display work such as replenishing, replacing, and arranging products in the sales floor, and sales promotion work such as presenting POP advertisements).
  • Detection of these interactions may be performed based on, for example, a machine learning model such as a neural network, and a machine learning model may be configured for each type of human behavior to be detected. Alternatively, an end-to-end machine learning model that detects the desired types of behavior may be constructed.
  • Then, the analysis device 100 may estimate the attention level of the sales floor based on the behavior detection results. Specifically, the analysis device 100 estimates the attention level of the sales floor based on the detection results of the various behaviors. For example, the analysis device 100 may perform statistical processing on the behaviors detected within a predetermined period and calculate the attention level of the products and/or sales floors targeted by the behaviors of customers and clerks. The analysis device 100 may notify the clerk or the department in charge of the estimated attention level so that it can be used in subsequent display and sales promotion strategies.
  • Part or all of the analysis device 100 in the above-described embodiments may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or the like.
  • In the case of information processing by software, the software that realizes at least part of the functions of each device in the above-described embodiments may be stored on a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, CD-ROM (Compact Disc-Read Only Memory), or USB (Universal Serial Bus) memory, and the information processing by software may be executed by reading the software into a computer.
  • The software may also be downloaded via a communication network.
  • Further, information processing may be executed by hardware by implementing the software in a circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • The type of storage medium that stores the software is not limited.
  • For example, the storage medium is not limited to a removable one such as a magnetic disk or optical disk, and may be a fixed storage medium such as a hard disk or memory. The storage medium may be provided inside the computer or outside it.
  • FIG. 4 is a block diagram showing an example of the hardware configuration of the analysis device 100 in the embodiment described above.
  • The analysis device 100 may be implemented, for example, as a computer 7 in which a processor 71, a main storage device 72 (memory), an auxiliary storage device 73 (memory), a network interface 74, and a device interface 75 are connected.
  • Although the computer 7 in FIG. 4 has one of each component, it may have a plurality of the same component.
  • Also, the software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the software's processing. In this case, a form of distributed computing may be used in which each computer communicates via the network interface 74 or the like to execute the processing.
  • The analysis device 100 in the above-described embodiments may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. Further, information transmitted from a terminal may be processed by one or more computers provided on the cloud, and the processing results may be transmitted to the terminal.
  • Various operations of the analysis device 100 in the above-described embodiments may be executed in parallel using one or more processors or using multiple computers connected via a network. Various operations may also be distributed to a plurality of operation cores in a processor and executed in parallel. Part or all of the processing, means, and the like of the present disclosure may be executed by at least one of a processor and a storage device provided on a cloud capable of communicating with the computer 7 via a network. Thus, the analysis device 100 in the above-described embodiments may take the form of parallel computing by one or more computers.
  • The processor 71 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) including a control device and an arithmetic device of the computer. The processor 71 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 71 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. The processor 71 may also include arithmetic functions based on quantum computing.
  • The processor 71 can perform arithmetic processing based on data and software (programs) input from the devices and other components of the internal configuration of the computer 7, and can output arithmetic results and control signals to those devices and components.
  • The processor 71 may control the components of the computer 7 by executing the OS (Operating System) of the computer 7, applications, and the like.
  • The analysis device 100 in the above-described embodiments may be realized by one or more processors 71.
  • Here, the processor 71 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or in two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate by wire or wirelessly.
  • The main storage device 72 is a storage device that stores instructions executed by the processor 71, various data, and the like.
  • The auxiliary storage device 73 is a storage device other than the main storage device 72.
  • These storage devices mean any electronic components capable of storing electronic information, and may be semiconductor memories.
  • The semiconductor memory may be either volatile memory or non-volatile memory.
  • The storage device for storing various data in the analysis device 100 in the above-described embodiments may be implemented by the main storage device 72 or the auxiliary storage device 73, or by built-in memory built into the processor 71.
  • The storage unit in the above-described embodiments may be realized by the main storage device 72 or the auxiliary storage device 73.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected to it.
  • A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the analysis device 100 in the above-described embodiments is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to this at least one storage device (memory), a configuration in which at least one of the processors is connected (coupled) to the at least one storage device (memory) may be included. This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers.
  • Further, a configuration in which a storage device (memory) is integrated with a processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • The network interface 74 is an interface for connecting to the communication network 8 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 74. The network interface 74 may exchange information with an external device 9A connected via the communication network 8.
  • The communication network 8 may be any of a WAN (Wide Area Network), LAN (Local Area Network), PAN (Personal Area Network), and the like, or a combination of them, as long as information is exchanged over it between the computer 7 and the external device 9A. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
  • The device interface 75 is an interface, such as USB, that connects directly with the external device 9B.
  • The external device 9A is a device connected to the computer 7 via a network.
  • The external device 9B is a device connected directly to the computer 7.
  • The external device 9A or the external device 9B may be an input device, for example.
  • The input device is, for example, a camera, microphone, motion capture device, various sensors, keyboard, mouse, touch panel, or other device, and provides acquired information to the computer 7.
  • Alternatively, it may be a device including an input unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
  • The external device 9A or the external device 9B may be an output device, for example.
  • The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), CRT (Cathode Ray Tube), PDP (Plasma Display Panel), or organic EL (Electro Luminescence) panel, or a speaker or the like that outputs audio and so on. Alternatively, it may be a device including an output unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
  • The external device 9A or the external device 9B may also be a storage device (memory).
  • For example, the external device 9A may be network storage or the like, and the external device 9B may be storage such as an HDD.
  • The external device 9A or the external device 9B may also be a device having the functions of some of the components of each device (server 100 or terminal 200) in the above-described embodiments. That is, the computer 7 may transmit part or all of the processing results to, or receive them from, the external device 9A or the external device 9B.
  • In this specification (including the claims), the expression "at least one of a, b, and c" or "at least one of a, b, or c" includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It also covers combinations with multiple instances of any element, such as a-a, a-b-b, and a-a-b-b-c-c. It further covers the addition of elements other than the listed elements a, b, and c, such as a-b-c-d having d.
  • In this specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like.
  • The terms should be interpreted appropriately according to the context in which they are used, but forms of connection/coupling that are not intentionally or naturally excluded should not be interpreted as excluded from the terms.
  • In this specification (including the claims), the expression "A configured to B" may include that the physical structure of element A has a configuration capable of performing operation B, and that a permanent or temporary setting/configuration of element A is configured/set to actually perform operation B.
  • For example, when element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary setting of a program (instructions).
  • When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether control instructions and data are actually attached.
  • In this specification (including the claims), terms implying optimization include finding a global optimum, finding an approximation of a global optimum, finding a local optimum, and finding an approximation of a local optimum, and should be interpreted appropriately according to the context in which the term is used. They also include stochastically or heuristically finding approximations of these optimum values.
  • In this specification (including the claims), when a plurality of pieces of hardware perform predetermined processing, the pieces of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of it. Alternatively, some of the hardware may perform part of the predetermined processing while other hardware performs the rest.
  • The hardware that performs a first process and the hardware that performs a second process may be the same or different; in other words, the hardware that performs the first process and the hardware that performs the second process need only be included in the one or more pieces of hardware.
  • Note that the hardware may include an electronic circuit or a device including an electronic circuit.
  • When a plurality of storage devices (memories) store data, each storage device among them may store only part of the data or may store the whole of the data.

Abstract

Provided is a novel technology for analyzing the level of interest in a product. One embodiment of the present invention relates to an analysis device having one or more memories and one or more processors, wherein the one or more processors estimate the level of interest in a sales area on the basis of detection results regarding human behavior related to the sales area.

Description

Analysis device, analysis system, analysis method, and program
The present disclosure relates to analysis devices, analysis systems, analysis methods, and programs.
The use of information technology is advancing in the retail industry, including supermarkets and convenience stores. For example, information technology has been utilized in the display of merchandise in stores.
JP 2020-71874 A
An object of the present disclosure is to provide a novel technique for analyzing the degree of attention that a product receives.
In order to solve the above problem, one aspect of the present disclosure relates to an analysis device that includes one or more memories and one or more processors, wherein the one or more processors estimate the attention level of a sales floor based on detection results of human behavior related to the sales floor.
FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure. FIG. 2 is a block diagram showing the functional configuration of an analysis device according to one embodiment of the present disclosure. FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure. FIG. 4 is a block diagram showing the hardware configuration of an analysis device according to one embodiment of the present disclosure.
Embodiments of the present disclosure will be described below with reference to the drawings.
In the following embodiments, an analysis system is disclosed that captures video of a store's sales floor and uses a machine learning model to estimate the attention level of the sales floor based on the sales floor video.
[Analysis system]
First, an analysis system according to an embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a schematic diagram illustrating the analysis system according to one embodiment of the present disclosure.
As shown in FIG. 1, the analysis system 10 of the present embodiment includes, for example, an imaging device 20, a user terminal 30, and an analysis device 100. When the analysis device 100 acquires sales floor video from the imaging device 20, it analyzes the acquired video and notifies the user terminal 30 of the attention level of the sales floor and of the products displayed there. Note that the attention level of a product is one example of the attention level of a sales floor. The degree of attention refers to how attractive a sales floor or product is.
The imaging device 20 may be, for example, a video camera installed in a store or the like; it captures video of the target sales floor and transmits the sales floor video to the analysis device 100. Typically, the imaging device 20 is installed near the sales floor to be imaged and is used to observe that sales floor. The imaging device 20 may be fixed at a certain location in the store, or may be movable, for example mounted on a robot or cart. A movable device makes it possible to obtain a wider variety of information and to reduce the number of imaging devices 20 that must be installed. A plurality of imaging devices 20 may also be provided, so that appropriate sales floor video can be obtained even when blind spots or the like occur.
The user terminal 30 may be, for example, an information processing device such as a personal computer, tablet, or smartphone provided in the store or the like; it acquires information on the products on the sales floor and the attention level of the sales floor estimated from the sales floor video, either from the analysis device 100 or from a server or the like on which the analysis results of the analysis device 100 have been saved. For example, the user terminal 30 may be provided with software related to store management and business improvement, such as software for supporting decisions on optimal product placement and for evaluating store clerks' work, and may also be provided with software for browsing the analysis results. Using data obtained from the analysis device 100, such as the attention levels of products, the placement of products with high attention, and data on POP advertisements, and based on the data analyzed by such software, store clerks and others may decide on product placement in the sales floor, or work evaluations may be performed for clerks whose product placement achieved a high degree of attention.
The analysis device 100 may be, for example, a personal computer provided in the store, or an information processing device such as a server provided at a location different from the store, for example at a headquarters that manages the store or on the cloud. From the acquired sales floor video, it estimates the attention level for each sales floor and for each product type displayed there. Note that the analysis device 100 may acquire the sales floor video captured by the imaging device 20 directly, or may acquire data obtained by subjecting the sales floor video to predetermined processing. In the latter case, the sales floor video captured by the imaging device 20 is output to a predetermined processing device, and the data processed by that device is output to the analysis device 100. This facilitates transmission of the sales floor video information over the network and the subsequent processing in the analysis device 100. When a plurality of imaging devices 20 are installed, one processing device may be provided for the plurality of imaging devices 20.
The analysis device 100 according to the present embodiment uses a machine learning model such as a neural network to detect the behavior of store clerks, customers, and others related to the sales floor based on the sales floor video, and can estimate the attention level of the products and/or the sales floor from the behavior detection results. The estimated attention level can be used for subsequent sales promotion activities, or for store management and business improvement, such as evaluating the work of the clerk who performed the display work.
For example, for sales floor video such as that shown in FIG. 1, the analysis device 100 detects interactions by customers with products and/or the sales floor, in other words customers' reactions to products and/or the sales floor, such as how many customers walked in front of the sales floor, how many stopped in front of it, how many picked up products displayed there, and how many returned the products they picked up to the sales floor, and estimates the attention level of the products and/or the sales floor based on the detection results. Here, the analysis device 100 may process the sales floor video acquired from the imaging device 20 in real time or in batches.
According to the present disclosure, it is possible to estimate the attention level of sales floors and products based on sales floor video and to determine appropriate product placement based on the estimated attention level. It is also possible to estimate the effect of product displays and POP advertisements by store clerks and others based on the estimated attention level.
 [Analysis Device]
 Next, the analysis device 100 according to one embodiment of the present disclosure will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the functional configuration of the analysis device 100 according to one embodiment of the present disclosure.
 As shown in FIG. 2, the analysis device 100 of this embodiment has an interaction detection unit 110 and an attention level estimation unit 120. The interaction detection unit 110 and the attention level estimation unit 120 are realized by one or more processors executing one or more programs that are installed in the analysis device 100 and stored in one or more memories.
 The interaction detection unit 110 detects human behavior relating to the sales floor based on the sales floor video. Specifically, upon acquiring the sales floor video from the imaging device 20, the interaction detection unit 110 removes, as preprocessing, moving objects such as people and shopping carts from the video. It then detects changes in the sales floor from the preprocessed video and detects interactions of the extracted persons with the sales floor and products. Although in the following description the interaction detection unit 110 detects human behavior relating to the sales floor based on the sales floor video, it may instead detect such behavior based on information other than the video. For example, the interaction detection unit 110 may detect a clerk's interactions with the sales floor and products by tracking a computer terminal that the clerk carries around the store, or may detect a customer's interactions with the sales floor and products by tracking a computer terminal attached to equipment, such as a shopping cart, that the customer uses in the store.
 The interaction detection unit 110 detects human behavior relating to the sales floor, such as interactions with products on the sales floor. Here, the interaction detection unit 110 may use changes in the sales floor detected based on the sales floor video for behavior detection. For example, as one preprocessing step, the interaction detection unit 110 detects moving objects such as people and shopping carts in the sales floor video using a known object detector such as Mask R-CNN (Region-based Convolutional Neural Network). When a moving object is detected in the sales floor video, the interaction detection unit 110 removes it using a known moving-object removal technique and derives both a sales floor video with the moving objects removed and a video of the extracted moving objects.
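 A minimal sketch of this preprocessing in Python, assuming a pretrained torchvision Mask R-CNN as the object detector and using OpenCV inpainting as a simple stand-in for the moving-object removal technique (a production system would more likely composite pixels from temporally adjacent frames), might look like the following:

    import numpy as np
    import torch
    import torchvision
    import cv2  # OpenCV; inpainting stands in here for the removal technique

    # COCO class index 1 is "person"; shopping carts have no COCO class, so a
    # detector fine-tuned on cart images would be needed for them in practice.
    PERSON_CLASS = 1

    detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()

    def remove_moving_objects(frame_bgr, score_thresh=0.7):
        """Return the frame with detected people removed, plus the union mask."""
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            out = detector([tensor])[0]
        mask = np.zeros(frame_bgr.shape[:2], dtype=np.uint8)
        for label, score, m in zip(out["labels"], out["scores"], out["masks"]):
            if label.item() == PERSON_CLASS and score.item() >= score_thresh:
                mask |= (m[0].numpy() > 0.5).astype(np.uint8)
        # Fill the removed regions from surrounding pixels.
        cleaned = cv2.inpaint(frame_bgr, mask * 255, 3, cv2.INPAINT_TELEA)
        return cleaned, mask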
 The interaction detection unit 110 then calculates differences between frames of the preprocessed sales floor video and determines, based on the calculated differences, whether a change has occurred in the sales floor. For example, the interaction detection unit 110 intermittently extracts frames from the sales floor video at predetermined time intervals and calculates the difference between adjacent extracted frames. Specifically, the interaction detection unit 110 may use the difference between the image data of the adjacent frames as the inter-frame difference. Alternatively, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network, input the adjacent frames to that model, and use the difference between the output feature maps as the inter-frame difference. Comparing feature maps is expected to effectively reduce the influence of lighting changes and vibrations on the sales floor. Alternatively still, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network trained to detect portions of two input frames whose difference exceeds a predetermined threshold, input each pair of adjacent frames to that model, and use the detected difference portions as the inter-frame difference. By detecting differences between adjacent frames in this way, the interaction detection unit 110 can identify the position and/or time at which a change occurred in the sales floor.
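 As an illustrative sketch of the feature-map variant, assuming a truncated pretrained ResNet-18 as the convolutional feature extractor (the disclosure only requires some appropriate model):

    import torch
    import torchvision

    # Truncated ResNet-18 used as a generic convolutional feature extractor.
    extractor = torch.nn.Sequential(
        *list(torchvision.models.resnet18(weights="DEFAULT").children())[:-2]
    )
    extractor.eval()

    def change_map(frame_a, frame_b, thresh=1.0):
        """Per-location feature distance between two frames.

        frame_a, frame_b: (3, H, W) float tensors in [0, 1]; `thresh` is an
        assumed example value. High-distance locations mark likely changes.
        """
        with torch.no_grad():
            fa = extractor(frame_a.unsqueeze(0))  # (1, C, H', W')
            fb = extractor(frame_b.unsqueeze(0))
        dist = torch.norm(fa - fb, dim=1).squeeze(0)  # (H', W')
        return dist > thresh  # boolean map of changed locations

 The positions of True entries in the map, together with the timestamps of the two frames, give the position and time of the change.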
 The interaction detection unit 110 may also estimate, from the sales floor video, the placement region of each product type. Specifically, the interaction detection unit 110 performs region segmentation by product type on a frame of the sales floor video and estimates the placement region for each product type. For example, the interaction detection unit 110 may use a trained machine learning model to perform region segmentation based on product type on a frame of the sales floor video from which moving objects have been removed, and estimate the placement region for each product type. That machine learning model may be trained so that, when a frame of the moving-object-removed sales floor video is input, it segments the frame into regions and outputs a product region map indicating the placement region of each product type. For example, the interaction detection unit 110 may input the moving-object-removed frame to the trained model to obtain a product region map indicating the display region of each product type in the sales floor, superimpose the obtained product region map on the input frame, and generate a frame segmented into regions by product type.
 Here, the machine learning model for region estimation may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the placement region of each product type has been attached. Specifically, the model may be an instance segmentation model such as Mask R-CNN, trained to predict, for a plurality of products or product types in a frame, a bounding box for each detection target and a corresponding segmentation mask.
 Alternatively, the machine learning model may be a convolutional neural network trained so that region segmentation is performed by clustering the feature vectors in its feature map; regions with similar feature vectors can be regarded as regions in which products of the same type are displayed. Such a convolutional neural network may be trained by fine-tuning a network pretrained on a separate large-scale image dataset such as ImageNet, or may be trained to predict provisional labels assigned to product regions.
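 A minimal sketch of the clustering variant, reusing the truncated ResNet-18 extractor from the earlier sketch and scikit-learn's k-means with an assumed number of product types:

    import torch
    from sklearn.cluster import KMeans

    def segment_by_clustering(frame, extractor, n_types=8):
        """Cluster per-location feature vectors into product-type regions.

        frame: (3, H, W) float tensor; n_types is an assumed hyperparameter.
        Returns a coarse (H', W') map of cluster ids, where locations with
        similar feature vectors (likely the same product type) share an id.
        """
        with torch.no_grad():
            feat = extractor(frame.unsqueeze(0)).squeeze(0)  # (C, H', W')
        c, h, w = feat.shape
        vectors = feat.permute(1, 2, 0).reshape(-1, c).numpy()  # (H'*W', C)
        labels = KMeans(n_clusters=n_types, n_init=10).fit_predict(vectors)
        return labels.reshape(h, w)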
 However, the present disclosure is not limited to this, and any other appropriate region segmentation technique for each product type may be used.
 After acquiring the sales floor video segmented into regions by product type in this manner, the interaction detection unit 110 may estimate the product names and/or the number of products (including the quantity of products) displayed in each placement region. Specifically, the interaction detection unit 110 uses a trained machine learning model to estimate the product name and/or the number of products for the product group included in the placement region of each product type. The machine learning model is trained so that, when a frame of the moving-object-removed sales floor video is input, it outputs the product name and/or the center position of each product contained in the frame. For example, a product name may be indicated by product identification information such as a product number assigned to it in advance. The center position of each product may be indicated by a symbol (for example, a circle) marking the center of each product in the frame, or by a product-center heat map or the like. Such a machine learning model may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the product name of each product type and/or the center of each product has been attached.
 More specifically, when the interaction detection unit 110 uses a trained machine learning model to estimate the product names of a product group displayed in a placement region, the model may be one that identifies, from an input frame, product identification information such as the product number of each product captured in the frame. That is, the model may be realized as a neural network and trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the product identification information of each product in the frame has been attached. Having obtained a model trained in this way, the interaction detection unit 110 can use it to estimate the product name of each product displayed in a frame of the sales floor video. Here, the input frame may or may not be segmented into regions.
 Alternatively, the machine learning model may be a neural network that determines a feature value of each product type from frames of the sales floor video. After estimating the feature value of each product arranged in the frame using the model, the interaction detection unit 110 may identify the product name corresponding to the estimated feature value as that product.
 If a product does not correspond to any existing product type, it may be determined to be unknown. Furthermore, when external information such as store layout information or POS data is available, the products placed in the sales floor under analysis can be narrowed down from that information, and a machine learning model for each product category (for example, vegetables or sweets) suited to the products of the sales floor under analysis (for example, a vegetable section or a confectionery section) can be obtained, improving estimation accuracy.
 Next, when the interaction detection unit 110 uses a trained machine learning model to estimate the number of products in a product group in a placement region, the model may be one that identifies, for example, from an input frame, the center of each product captured in the frame. That is, the model may be realized as a neural network and trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the center of each product in the frame has been attached.
 Having obtained a model trained in this way, the interaction detection unit 110 uses it to estimate the center of each product displayed in a frame of the sales floor video and, referring to the region-segmented frame, can estimate the number of products displayed in each placement region based on the number of estimated centers within that region. For example, the interaction detection unit 110 may use the product-name model and the product-center model together to generate a frame indicating the product names and product centers of the product groups arranged in each placement region of the region-segmented frame. By counting the number of centers included in each placement region based on that frame, the interaction detection unit 110 can estimate the product name and the number of products for each product type.
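 A minimal counting sketch under these assumptions, where `centers` is a list of estimated (x, y) product centers output by the center model and `region_map` assigns a product-type region id to each pixel:

    from collections import Counter

    def count_products_per_region(centers, region_map):
        """Count estimated product centers falling inside each placement region.

        centers: iterable of (x, y) pixel coordinates from the center model.
        region_map: 2-D array where region_map[y, x] is a product-type region id.
        Returns a mapping {region id: estimated number of displayed products}.
        """
        counts = Counter()
        h, w = region_map.shape
        for x, y in centers:
            if 0 <= int(y) < h and 0 <= int(x) < w:
                counts[int(region_map[int(y), int(x)])] += 1
        return counts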
 Note that the estimation of the number of products according to the present disclosure is not limited to this. For example, instead of product centers, a machine learning model that detects each product by a bounding box indicating its position in the frame may be used; in this case, the interaction detection unit 110 may estimate the number of products by counting the bounding boxes included in each placement region. Alternatively, the product-center heat map may be regarded as a product density, and the number of products may be estimated by integrating the heat map over each placement region. Alternatively still, the interaction detection unit 110 may estimate the number of products in each placement region of the frame using a machine learning model trained to regress the number of products from the feature values of the placement region. When the machine learning model is trained appropriately, the estimation by product density or by regression of the number of products described above can also predict the number of hidden products not captured in the frame. In addition, the size of the region in which products are placed may be calculated back using a machine learning model that recognizes regions in which no products are placed and a machine learning model that recognizes regions in which products can be placed.
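 The heat-map variant can be sketched just as briefly, under the assumption that the model is calibrated so that each product contributes roughly unit mass to the heat map:

    import numpy as np

    def count_by_heatmap(heatmap, region_map, region_id):
        """Estimate the product count of one placement region by integrating
        the product-center heat map (treated as a density) over the region."""
        mask = (region_map == region_id)
        return float(np.round(heatmap[mask].sum()))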
 Meanwhile, the interaction detection unit 110 identifies persons and their movements in the person videos extracted from the preprocessed sales floor video. For example, the interaction detection unit 110 may discriminate between customers and clerks captured in the sales floor video. That is, the interaction detection unit 110 may perform this discrimination using a machine learning model trained so that, when a person video extracted from the sales floor video is input, it determines whether the person is a customer or a clerk. The model may be realized, for example, as a convolutional neural network and trained using annotated image data of clerks and annotated image data of customers as training data. In general, clerks wear prescribed uniforms, name tags, and the like, and the machine learning model is expected to be able to distinguish clerks from customers by detecting these. In this way, the interaction detection unit 110 may discriminate between the customers and clerks captured in the video and use the discrimination result for behavior estimation.
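 A minimal sketch of such a binary classifier, assuming a ResNet-18 backbone fine-tuned on annotated clerk and customer crops (the architecture is an assumption; the disclosure only requires a convolutional neural network):

    import torch
    import torchvision

    # ResNet-18 with a two-way head: class 0 = clerk, class 1 = customer.
    classifier = torchvision.models.resnet18(weights="DEFAULT")
    classifier.fc = torch.nn.Linear(classifier.fc.in_features, 2)
    # ... fine-tune here on annotated clerk/customer person crops ...
    classifier.eval()

    def is_clerk(person_crop):
        """person_crop: (3, H, W) float tensor of a person region cut out of
        the sales floor video, resized/normalized as during training."""
        with torch.no_grad():
            logits = classifier(person_crop.unsqueeze(0))
        return logits.argmax(dim=1).item() == 0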
 The interaction detection unit 110 may also detect the movement of a person from the extracted person video. For example, it may detect predetermined behaviors related to customers' interactions with products and the sales floor, such as a customer walking past the product sales floor, stopping in front of it, picking up a product, or returning a product.
 For example, the interaction detection unit 110 may generate trajectory data by tracking the position of a person using a known tracking technique, and detect walking or dwelling of the person from the generated trajectory data. Here, the trajectory data may, for example, associate positions with the times at which the person was at those positions. As a known tracking technique, a person bounding box may be detected in each frame, and detections in temporally adjacent frames whose corresponding feature values differ little or whose bounding boxes overlap substantially may be regarded as "the same person" and associated, for example by assigning them the same ID; applying this processing to all frames of the target video derives the movement trajectory of each person. Specifically, when a customer's trajectory data indicates that the customer passed in front of the sales floor under analysis, the interaction detection unit 110 may determine that the customer walked past that sales floor. When the trajectory data indicates that the customer stayed in front of the sales floor under analysis for a predetermined threshold time or longer, the interaction detection unit 110 may determine that the customer stopped in front of that sales floor.
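 An illustrative sketch of walk/dwell detection from such trajectory data, where each trajectory is a list of (time, x, y) samples for one tracked ID and the area in front of the sales floor is an assumed rectangular zone:

    def classify_visit(trajectory, zone, min_dwell=5.0):
        """Classify one tracked person's relation to the sales floor front.

        trajectory: list of (t, x, y) samples for one tracked ID.
        zone: (x0, y0, x1, y1) rectangle in front of the analyzed sales floor.
        Returns "stopped" if the person stayed inside the zone for at least
        `min_dwell` seconds (an assumed threshold), "passed" if they entered
        the zone but moved on, and None if they never entered it.
        """
        x0, y0, x1, y1 = zone
        entered_at = None
        visited = False
        for t, x, y in trajectory:
            inside = x0 <= x <= x1 and y0 <= y <= y1
            if inside:
                visited = True
                if entered_at is None:
                    entered_at = t
                elif t - entered_at >= min_dwell:
                    return "stopped"
            else:
                entered_at = None
        return "passed" if visited else None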
 Alternatively, the trajectory data may be configured as time-series trajectory data, and the interaction detection unit 110 may determine whether a customer stopped in front of the sales floor under analysis using a machine learning model, such as any appropriate neural network, trained with training-data pairs of time-series trajectory data and an annotation indicating whether that trajectory merely passed the sales floor or involved an interaction with a product on the sales floor.
 The interaction detection unit 110 may also detect the movement of a body part, such as a person's hand, using a known pose estimation technique and detect the person's interaction with products on the sales floor based on the detection result. As known pose estimation techniques, OpenPose, AlphaPose, and the like may be used. Specifically, the interaction detection unit 110 may detect the position of a customer's hand from the customer video and, when the detected hand remains within a product placement region of the sales floor video for a predetermined threshold time or longer, determine that the customer interacted with that product.
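 A minimal sketch of this hand-in-region check, assuming hand keypoints have already been extracted per frame by a pose estimator such as OpenPose, and assuming a one-second threshold:

    def hand_touch_events(hand_positions, region_map, fps, min_seconds=1.0):
        """Detect product interactions from per-frame hand keypoints.

        hand_positions: list of (x, y) hand coordinates, one per frame
                        (None when no hand was detected in that frame).
        region_map: 2-D array mapping pixels to product-type region ids.
        Returns the region ids whose placement region contained the hand
        for at least `min_seconds` of consecutive frames.
        """
        min_frames = max(1, int(min_seconds * fps))
        events = set()
        run_region, run_len = None, 0
        for pos in hand_positions:
            region = None
            if pos is not None:
                x, y = int(pos[0]), int(pos[1])
                h, w = region_map.shape
                if 0 <= y < h and 0 <= x < w:
                    region = int(region_map[y, x])
            if region is not None and region == run_region:
                run_len += 1
            else:
                run_region, run_len = region, 1 if region is not None else 0
            if region is not None and run_len >= min_frames:
                events.add(region)
        return events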
 Alternatively, the interaction detection unit 110 may determine whether a customer picked up a product or returned a product at the sales floor under analysis using a machine learning model, such as any appropriate neural network, trained with training-data pairs of a hand video extracted by pose estimation and an annotation indicating whether the hand is picking up a product, returning a product, or doing something else.
 The interaction detection unit 110 may also estimate the action of an extracted person using an action estimator. Specifically, the action estimator may be realized as a machine learning model, such as any appropriate neural network, that takes the extracted person video as input and is trained to determine which of predetermined actions the person is performing. The model may be trained with training-data pairs of a person video and a predetermined behavior related to interactions with products or the sales floor, such as the person walking past the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
 Note that the interaction detection unit 110 may use the action estimator to detect not only customers' behavior but also clerks' behavior relating to products and the sales floor. In this case, the action estimator may detect a clerk's interactions with products and the sales floor from the clerk video using a machine learning model trained to detect not only the customers' predetermined product interactions described above but also clerks' interactions with products and the sales floor. For example, the interaction detection unit 110 may detect a clerk's interactions with products from the clerk video using a machine learning model, such as a neural network, trained to detect interactions such as display work (tidying, restocking, or replacing products in a placement region) and sales promotion work (such as putting up POP advertisements).
 Having detected changes in the sales floor, product regions, and persons and their movements in this way, the interaction detection unit 110 further detects customers' and clerks' interactions with products based on these detection results. Specifically, the interaction detection unit 110 identifies the position and time at which a change occurred in the sales floor from the change detection results, identifies the names and numbers of products displayed in the sales floor under analysis from the product region detection results, and identifies customers' or clerks' interactions with products from the person detection results. The interaction detection unit 110 can thereby identify, from the sales floor video before and after the change at the identified position and time, increases or decreases in the number of products of each product type, as well as the persons who were at the sales floor at that time and their interactions with products.
 For example, the interaction detection unit 110 might detect, from the sales floor video before and after a change, that a customer picked up two units of product A and the number of units displayed on the sales floor decreased by two. Alternatively, it might detect that a clerk restocked product B and the number of units displayed on the sales floor increased. Even when no change has occurred in the sales floor, the interaction detection unit 110 may still detect from the video that a customer passed or stopped in front of the sales floor. To determine the interaction corresponding to a combination of detection results for sales floor changes, product regions, and persons and their movements, the interaction detection unit 110 may hold in advance a table indicating the correspondence between such combinations of detection results and interactions. Referring to this table, the interaction detection unit 110 may determine the interaction corresponding to each combination of detection results on a rule basis.
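 Such a correspondence table can be sketched as a simple lookup; both the keys and the rules below are illustrative assumptions, since the disclosure does not specify the rule set:

    # Keys: (person type, detected action, sign of the stock change in the
    # region). All entries below are assumed examples.
    INTERACTION_RULES = {
        ("customer", "hand_in_region", -1): "picked_up_product",
        ("customer", "hand_in_region", +1): "returned_product",
        ("customer", "dwell",           0): "stopped_in_front",
        ("customer", "pass",            0): "walked_past",
        ("clerk",    "hand_in_region", +1): "restocked_product",
        ("clerk",    "hand_in_region",  0): "tidied_display",
    }

    def classify_interaction(person_type, action, stock_delta):
        sign = (stock_delta > 0) - (stock_delta < 0)  # -1, 0, or +1
        return INTERACTION_RULES.get((person_type, action, sign), "unknown")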
 That is, the interaction detection unit 110 may use the quantity of products on the sales floor estimated based on the sales floor video for behavior detection. Having detected interactions with products on the sales floor under analysis in this way, the interaction detection unit 110 passes the interaction detection results to the attention level estimation unit 120.
 The attention level estimation unit 120 estimates the degree of attention of a product based on the interaction detection results. Specifically, the attention level estimation unit 120, for example, applies statistical processing such as normalization to the interactions detected within a predetermined period and calculates the degree of attention of the products and/or sales floor that were the target of customers' interactions. For example, the attention level estimation unit 120 may total the number of interactions in which customers stopped at the sales floor, normalize it by the total number of customers during the period or the total number of customers who passed in front of that sales floor, and determine the degree of attention of the sales floor and/or the products displayed there.
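 An illustrative calculation of one such normalized score, assuming simple event counts as input:

    def attention_score(stop_count, pass_count):
        """Fraction of passers-by who stopped in front of the sales floor
        within the analysis period (one possible normalization)."""
        if pass_count == 0:
            return 0.0
        return stop_count / pass_count

    # e.g. 34 of 212 passers-by stopped -> a degree of attention of about 0.16
    score = attention_score(stop_count=34, pass_count=212)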
 The attention level estimation unit 120 may also total the number of interactions in which customers picked up products displayed in each placement region of the sales floor, and determine the degree of attention of the products displayed in each placement region based on the relative number of interactions of each region. For example, a placement region with a relatively high number of interactions can be considered to attract a high degree of attention not only to the displayed products but also to the placement region itself. Accordingly, the estimated degree of attention may be used to display products to be sold actively in placement regions with a high degree of attention.
 The attention level estimation unit 120 may also estimate the relationship between clerks' interactions and the degree of attention. For example, for an interaction in which a clerk performed display work on a product, such as restocking, replacing, or tidying it, the unit may total the detection results of customers' interactions with that product after the clerk's interaction, and estimate, based on the totaled results, how much the clerk's interaction affected the degree of attention of the product. For example, the attention level estimation unit 120 may calculate the increase or decrease in customers' interactions with a product before and after a clerk tidied its sales floor and, when customers' interactions increased significantly, determine that the clerk's tidying contributed to sales of the product.
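 A minimal before/after comparison of this kind, with a naive relative-change measure and an assumed significance threshold:

    def display_work_uplift(before_count, after_count):
        """Relative change in customer interactions with a product, counted
        in equal-length windows before and after a clerk's display work."""
        if before_count == 0:
            return None  # no baseline to compare against
        return (after_count - before_count) / before_count

    # Assumed rule: treat an increase of 20% or more as evidence that the
    # clerk's tidying contributed to the product's degree of attention.
    uplift = display_work_uplift(before_count=25, after_count=34)
    contributed = uplift is not None and uplift >= 0.2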
 The attention level estimation unit 120 may then notify clerks or the department in charge of the estimated degree of attention for use in subsequent display and sales promotion strategies. For example, when the degree of attention of a product is at or above a predetermined threshold, the attention level estimation unit 120 may notify clerks or the department in charge so as to enlarge the product's placement region, increase the order quantity of the product to avoid stockouts, or raise the evaluation of the clerk who displayed the product. Likewise, when the degree of attention of a sales floor is at or above a predetermined threshold, the attention level estimation unit 120 may notify clerks or the department in charge so as to display products targeted for sales promotion in that sales floor or raise the evaluation of the clerk who arranged its display.
 [Analysis Processing]
 Next, analysis processing according to one embodiment of the present disclosure will be described with reference to FIG. 3. The analysis processing is executed by the analysis device 100 described above and can be realized, for example, by one or more processors executing a program stored in one or more memories of the analysis device 100. FIG. 3 is a flowchart showing the analysis processing according to one embodiment of the present disclosure.
 As shown in FIG. 3, in step S101 the analysis device 100 acquires a sales floor video. Specifically, the analysis device 100 acquires the sales floor video from the imaging device 20 installed at the sales floor. Here, the analysis device 100 may execute the subsequent steps on the acquired video in real time, or may temporarily store the acquired video and execute the subsequent steps on the stored video at an appropriate time.
 In step S102, the analysis device 100 preprocesses the sales floor video. Specifically, the analysis device 100 detects moving objects such as people and shopping carts in the video using any known object detector and removes the detected moving objects from the video using any known moving-object removal technique. The analysis device 100 also performs region segmentation by product type on the moving-object-removed video and estimates the placement region of each product type.
 In step S103, the analysis device 100 detects the behavior of clerks and customers with respect to the sales floor and the products on it. Specifically, the analysis device 100 may calculate differences between frames of the moving-object-removed video and determine, based on the calculated differences, whether a change has occurred in the sales floor. The analysis device 100 may also estimate the product names and/or the quantity (number) of products displayed in each placement region. The analysis device 100 may further recognize the persons and their movements in the videos of persons detected as moving objects. For example, the analysis device 100 may discriminate between customers and clerks captured in the video. From a person video, the analysis device 100 may detect predetermined behaviors related to customers' interactions with products and the sales floor, such as a customer walking past the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product. The analysis device 100 may also detect clerks' interactions with products and the sales floor (for example, display work such as restocking, replacing, or tidying products, and sales promotion work such as putting up POP advertisements). Detection of these interactions may be performed based on a machine learning model such as a neural network; a machine learning model may be configured for each type of human behavior to be detected, or an end-to-end machine learning model that detects the desired types of behavior from the sales floor video may be configured.
 In step S104, the analysis device 100 may estimate the degree of attention of the sales floor based on the behavior detection results. Specifically, the analysis device 100 estimates the degree of attention of the sales floor based on the detection results of the various behaviors. For example, the analysis device 100 may apply statistical processing to the behaviors detected within a predetermined period and calculate the degree of attention of the products and/or sales floor that were the target of the behaviors of customers and clerks. The analysis device 100 may notify clerks or the department in charge of the estimated degree of attention for use in subsequent display and sales promotion strategies.
 [Hardware Configuration]
 Part or all of the analysis device 100 in the embodiments described above may be configured as hardware, or may be configured as information processing by software (a program) executed by a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or the like. When configured as information processing by software, the software that realizes at least some of the functions of each device in the embodiments described above may be stored on a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc Read-Only Memory), or a USB (Universal Serial Bus) memory, and the information processing by software may be executed by loading the software into a computer. The software may also be downloaded via a communication network. Furthermore, the information processing may be executed by hardware by implementing the software in a circuit such as an ASIC (Application-Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array).
 The type of storage medium that stores the software is not limited. The storage medium is not limited to a removable medium such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or memory. The storage medium may be provided inside or outside the computer.
 FIG. 4 is a block diagram showing an example of the hardware configuration of the analysis device 100 in the embodiments described above. As one example, the analysis device 100 may be realized as a computer 7 including a processor 71, a main storage device 72 (memory), an auxiliary storage device 73 (memory), a network interface 74, and a device interface 75, connected via a bus 76.
 Although the computer 7 of FIG. 4 includes one of each component, it may include a plurality of the same component. Although one computer 7 is shown in FIG. 4, the software may be installed on a plurality of computers, each of which executes the same or a different part of the software's processing. In this case, a form of distributed computing may be adopted in which the computers communicate via the network interface 74 or the like to execute the processing. That is, the analysis device 100 in the embodiments described above may be configured as a system whose functions are realized by one or more computers executing instructions stored in one or more storage devices. The system may also be configured such that information transmitted from a terminal is processed by one or more computers provided on a cloud and the processing results are transmitted to the terminal.
 The various operations of the analysis device 100 in the embodiments described above may be executed in parallel using one or more processors or using a plurality of computers connected via a network. The various operations may also be distributed to a plurality of processing cores within a processor and executed in parallel. Some or all of the processes, means, and the like of the present disclosure may be executed by at least one of a processor and a storage device provided on a cloud capable of communicating with the computer 7 via a network. In this way, the analysis device 100 in the embodiments described above may take the form of parallel computing by one or more computers.
 The processor 71 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) including the control device and arithmetic device of the computer. The processor 71 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 71 is not limited to electronic circuits using electronic logic elements and may be realized by an optical circuit using optical logic elements. The processor 71 may also include arithmetic functions based on quantum computing.
 The processor 71 can perform arithmetic processing based on data and software (programs) input from the devices of the internal configuration of the computer 7 and output arithmetic results and control signals to those devices. The processor 71 may control the components of the computer 7 by executing the OS (Operating System) of the computer 7, applications, and the like.
 The analysis device 100 in the embodiments described above may be realized by one or more processors 71. Here, the processor 71 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate by wire or wirelessly.
 The main storage device 72 is a storage device that stores instructions executed by the processor 71, various data, and the like, and the information stored in the main storage device 72 is read by the processor 71. The auxiliary storage device 73 is a storage device other than the main storage device 72. These storage devices mean any electronic components capable of storing electronic information and may be semiconductor memories. A semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the analysis device 100 in the embodiments described above may be realized by the main storage device 72 or the auxiliary storage device 73, or by built-in memory incorporated in the processor 71. For example, the storage unit in the embodiments described above may be realized by the main storage device 72 or the auxiliary storage device 73.
 A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the analysis device 100 in the embodiments described above is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that at least one storage device (memory), a configuration may be included in which at least one of the plurality of processors is connected (coupled) to at least one storage device (memory). This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which a storage device (memory) is integrated with the processor (for example, a cache memory including L1 and L2 caches) may be included.
 The network interface 74 is an interface for connecting to the communication network 8 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 74. Information may be exchanged via the network interface 74 with an external device 9A connected via the communication network 8. The communication network 8 may be any of a WAN (Wide Area Network), LAN (Local Area Network), PAN (Personal Area Network), or the like, or a combination thereof, as long as information is exchanged between the computer 7 and the external device 9A. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
 The device interface 75 is an interface, such as USB, that connects directly to the external device 9B.
 The external device 9A is a device connected to the computer 7 via a network. The external device 9B is a device connected directly to the computer 7.
 The external device 9A or the external device 9B may, as one example, be an input device. The input device is, for example, a device such as a camera, microphone, motion capture device, various sensors, keyboard, mouse, or touch panel, and provides the acquired information to the computer 7. It may also be a device including an input unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
 The external device 9A or the external device 9B may also, as one example, be an output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), CRT (Cathode Ray Tube), PDP (Plasma Display Panel), or organic EL (Electroluminescence) panel, or may be a speaker or the like that outputs audio. It may also be a device including an output unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
 The external device 9A or the external device 9B may also be a storage device (memory). For example, the external device 9A may be network storage or the like, and the external device 9B may be storage such as an HDD.
 The external device 9A or the external device 9B may also be a device having some of the functions of the components of each device (such as the analysis device 100 or the user terminal 30) in the embodiments described above. That is, the computer 7 may transmit or receive some or all of the processing results of the external device 9A or the external device 9B.
 In this specification (including the claims), when the expression "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include a plurality of instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It further includes the addition of elements other than the listed elements (a, b, and c), such as a-b-c-d, which has d.
 In this specification (including the claims), when expressions such as "with data as input", "based on data", "according to data", or "in accordance with data" (including similar expressions) are used, unless otherwise noted, they include the case where the data itself is used as input and the case where data subjected to some processing (for example, data with noise added, normalized data, or an intermediate representation of the data) is used as input. When it is described that some result is obtained "based on", "according to", or "in accordance with" data, this includes the case where the result is obtained based only on that data, and may also include the case where the result is obtained under the influence of data, factors, conditions, and/or states other than that data. When it is described that "data is output", unless otherwise noted, this includes the case where the data itself is used as output and the case where data subjected to some processing (for example, data with noise added, normalized data, or an intermediate representation of the data) is used as output.
 In this specification (including the claims), when the terms "connected" and "coupled" are used, they are intended as non-limiting terms that include any of direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, but forms of connection/coupling that are not intentionally or naturally excluded should be interpreted non-limitingly as being included in the terms.
 In this specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of element A has a configuration capable of executing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually execute operation B. For example, when element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary setting of programs (instructions). When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented to actually execute operation B, regardless of whether control instructions and data are actually attached.
 In this specification (including the claims), terms denoting inclusion or possession (for example, "comprising", "including", and "having") are intended as open-ended terms, covering the case where something other than the object indicated by the object of the term is included or possessed. Where the object of such a term is an expression that does not specify a quantity or that suggests the singular (an expression using the article "a" or "an"), that expression should be interpreted as not being limited to a specific number.
 In this specification (including the claims), even where an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests the singular (an expression using the article "a" or "an") is used in another place, the latter expression is not intended to mean "one". In general, an expression that does not specify a quantity or that suggests the singular should be interpreted as not necessarily being limited to a specific number.
 In this specification, where it is stated that a particular advantage or result is obtained with a particular configuration of an embodiment, it should be understood that, unless there is a particular reason otherwise, the advantage or result can also be obtained with one or more other embodiments having that configuration. It should also be understood, however, that whether the advantage or result is obtained generally depends on various factors, conditions, and/or states, and that the configuration does not always provide the advantage or result. The advantage or result is merely obtained by the configuration described in the embodiment when various factors, conditions, and/or states are satisfied, and is not necessarily obtained in a claimed invention that recites that configuration or a similar configuration.
 In this specification (including the claims), where a term such as "maximize" is used, it includes finding a global maximum, finding an approximation of a global maximum, finding a local maximum, and finding an approximation of a local maximum, and should be interpreted as appropriate according to the context in which the term is used; it also includes finding an approximation of such a maximum probabilistically or heuristically. Similarly, where a term such as "minimize" is used, it includes finding a global minimum, finding an approximation of a global minimum, finding a local minimum, and finding an approximation of a local minimum, and should be interpreted as appropriate according to the context; it also includes finding an approximation of such a minimum probabilistically or heuristically. Similarly, where a term such as "optimize" is used, it includes finding a global optimum, finding an approximation of a global optimum, finding a local optimum, and finding an approximation of a local optimum, and should be interpreted as appropriate according to the context; it also includes finding an approximation of such an optimum probabilistically or heuristically.
 In this specification (including the claims), where a plurality of pieces of hardware perform predetermined processing, the pieces of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of the predetermined processing. Alternatively, some of the hardware may perform part of the predetermined processing while other hardware performs the remainder. Where an expression such as "one or more pieces of hardware perform first processing and the one or more pieces of hardware perform second processing" is used, the hardware that performs the first processing and the hardware that performs the second processing may be the same or different; it suffices that both are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, or the like.
 In this specification (including the claims), where a plurality of storage devices (memories) store data, each individual storage device (memory) among the plurality may store only part of the data or may store the whole of the data.
 Although embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, substitutions, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the content defined in the claims and equivalents thereof. For example, in all the embodiments described above, numerical values and formulas used in the description are shown merely as examples, and the present disclosure is not limited to them. Likewise, the order of the operations in the embodiments is shown merely as an example, and the present disclosure is not limited to it.
 This application claims priority based on Japanese Patent Application No. 2021-023665, filed on February 17, 2021, the entire contents of which are incorporated herein by reference.
REFERENCE SIGNS LIST
10 analysis system
20 imaging device
30 user terminal
100 analysis device
110 interaction detection unit
120 attention level estimation unit

Claims (12)

  1.  An analysis device comprising:
      one or more memories; and
      one or more processors,
      wherein the one or more processors estimate an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
  2.  The analysis device according to claim 1, wherein the one or more processors detect the human behavior related to the sales floor based on a sales floor video.
  3.  The analysis device according to claim 2, wherein the one or more processors use, for the detection of the behavior, a change in the sales floor detected based on the sales floor video.
  4.  The analysis device according to claim 2 or 3, wherein the one or more processors use, for the detection of the behavior, a quantity of products on the sales floor estimated based on the sales floor video.
  5.  The analysis device according to any one of claims 2 to 4, wherein the one or more processors discriminate between store visitors and store clerks captured in the sales floor video, and use the discrimination result for the detection of the behavior.
  6.  The analysis device according to any one of claims 2 to 5, wherein the detection result of the behavior includes at least any one of: a store visitor walking in front of a sales floor of a product, a store visitor stopping in front of the sales floor of the product, a store visitor looking at the product, a store visitor picking up the product, or a store visitor returning the product.
  7.  The analysis device according to claim 5, wherein the one or more processors estimate the attention level of the sales floor by counting, for each sales floor, the number of times behavior of a store visitor is detected, based on the detection result of the behavior.
  8.  The analysis device according to claim 7, wherein the one or more processors estimate, based on the detection result of the behavior, an influence of behavior of a store clerk in the sales floor on the attention level of the sales floor.
  9.  The analysis device according to any one of claims 2 to 8, wherein the one or more processors perform the detection of the behavior using a neural network.
  10.  An analysis system comprising:
      the analysis device according to any one of claims 2 to 9; and
      one or more imaging devices that acquire the sales floor video.
  11.  An analysis method comprising:
      estimating, by one or more processors, an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
  12.  A program causing one or more processors to execute at least a step of estimating an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
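
As a purely illustrative aid (not part of the claimed subject matter and not the disclosed implementation), the per-sales-floor counting recited in claims 5 to 7 can be sketched in a few lines of Python. The behavior labels, weights, data layout, and function names below are hypothetical choices made only for illustration; the claims do not prescribe any of them.

# Minimal sketch: aggregate visitor behavior detections per sales floor into
# an attention score (claims 6 and 7), excluding store clerk behavior using
# the visitor/clerk discrimination of claim 5. All names and weights here are
# hypothetical illustration choices, not the claimed implementation.
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the behaviors listed in claim 6, with hypothetical weights.
BEHAVIOR_WEIGHTS = {
    "walk_by": 1.0,   # walking in front of a product's sales floor
    "stop": 2.0,      # stopping in front of the sales floor
    "look": 3.0,      # looking at the product
    "pick_up": 5.0,   # picking up the product
    "put_back": 4.0,  # returning the product
}

@dataclass
class BehaviorEvent:
    sales_floor_id: str
    behavior: str     # one of the keys of BEHAVIOR_WEIGHTS
    is_clerk: bool    # result of the visitor/clerk discrimination (claim 5)

def estimate_attention(events: list[BehaviorEvent]) -> dict[str, float]:
    """Count detected visitor behaviors per sales floor, weighted by type."""
    scores: dict[str, float] = defaultdict(float)
    for event in events:
        if event.is_clerk:
            continue  # clerk behavior is not counted toward visitor attention
        scores[event.sales_floor_id] += BEHAVIOR_WEIGHTS.get(event.behavior, 0.0)
    return dict(scores)

# Usage with hypothetical events: floor_A scores 1.0 + 5.0 = 6.0 (the clerk
# event is excluded), and floor_B scores 3.0.
events = [
    BehaviorEvent("floor_A", "walk_by", False),
    BehaviorEvent("floor_A", "pick_up", False),
    BehaviorEvent("floor_A", "stop", True),
    BehaviorEvent("floor_B", "look", False),
]
print(estimate_attention(events))  # {'floor_A': 6.0, 'floor_B': 3.0}

An unweighted variant (all weights equal to 1.0) corresponds most directly to the plain counting of claim 7; the weights merely illustrate one way such counts could be combined into a score.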
PCT/JP2022/005374 2021-02-17 2022-02-10 Analysis device, analysis system, analysis method, and program WO2022176774A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-023665 2021-02-17
JP2021023665 2021-02-17

Publications (1)

Publication Number Publication Date
WO2022176774A1 true WO2022176774A1 (en) 2022-08-25

Family

ID=82931643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/005374 WO2022176774A1 (en) 2021-02-17 2022-02-10 Analysis device, analysis system, analysis method, and program

Country Status (1)

Country Link
WO (1) WO2022176774A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1048008A (en) * 1996-08-02 1998-02-20 Omron Corp Attention information measuring method, instrument for the method and various system using the instrument
JP2012088878A (en) * 2010-10-19 2012-05-10 Jvc Kenwood Corp Customer special treatment management system
JP2014232362A (en) * 2013-05-28 2014-12-11 Kddi株式会社 System for analyzing and predicting moving object action
JP2018151963A (en) * 2017-03-14 2018-09-27 オムロン株式会社 Personal trend recording apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Human behavior analysis service", C&C User Forum & iEXPO 2014, 20-21 November 2014, NUA (NEC C&C System Users Group), NEC, JP, pages 1-2, XP009539133 *

Similar Documents

Publication Publication Date Title
US20200265494A1 (en) Remote sku on-boarding of products for subsequent video identification and sale
JP4972491B2 (en) Customer movement judgment system
JP6529078B2 (en) Customer behavior analysis system, customer behavior analysis method, customer behavior analysis program and shelf system
US11521248B2 (en) Method and system for tracking objects in an automated-checkout store based on distributed computing
JP5632512B1 (en) Human behavior analysis device, human behavior analysis system, human behavior analysis method, and monitoring device
CN106776619A (en) Method and apparatus for determining the attribute information of destination object
US9299229B2 (en) Detecting primitive events at checkout
US11301684B1 (en) Vision-based event detection
US20180293598A1 (en) Personal behavior analysis device, personal behavior analysis system, and personal behavior analysis method
WO2019005136A1 (en) Automated delivery of temporally limited targeted offers
JP2013144001A (en) Article display shelf, method for investigating action of person, and program for investigating action of person
US20240119500A1 (en) Optimization of Product Presentation
Falcão et al. Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores
WO2022176774A1 (en) Analysis device, analysis system, analysis method, and program
US11615430B1 (en) Method and system for measuring in-store location effectiveness based on shopper response and behavior analysis
WO2022176776A1 (en) Analysis device, analysis system, analysis method, and program
WO2023152893A1 (en) Management device, management system, management method, and program
TWI652638B (en) Smart marketing system and method thereof
WO2021214986A1 (en) Processing device, processing method, and program
JP2021105945A (en) Processor, processing method, and program
US20230112584A1 (en) Multi-camera person re-identification
US11393122B1 (en) Method and system for determining contextual object position
EP4160533A1 (en) Estimation program, estimation method, and estimation device
Nagnath et al. Realtime Customer Merchandise Engagement Detection and Customer Attribute Estimation with Edge Device
EP4231252A1 (en) Information processing program, information processing method, and information processing apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22756094

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22756094

Country of ref document: EP

Kind code of ref document: A1