WO2022176774A1 - Analysis device, analysis system, analysis method, and program - Google Patents

Analysis device, analysis system, analysis method, and program

Info

Publication number
WO2022176774A1
WO2022176774A1
Authority
WO
WIPO (PCT)
Prior art keywords
sales floor
product
analysis
behavior
processors
Prior art date
Application number
PCT/JP2022/005374
Other languages
French (fr)
Japanese (ja)
Inventor
叡一 松元
俊太 齋藤
大輔 西野
良博 山田
義文 丸山
優一 野々目
Original Assignee
株式会社Preferred Networks
株式会社イトーヨーカ堂
Priority date
Filing date
Publication date
Application filed by 株式会社Preferred Networks, 株式会社イトーヨーカ堂
Publication of WO2022176774A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/20: Analysis of motion

Definitions

  • The present disclosure relates to analysis devices, analysis systems, analysis methods, and programs.
  • An object of the present disclosure is to provide a novel technique for analyzing the degree of attention that a product receives.
  • One aspect of the present disclosure relates to an analysis device that includes one or more memories and one or more processors, wherein the one or more processors estimate the attention level of a sales floor based on detection results of human behavior related to the sales floor.
  • FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure.
  • FIG. 2 is a block diagram showing the functional configuration of an analysis device according to one embodiment of the present disclosure.
  • FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure.
  • FIG. 4 is a block diagram showing the hardware configuration of an analysis device according to one embodiment of the present disclosure.
  • In the following embodiments, an analysis system is disclosed that captures video of a store's sales floor and uses a machine learning model to estimate the attention level of the sales floor based on the sales floor video.
  • FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure.
  • As shown in FIG. 1, the analysis system 10 of the present embodiment includes, for example, an imaging device 20, a user terminal 30, and an analysis device 100. When the analysis device 100 acquires sales floor video from the imaging device 20, it analyzes the acquired video and notifies the user terminal 30 of the attention level of the sales floor and of the products displayed there.
  • Note that the attention level of a product is one example of the attention level of a sales floor.
  • The degree of attention refers to how attractive a sales floor or product is.
  • The imaging device 20 may be, for example, a video camera installed in a store or the like; it captures video of the target sales floor and transmits the sales floor video to the analysis device 100.
  • Typically, the imaging device 20 is installed near the sales floor to be imaged and is used to observe that sales floor.
  • The imaging device 20 may be fixed at a certain location in the store, or may be movable, for example mounted on a robot or cart. A movable device makes it possible to obtain a wider variety of information and to reduce the number of imaging devices 20 that must be installed. A plurality of imaging devices 20 may also be provided, so that appropriate sales floor video can be obtained even when blind spots or the like occur.
  • The user terminal 30 may be, for example, an information processing device such as a personal computer, tablet, or smartphone provided in the store or the like; it acquires information on the products on the sales floor and the attention level of the sales floor estimated from the sales floor video, either from the analysis device 100 or from a server or the like on which the analysis results of the analysis device 100 have been saved.
  • For example, the user terminal 30 may be provided with software related to store management and business improvement, such as software for supporting decisions on optimal product placement and for evaluating store clerks' work, and may also be provided with software for browsing the analysis results.
  • Using data obtained from the analysis device 100, such as the attention levels of products, the placement of products with high attention, and data on POP advertisements, and based on the data analyzed by such software, store clerks and others may decide on product placement in the sales floor, or work evaluations may be performed for clerks whose product placement achieved a high degree of attention.
  • The analysis device 100 may be, for example, a personal computer provided in the store, or an information processing device such as a server provided at a location different from the store, for example at a headquarters that manages the store or on the cloud. From the acquired sales floor video, it estimates the attention level for each sales floor and for each product type displayed there. Note that the analysis device 100 may acquire the sales floor video captured by the imaging device 20 directly, or may acquire data obtained by subjecting the sales floor video to predetermined processing. In the latter case, the sales floor video captured by the imaging device 20 is output to a predetermined processing device, and the data processed by that device is output to the analysis device 100. This facilitates transmission of the sales floor video information over the network and the subsequent processing in the analysis device 100. When a plurality of imaging devices 20 are installed, one processing device may be provided for the plurality of imaging devices 20.
  • The analysis device 100 according to the present embodiment uses a machine learning model such as a neural network to detect the behavior of store clerks, customers, and others related to the sales floor based on the sales floor video, and can estimate the attention level of the products and/or the sales floor from the behavior detection results.
  • The estimated attention level can be used for subsequent sales promotion activities, or for store management and business improvement, such as evaluating the work of the clerk who performed the display work.
  • For example, for sales floor video such as that shown in FIG. 1, the analysis device 100 detects interactions by customers with products and/or the sales floor, in other words customers' reactions to products and/or the sales floor, such as how many customers walked in front of the sales floor, how many stopped in front of it, how many picked up products displayed there, and how many returned the products they picked up to the sales floor, and estimates the attention level of the products and/or the sales floor based on the detection results.
  • Here, the analysis device 100 may process the sales floor video acquired from the imaging device 20 in real time or in batches.
  • FIG. 2 is a block diagram showing the functional configuration of the analysis device 100 according to one embodiment of the present disclosure.
  • As shown in FIG. 2, the analysis device 100 of this embodiment has an interaction detection unit 110 and an attention level estimation unit 120.
  • The interaction detection unit 110 and the attention level estimation unit 120 are installed in the analysis device 100 and are implemented by one or more processors executing one or more programs stored in one or more memories.
  • The interaction detection unit 110 detects human behavior related to the sales floor based on the sales floor video. Specifically, when the sales floor video is acquired from the imaging device 20, the interaction detection unit 110 removes moving objects such as people and shopping carts from the sales floor video as preprocessing. It then detects changes in the sales floor from the preprocessed video and detects interactions of the extracted persons with the sales floor and products. Although in the following description the interaction detection unit 110 detects human behavior related to the sales floor based on the sales floor video, it may also detect such behavior based on information other than the sales floor video. For example, the interaction detection unit 110 may detect a clerk's interactions with the sales floor and products by tracking the computer terminal the clerk carries around the store, or may detect customers' interactions with the sales floor and products by tracking computer terminals attached to equipment, such as shopping carts, that customers use in the store.
  • The interaction detection unit 110 detects human behavior related to the sales floor, such as interactions with products on the sales floor.
  • Here, the interaction detection unit 110 may use changes in the sales floor detected from the sales floor video for behavior detection.
  • For example, as one example of preprocessing, the interaction detection unit 110 uses a known object detector such as Mask-RCNN (Region-based Convolutional Neural Network) to detect moving objects such as people and shopping carts in the sales floor video, removes the detected moving objects from the video using a known moving-object removal technique, and derives both the sales floor video with moving objects removed and video of the extracted moving objects.
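As a rough illustration of this preprocessing step, the following is a minimal sketch assuming a pretrained Mask R-CNN from torchvision; the patent names Mask-RCNN as one known detector but does not prescribe an implementation, and the class set and score threshold here are assumptions.

```python
# Hedged sketch: detecting moving objects (people) in one sales floor frame
# with torchvision's pretrained Mask R-CNN.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# COCO label 1 is "person"; shopping carts are not a COCO class, so a real
# system would fine-tune the detector with an added cart class (assumption).
MOVING_CLASSES = {1}

def detect_moving_objects(frame_rgb, score_thresh=0.7):
    """Return binary masks of detected moving objects in one RGB frame."""
    with torch.no_grad():
        output = model([to_tensor(frame_rgb)])[0]
    masks = []
    for label, score, mask in zip(output["labels"], output["scores"], output["masks"]):
        if label.item() in MOVING_CLASSES and score.item() >= score_thresh:
            masks.append(mask[0] > 0.5)  # threshold soft mask to binary
    return masks
```

The returned masks would then feed the moving-object removal step, for example background reconstruction or inpainting, which the patent leaves to known techniques.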
  • The interaction detection unit 110 calculates the difference between frames of the preprocessed sales floor video and determines whether the sales floor has changed based on the calculated difference. For example, the interaction detection unit 110 intermittently extracts frames from the sales floor video at predetermined time intervals and calculates the difference between adjacent extracted frames. Specifically, the interaction detection unit 110 may use the difference between the image data of adjacent frames as the inter-frame difference. Alternatively, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network, input the adjacent frames to the model, and use the difference between the output feature maps as the inter-frame difference. By comparing feature maps, it is expected that the effects of lighting changes and vibration on the sales floor can be effectively reduced.
  • Alternatively, the interaction detection unit 110 may use any suitable machine learning model, such as a convolutional neural network trained to detect the portion of two input frames having a difference equal to or greater than a predetermined threshold, inputting each pair of adjacent frames to the model and using the detected difference portion as the inter-frame difference. By detecting the difference between adjacent frames in this way, the interaction detection unit 110 can identify the position and/or time at which a change occurred in the sales floor.
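As an illustration of the feature-map comparison just described, here is a minimal sketch; the ResNet-18 backbone and the change threshold are illustrative assumptions, not choices made by the patent.

```python
# Sketch: comparing intermittently sampled frames via CNN feature maps rather
# than raw pixels, to reduce sensitivity to lighting changes and vibration.
import torch
import torchvision

backbone = torchvision.models.resnet18(pretrained=True)
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

CHANGE_THRESHOLD = 0.05  # assumed; would be tuned per camera in practice

def frame_difference(frame_a, frame_b):
    """Mean absolute difference between feature maps of two normalized
    [3, H, W] frame tensors sampled at the predetermined interval."""
    with torch.no_grad():
        fa = feature_extractor(frame_a.unsqueeze(0))
        fb = feature_extractor(frame_b.unsqueeze(0))
    return (fa - fb).abs().mean().item()

def sales_floor_changed(frame_a, frame_b):
    return frame_difference(frame_a, frame_b) > CHANGE_THRESHOLD
```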
  • The interaction detection unit 110 may estimate the placement area for each product type from the sales floor video. Specifically, the interaction detection unit 110 performs area division by product type on a frame of the sales floor video and estimates the placement area for each product type. For example, the interaction detection unit 110 may use a trained machine learning model to perform region segmentation by product type on a frame of the sales floor video from which moving objects have been removed, and estimate the placement area for each product type.
  • The machine learning model may be trained so that, upon input of a frame of sales floor video from which moving objects have been removed, it divides the frame into regions and outputs a product region map indicating the placement area for each product type.
  • The interaction detection unit 110 inputs a frame of the sales floor video from which moving objects have been removed to the trained machine learning model, acquires a product area map indicating the display area for each product type in the sales floor, and may generate a frame in which the acquired product area map is superimposed on the input frame, divided into regions for each product type.
  • The machine learning model for region estimation may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the placement area for each product type. Specifically, the machine learning model may be an instance segmentation model such as Mask-RCNN, and may be trained to predict, for the multiple products or product types in a frame, the bounding boxes to be detected and the corresponding segmentation masks.
  • Alternatively, the machine learning model may be a convolutional neural network trained to segment regions by clustering the feature vectors in its feature map; areas with similar feature vectors can be regarded as areas in which products of the same type are displayed.
  • Such a convolutional neural network may be obtained by fine-tuning a convolutional neural network pretrained on a large image dataset such as ImageNet, or may be trained by assigning temporary labels to product regions and training it to predict those label numbers.
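One possible reading of the clustering approach above, sketched minimally; the per-pixel features are assumed to come from the kind of pretrained backbone just mentioned, and the fixed cluster count is an assumption since the patent does not specify how regions are delimited.

```python
# Sketch: grouping a frame's feature-map pixels into product-type regions by
# clustering their feature vectors.
import numpy as np
from sklearn.cluster import KMeans

def segment_by_feature_clustering(feature_map, n_regions=8):
    """feature_map: numpy array of shape [C, H, W] from a CNN backbone.
    Returns an [H, W] array of region labels."""
    c, h, w = feature_map.shape
    vectors = feature_map.reshape(c, h * w).T        # one C-dim vector per pixel
    labels = KMeans(n_clusters=n_regions, n_init=10).fit_predict(vectors)
    return labels.reshape(h, w)                      # similar vectors share a label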
  • The interaction detection unit 110 estimates the product names and/or the number (quantity) of products displayed in each placement area.
  • For example, the interaction detection unit 110 uses a trained machine learning model to estimate the product names and/or the number of products in the product group included in the placement area for each product type.
  • Specifically, the machine learning model is trained to output the product names and/or the center positions of the products contained in a frame when a frame of sales floor video with moving objects removed is input.
  • A product name may be indicated by product identification information, such as a product number assigned to the product in advance.
  • Each product may be indicated by a symbol (such as a circle) marking its center in the frame, or by a product center heat map or the like.
  • That is, a machine learning model is trained to receive a frame of sales floor video with moving objects removed and to output the product names and/or product centers of the products captured in that frame.
  • Such a machine learning model may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the product name for each product type and/or the center of each product.
  • When the interaction detection unit 110 uses a trained machine learning model to estimate the product names of a product group displayed in a placement area, the machine learning model may identify, from the input frame, product identification information such as the product numbers of the products captured in the frame. That is, the machine learning model may be realized as a neural network and trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the product identification information of each product in the frame. After acquiring a machine learning model trained in this way, the interaction detection unit 110 can use it to estimate the product name of each product displayed in a frame of the sales floor video.
  • Note that the input frame may be a frame that has been divided into regions, or a frame that has not.
  • Alternatively, the machine learning model may be a neural network that determines product feature values for each product type from frames of the sales floor video. After estimating the feature values of each product arranged in the frame using the machine learning model, the interaction detection unit 110 may identify the product name corresponding to the estimated feature values as that product.
  • If no corresponding product name can be identified, the product may be determined to be unknown.
  • External information, such as store layout information and POS data, may also be used for this estimation.
  • When the interaction detection unit 110 uses a trained machine learning model to estimate the number of products in the product group in a placement area, the machine learning model may, for example, identify from the input frame the center of each product captured in the frame. That is, the machine learning model may be realized as a neural network and trained by supervised learning using as training data pairs of sales floor video frames and frames annotated with the center of each product in the frame.
  • In this way, the interaction detection unit 110 can use the machine learning model to estimate the center of each product displayed in a frame of the sales floor video and, for a frame divided into regions, estimate the number of products displayed in each placement area based on the number of estimated centers in that area. For example, using both a machine learning model for identifying product names and a machine learning model for estimating product centers, the interaction detection unit 110 can generate a frame indicating the product names of the products arranged in each placement region of the segmented frame and the center of each product. Based on this frame, the interaction detection unit 110 can estimate the product name and the number of products for each product type by counting the product centers included in each placement area.
  • However, the estimation of the number of products is not limited to this.
  • For example, a machine learning model that detects the position of each product within a frame with a bounding box may be used.
  • In this case, the interaction detection unit 110 may estimate the number of products by counting the number of bounding boxes included in each placement area.
  • Alternatively, the number of products may be estimated by treating the product center heat map as a product density and integrating the heat map over each placement region.
  • Alternatively, the interaction detection unit 110 may estimate the number of products in each placement region of a frame using a machine learning model trained to regress the number of products from the feature values of the placement region.
  • If the machine learning model is properly trained, estimating the number of products by product density or by regression as described above may also predict the number of hidden products that are not captured in the frame.
  • The size of the area in which the products are placed may also be calculated back from such estimates.
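Two of the counting strategies above can be sketched as follows; the inputs are assumed to come from the segmentation and product-center models described earlier, and all names here are illustrative.

```python
# Sketch: counting estimated product centers per placement area, and
# integrating a product-center heat map as a density.
import numpy as np

def count_by_centers(centers, region_map):
    """centers: iterable of (row, col) product centers;
    region_map: [H, W] array of placement-area labels."""
    counts = {}
    for r, c in centers:
        label = int(region_map[r, c])
        counts[label] = counts.get(label, 0) + 1
    return counts

def count_by_density(center_heatmap, region_map):
    """Treat the product-center heat map as a density and integrate it
    over each placement area."""
    return {int(label): float(center_heatmap[region_map == label].sum())
            for label in np.unique(region_map)}
```

The density variant is what allows partially hidden products to contribute fractional counts, consistent with the remark above about predicting hidden products.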
  • The interaction detection unit 110 also identifies the person and their movement in the video of persons extracted from the preprocessed sales floor video.
  • For example, the interaction detection unit 110 may distinguish between store clerks and customers captured in the sales floor video. That is, upon receiving video of a person extracted from the sales floor video, the interaction detection unit 110 may use a machine learning model trained to determine whether the person is a customer or a clerk.
  • The machine learning model may be implemented, for example, as a convolutional neural network, and trained using annotated image data of store clerks and annotated image data of customers as training data.
  • In this way, the interaction detection unit 110 may discriminate between clerks and customers captured in the sales floor video and use the discrimination results for behavior estimation.
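A clerk-vs-customer classifier of the kind just described might look like the following sketch, implemented here (as an assumption) by putting a two-class head on a pretrained ResNet-18; the patent only says "a convolutional neural network".

```python
# Sketch: binary person classifier over extracted person crops.
import torch
import torchvision

def build_person_classifier():
    model = torchvision.models.resnet18(pretrained=True)
    model.fc = torch.nn.Linear(model.fc.in_features, 2)  # 0: customer, 1: clerk
    return model

# Training would use annotated person crops, e.g.:
#   logits = model(person_crops)                       # [N, 2]
#   loss = torch.nn.functional.cross_entropy(logits, labels)
```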
  • The interaction detection unit 110 may also detect a person's movement from the extracted video of that person. For example, the interaction detection unit 110 may detect predetermined behaviors related to customer interactions with products and the sales floor, such as a customer walking in front of the product sales floor, stopping in front of it, picking up a product, or returning a product.
  • For example, the interaction detection unit 110 may generate trajectory data by tracking a person's position using known tracking techniques, and detect walking or staying from the generated trajectory data.
  • The trajectory data may, for example, associate each position with the time at which the person was at that position.
  • In known tracking techniques, a person's bounding box is detected in each frame, and detections in temporally adjacent frames are linked as the same person when the difference between the feature values of the detection areas is small or the overlap of the bounding boxes is large.
  • When a customer's trajectory data indicates that the customer passed in front of the sales floor to be analyzed, the interaction detection unit 110 may determine that the customer walked in front of that sales floor; when the trajectory data indicates that the customer stayed in front of it for a predetermined threshold time or longer, the interaction detection unit 110 may determine that the customer stopped in front of that sales floor.
  • Note that the trajectory data may be configured as time-series trajectory data.
  • In this case, the interaction detection unit 110 may determine whether a customer stopped in front of the sales floor to be analyzed using any appropriate machine learning model, such as a neural network trained with pairs of time-series trajectory data and annotations indicating whether the customer passed the sales floor or stopped at it as training data.
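A simple rule-based version of the walk/stop decision above can be sketched as follows; the trajectory layout, the zone representation, and the dwell threshold are assumptions for illustration.

```python
# Sketch: classifying one trajectory as stopped / walked past / absent.

STOP_THRESHOLD_S = 5.0  # assumed dwell threshold

def in_front_of_sales_floor(x, y, zone):
    x0, y0, x1, y1 = zone  # axis-aligned zone in floor coordinates
    return x0 <= x <= x1 and y0 <= y <= y1

def classify_visit(trajectory, zone):
    """trajectory: list of (timestamp_s, x, y) samples for one person.
    Returns 'stopped', 'walked_past', or 'absent'."""
    times = [t for t, x, y in trajectory if in_front_of_sales_floor(x, y, zone)]
    if not times:
        return "absent"
    dwell = max(times) - min(times)
    return "stopped" if dwell >= STOP_THRESHOLD_S else "walked_past"
```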
  • Also, the interaction detection unit 110 may detect the movement of a person's body parts, such as the hands, using a known pose estimation technique, and detect the person's interaction with products on the sales floor based on the detection results.
  • OpenPose, AlphaPose, or the like may be used as the known pose estimation technique.
  • For example, the interaction detection unit 110 may detect the position of a customer's hand from the video of the customer, and if the detected hand remains in the area of the sales floor video where a product is placed for a predetermined threshold time or longer, it may determine that the customer interacted with that product.
  • Alternatively, the interaction detection unit 110 may determine whether a customer picked up or returned a product at the sales floor to be analyzed using any suitable machine learning model, such as a neural network trained using as training data pairs of hand images extracted by pose estimation and annotations indicating whether the hand is picking up a product, returning a product, or doing something else.
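The hand-dwell rule above can be sketched as follows; the wrist keypoints are assumed to come from an external pose estimator such as OpenPose, and the data layout and threshold are assumptions.

```python
# Sketch: flag a product interaction when a wrist keypoint stays inside a
# product placement area past a time threshold.

HAND_DWELL_THRESHOLD_S = 1.0  # assumed

def hand_in_area(wrist_xy, area_bbox):
    x, y = wrist_xy
    x0, y0, x1, y1 = area_bbox
    return x0 <= x <= x1 and y0 <= y <= y1

def detect_hand_interaction(wrist_track, area_bbox, fps):
    """wrist_track: per-frame (x, y) wrist positions of one person.
    Returns True if the hand stayed in the placement area long enough."""
    run = longest = 0
    for xy in wrist_track:
        run = run + 1 if hand_in_area(xy, area_bbox) else 0
        longest = max(longest, run)
    return longest / fps >= HAND_DWELL_THRESHOLD_S
```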
  • The interaction detection unit 110 may also use an action estimator to estimate the actions of an extracted person.
  • The action estimator takes extracted video of a person as input, and may be implemented by any suitable machine learning model, such as a neural network, trained to determine which of several predetermined actions the person is performing.
  • The machine learning model may be trained using as training data pairs of video of a person and predetermined behaviors related to interactions with products and the sales floor, such as the person walking in front of the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
  • The interaction detection unit 110 may use an action estimator to detect not only customers' behavior but also store clerks' behavior regarding products and the sales floor.
  • That is, the action estimator may use a machine learning model trained to detect not only the predetermined customer interactions with products described above but also clerks' interactions with products and the sales floor.
  • For example, the interaction detection unit 110 may use a machine learning model, such as a neural network trained to detect interactions such as a clerk's display work (arranging, replenishing, and replacing products in a placement area) and sales promotion work (such as presenting POP advertisements), to detect a clerk's interactions with products from video of the clerk.
  • The interaction detection unit 110 further detects interactions with products by customers and clerks based on these detection results. Specifically, the interaction detection unit 110 can identify the position and time at which a change occurred in the sales floor from the detection results for sales floor changes, identify the product names and numbers of products displayed in the sales floor to be analyzed from the detection results for product areas, and identify interactions with products by customers or clerks from the detection results for persons.
  • For example, from the sales floor video before and after a change at the identified position and time, the interaction detection unit 110 can identify increases or decreases in the number of products of each product type and the interactions with products of the persons who were at the sales floor at that time.
  • For example, the interaction detection unit 110 may be able to detect, from the sales floor video before and after a change, that a customer picked up two units of product A and the number of products displayed on the sales floor decreased by two.
  • Also, the interaction detection unit 110 may be able to detect, from the sales floor video before and after a change, that a clerk replenished product B and the number of products displayed in the sales floor increased.
  • Furthermore, the interaction detection unit 110 may be able to detect from the sales floor video that a customer passed or stopped in front of the sales floor even when there is no change in the sales floor.
  • In order to determine the interaction corresponding to a combination of detection results for sales floor changes, product areas, and persons' movements, the interaction detection unit 110 may store in advance a table showing the correspondence between such combinations and interactions.
  • The interaction detection unit 110 may then refer to the table and determine, on a rule basis, the interaction corresponding to the combination of detection results for sales floor changes, product areas, and persons' movements.
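The correspondence table above might be realized as a simple lookup like the following sketch; the keys and interaction labels are illustrative assumptions, since the patent does not list the table's contents.

```python
# Sketch: rule-based mapping from detection-result combinations to interactions.
INTERACTION_RULES = {
    # (sales floor change, person kind, movement) -> interaction
    ("stock_decreased", "customer", "hand_in_area"): "picked_up_product",
    ("stock_increased", "customer", "hand_in_area"): "returned_product",
    ("stock_increased", "clerk", "hand_in_area"): "replenished_product",
    ("no_change", "customer", "stopped"): "stopped_at_sales_floor",
    ("no_change", "customer", "walked_past"): "walked_past_sales_floor",
}

def determine_interaction(floor_change, person_kind, movement):
    return INTERACTION_RULES.get((floor_change, person_kind, movement), "unknown")
```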
  • In this way, the interaction detection unit 110 may use the quantity of products in the sales floor estimated from the sales floor video for behavior detection. When an interaction with a product on the sales floor to be analyzed is detected in this way, the interaction detection unit 110 passes the interaction detection results to the attention level estimation unit 120.
  • The attention level estimation unit 120 estimates the attention level of products based on the interaction detection results. Specifically, for example, the attention level estimation unit 120 performs statistical processing such as normalization on the interactions detected within a predetermined period, and calculates the attention level of the products and/or sales floors targeted by the customers' interactions.
  • For example, the attention level estimation unit 120 may count the number of interactions in which customers stopped at the sales floor, normalize the count by the total number of visitors to the store during the period or the total number of customers who passed the sales floor, and determine the attention level of the sales floor and/or the products displayed there.
  • Also, the attention level estimation unit 120 may count the number of interactions in which customers picked up products displayed in each placement area of the sales floor, and determine the attention level of the products displayed in each placement area based on the relative number of interactions in each area. For example, a placement area with a relatively large number of interactions can be considered to attract a high degree of attention, not only for the displayed products but also for the placement area itself. For this reason, the estimated attention level may be used to display products to be sold actively in a placement area with a high attention level.
  • Also, the attention level estimation unit 120 may estimate the relationship between a clerk's interaction and the attention level. For example, for an interaction in which a clerk performs display work such as replenishing, replacing, or tidying up products, the detection results of customer interactions with those products after the clerk's interaction may be aggregated, and how much the clerk's interaction with the products affected their attention level may be estimated based on the aggregated detection results. For example, the attention level estimation unit 120 may calculate the increase or decrease in customer interactions with a certain product before and after the clerk tidies the sales floor, and may judge that the clerk's work contributed to sales of the product.
  • The attention level estimation unit 120 may notify the clerk or the department in charge of the estimated attention level so that it can be used in subsequent display and sales promotion strategies. For example, when the attention level of a product is equal to or higher than a predetermined threshold, the attention level estimation unit 120 may notify the clerk or the department in charge so that the placement area of the product is expanded, the order quantity of the product is increased to avoid shortages, or the evaluation of the clerk who displayed the product is raised. Similarly, when the attention level of a certain sales floor is equal to or higher than a predetermined threshold, the attention level estimation unit 120 may notify the clerk or the department in charge so that a product to be promoted is displayed in that sales floor or the evaluation of the clerk who arranged the sales floor display is raised.
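One normalization consistent with the description above, sketched minimally: the attention level of a sales floor as stop interactions divided by passers-by. The record layout is an assumption.

```python
# Sketch: attention score per sales floor over one aggregation period.
from collections import Counter

def attention_scores(interactions, passersby_by_floor):
    """interactions: iterable of (sales_floor_id, interaction_label) pairs
    detected within the period; passersby_by_floor: {sales_floor_id: count}."""
    stops = Counter(floor for floor, label in interactions
                    if label == "stopped_at_sales_floor")
    return {floor: (stops[floor] / total if total else 0.0)
            for floor, total in passersby_by_floor.items()}
```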
  • FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure.
  • In step S101, the analysis device 100 acquires sales floor video. Specifically, the analysis device 100 acquires the sales floor video from the imaging device 20 installed at the sales floor.
  • Note that the analysis device 100 may execute the following steps on the acquired sales floor video in real time, or may temporarily store the acquired video and execute the following steps on the stored video at an appropriate time.
  • In step S102, the analysis device 100 preprocesses the sales floor video. Specifically, the analysis device 100 uses any known object detector to detect moving objects such as people and shopping carts in the sales floor video, and uses any known moving-object removal technique to remove the detected moving objects from the video. In addition, the analysis device 100 performs region segmentation by product type on the sales floor video from which moving objects have been removed, and estimates the placement area for each product type.
  • Next, the analysis device 100 detects the behavior of clerks and customers with respect to the sales floor and the products on it. Specifically, the analysis device 100 may calculate the difference between frames of the sales floor video from which moving objects have been removed, and determine whether a change occurred in the sales floor based on the calculated difference. The analysis device 100 may also estimate the product names and/or the quantity (number) of products displayed in each placement area. The analysis device 100 may also recognize the person and their movement in video of a person detected as a moving object. For example, the analysis device 100 may discriminate between clerks and customers captured in the sales floor video.
  • For example, from the video of a person, the analysis device 100 may detect predetermined behaviors related to customer interactions with products or the sales floor, such as a customer walking in front of the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
  • The analysis device 100 may also detect interactions by clerks with products or the sales floor (for example, display work such as replenishing, replacing, and arranging products in the sales floor, and sales promotion work such as presenting POP advertisements).
  • Detection of these interactions may be performed based on, for example, a machine learning model such as a neural network, and a machine learning model may be configured for each type of human behavior to be detected. Alternatively, an end-to-end machine learning model that detects the desired types of behavior may be constructed.
  • Then, the analysis device 100 may estimate the attention level of the sales floor based on the behavior detection results. Specifically, the analysis device 100 estimates the attention level of the sales floor based on the detection results of the various behaviors. For example, the analysis device 100 may perform statistical processing on the behaviors detected within a predetermined period and calculate the attention level of the products and/or sales floors targeted by the behaviors of customers and clerks. The analysis device 100 may notify the clerk or the department in charge of the estimated attention level so that it can be used in subsequent display and sales promotion strategies.
  • Part or all of the analysis device 100 in the above-described embodiments may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or the like.
  • In the case of information processing by software, the software that realizes at least part of the functions of each device in the above-described embodiments may be stored on a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, CD-ROM (Compact Disc-Read Only Memory), or USB (Universal Serial Bus) memory, and the information processing by software may be executed by reading the software into a computer.
  • The software may also be downloaded via a communication network.
  • Further, information processing may be executed by hardware by implementing the software in a circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).
  • The type of storage medium that stores the software is not limited.
  • For example, the storage medium is not limited to a removable one such as a magnetic disk or optical disk, and may be a fixed storage medium such as a hard disk or memory. The storage medium may be provided inside the computer or outside it.
  • FIG. 4 is a block diagram showing an example of the hardware configuration of the analysis device 100 in the embodiment described above.
  • The analysis device 100 may be implemented, for example, as a computer 7 in which a processor 71, a main storage device 72 (memory), an auxiliary storage device 73 (memory), a network interface 74, and a device interface 75 are connected.
  • Although the computer 7 in FIG. 4 has one of each component, it may have a plurality of the same component.
  • Also, the software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the software's processing. In this case, a form of distributed computing may be used in which each computer communicates via the network interface 74 or the like to execute the processing.
  • The analysis device 100 in the above-described embodiments may be configured as a system in which functions are realized by one or more computers executing instructions stored in one or more storage devices. Further, information transmitted from a terminal may be processed by one or more computers provided on the cloud, and the processing results may be transmitted to the terminal.
  • Various operations of the analysis device 100 in the above-described embodiments may be executed in parallel using one or more processors or using multiple computers connected via a network. Various operations may also be distributed to a plurality of operation cores in a processor and executed in parallel. Part or all of the processing, means, and the like of the present disclosure may be executed by at least one of a processor and a storage device provided on a cloud capable of communicating with the computer 7 via a network. Thus, the analysis device 100 in the above-described embodiments may take the form of parallel computing by one or more computers.
  • The processor 71 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) including a control device and an arithmetic device of the computer. The processor 71 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 71 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. The processor 71 may also include arithmetic functions based on quantum computing.
  • The processor 71 can perform arithmetic processing based on data and software (programs) input from the devices and other components of the internal configuration of the computer 7, and can output arithmetic results and control signals to those devices and components.
  • The processor 71 may control the components of the computer 7 by executing the OS (Operating System) of the computer 7, applications, and the like.
  • The analysis device 100 in the above-described embodiments may be realized by one or more processors 71.
  • Here, the processor 71 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or in two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate by wire or wirelessly.
  • The main storage device 72 is a storage device that stores instructions executed by the processor 71, various data, and the like.
  • The auxiliary storage device 73 is a storage device other than the main storage device 72.
  • These storage devices mean any electronic components capable of storing electronic information, and may be semiconductor memories.
  • The semiconductor memory may be either volatile memory or non-volatile memory.
  • The storage device for storing various data in the analysis device 100 in the above-described embodiments may be implemented by the main storage device 72 or the auxiliary storage device 73, or by built-in memory built into the processor 71.
  • The storage unit in the above-described embodiments may be realized by the main storage device 72 or the auxiliary storage device 73.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected to it.
  • A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the analysis device 100 in the above-described embodiments is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to this at least one storage device (memory), a configuration in which at least one of the processors is connected (coupled) to the at least one storage device (memory) may be included. This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers.
  • Further, a configuration in which a storage device (memory) is integrated with a processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • The network interface 74 is an interface for connecting to the communication network 8 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 74. The network interface 74 may exchange information with an external device 9A connected via the communication network 8.
  • The communication network 8 may be any of a WAN (Wide Area Network), LAN (Local Area Network), PAN (Personal Area Network), and the like, or a combination of them, as long as information is exchanged over it between the computer 7 and the external device 9A. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
  • The device interface 75 is an interface, such as USB, that connects directly with the external device 9B.
  • The external device 9A is a device connected to the computer 7 via a network.
  • The external device 9B is a device connected directly to the computer 7.
  • The external device 9A or the external device 9B may be an input device, for example.
  • The input device is, for example, a camera, microphone, motion capture device, various sensors, keyboard, mouse, touch panel, or other device, and provides acquired information to the computer 7.
  • Alternatively, it may be a device including an input unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
  • The external device 9A or the external device 9B may be an output device, for example.
  • The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), CRT (Cathode Ray Tube), PDP (Plasma Display Panel), or organic EL (Electro Luminescence) panel, or a speaker or the like that outputs audio and so on. Alternatively, it may be a device including an output unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
  • The external device 9A or the external device 9B may also be a storage device (memory).
  • For example, the external device 9A may be network storage or the like, and the external device 9B may be storage such as an HDD.
  • The external device 9A or the external device 9B may also be a device having the functions of some of the components of each device (server 100 or terminal 200) in the above-described embodiments. That is, the computer 7 may transmit part or all of the processing results to, or receive them from, the external device 9A or the external device 9B.
  • In this specification (including the claims), the expression "at least one of a, b, and c" or "at least one of a, b, or c" includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It also covers combinations with multiple instances of any element, such as a-a, a-b-b, and a-a-b-b-c-c. It further covers the addition of elements other than the listed elements a, b, and c, such as a-b-c-d having d.
  • In this specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like.
  • The terms should be interpreted appropriately according to the context in which they are used, but forms of connection/coupling that are not intentionally or naturally excluded should not be interpreted as excluded from the terms.
  • In this specification (including the claims), the expression "A configured to B" may include that the physical structure of element A has a configuration capable of performing operation B, and that a permanent or temporary setting/configuration of element A is configured/set to actually perform operation B.
  • For example, when element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary setting of a program (instructions).
  • When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute operation B, regardless of whether control instructions and data are actually attached.
  • In this specification (including the claims), terms implying optimization include finding a global optimum, finding an approximation of a global optimum, finding a local optimum, and finding an approximation of a local optimum, and should be interpreted appropriately according to the context in which the term is used. They also include stochastically or heuristically finding approximations of these optimum values.
  • In this specification (including the claims), when a plurality of pieces of hardware perform predetermined processing, the pieces of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of it. Alternatively, some of the hardware may perform part of the predetermined processing while other hardware performs the rest.
  • The hardware that performs a first process and the hardware that performs a second process may be the same or different; in other words, the hardware that performs the first process and the hardware that performs the second process need only be included in the one or more pieces of hardware.
  • Note that the hardware may include an electronic circuit or a device including an electronic circuit.
  • When a plurality of storage devices (memories) store data, each storage device among them may store only part of the data or may store the whole of the data.

Abstract

Provided is a novel technology for analyzing the level of interest in a product. One embodiment of the present invention relates to an analysis device having one or more memories and one or more processors, wherein the one or more processors estimate the level of interest in a sales area on the basis of detection results regarding human behavior related to the sales area.

Description

Analysis device, analysis system, analysis method, and program
The present disclosure relates to analysis devices, analysis systems, analysis methods, and programs.
The use of information technology is advancing in the retail industry, including supermarkets and convenience stores. For example, information technology has been utilized in the display of merchandise in stores.
JP 2020-71874 A
An object of the present disclosure is to provide a novel technique for analyzing the degree of attention that a product receives.
In order to solve the above problem, one aspect of the present disclosure relates to an analysis device that includes one or more memories and one or more processors, wherein the one or more processors estimate the attention level of a sales floor based on detection results of human behavior related to the sales floor.
FIG. 1 is a schematic diagram illustrating an analysis system according to one embodiment of the present disclosure. FIG. 2 is a block diagram showing the functional configuration of an analysis device according to one embodiment of the present disclosure. FIG. 3 is a flowchart illustrating analysis processing according to one embodiment of the present disclosure. FIG. 4 is a block diagram showing the hardware configuration of an analysis device according to one embodiment of the present disclosure.
Embodiments of the present disclosure will be described below with reference to the drawings.
In the following embodiments, an analysis system is disclosed that captures video of a store's sales floor and uses a machine learning model to estimate the attention level of the sales floor based on the sales floor video.
[Analysis system]
First, an analysis system according to an embodiment of the present disclosure will be described with reference to FIG. 1. FIG. 1 is a schematic diagram illustrating the analysis system according to one embodiment of the present disclosure.
As shown in FIG. 1, the analysis system 10 of the present embodiment includes, for example, an imaging device 20, a user terminal 30, and an analysis device 100. When the analysis device 100 acquires sales floor video from the imaging device 20, it analyzes the acquired video and notifies the user terminal 30 of the attention level of the sales floor and of the products displayed there. Note that the attention level of a product is one example of the attention level of a sales floor. The degree of attention refers to how attractive a sales floor or product is.
The imaging device 20 may be, for example, a video camera installed in a store or the like; it captures video of the target sales floor and transmits the sales floor video to the analysis device 100. Typically, the imaging device 20 is installed near the sales floor to be imaged and is used to observe that sales floor. The imaging device 20 may be fixed at a certain location in the store, or may be movable, for example mounted on a robot or cart. A movable device makes it possible to obtain a wider variety of information and to reduce the number of imaging devices 20 that must be installed. A plurality of imaging devices 20 may also be provided, so that appropriate sales floor video can be obtained even when blind spots or the like occur.
The user terminal 30 may be, for example, an information processing device such as a personal computer, tablet, or smartphone provided in the store or the like; it acquires information on the products on the sales floor and the attention level of the sales floor estimated from the sales floor video, either from the analysis device 100 or from a server or the like on which the analysis results of the analysis device 100 have been saved. For example, the user terminal 30 may be provided with software related to store management and business improvement, such as software for supporting decisions on optimal product placement and for evaluating store clerks' work, and may also be provided with software for browsing the analysis results. Using data obtained from the analysis device 100, such as the attention levels of products, the placement of products with high attention, and data on POP advertisements, and based on the data analyzed by such software, store clerks and others may decide on product placement in the sales floor, or work evaluations may be performed for clerks whose product placement achieved a high degree of attention.
The analysis device 100 may be, for example, a personal computer provided in the store, or an information processing device such as a server provided at a location different from the store, for example at a headquarters that manages the store or on the cloud. From the acquired sales floor video, it estimates the attention level for each sales floor and for each product type displayed there. Note that the analysis device 100 may acquire the sales floor video captured by the imaging device 20 directly, or may acquire data obtained by subjecting the sales floor video to predetermined processing. In the latter case, the sales floor video captured by the imaging device 20 is output to a predetermined processing device, and the data processed by that device is output to the analysis device 100. This facilitates transmission of the sales floor video information over the network and the subsequent processing in the analysis device 100. When a plurality of imaging devices 20 are installed, one processing device may be provided for the plurality of imaging devices 20.
The analysis device 100 according to the present embodiment uses a machine learning model such as a neural network to detect the behavior of store clerks, customers, and others related to the sales floor based on the sales floor video, and can estimate the attention level of the products and/or the sales floor from the behavior detection results. The estimated attention level can be used for subsequent sales promotion activities, or for store management and business improvement, such as evaluating the work of the clerk who performed the display work.
For example, for sales floor video such as that shown in FIG. 1, the analysis device 100 detects interactions by customers with products and/or the sales floor, in other words customers' reactions to products and/or the sales floor, such as how many customers walked in front of the sales floor, how many stopped in front of it, how many picked up products displayed there, and how many returned the products they picked up to the sales floor, and estimates the attention level of the products and/or the sales floor based on the detection results. Here, the analysis device 100 may process the sales floor video acquired from the imaging device 20 in real time or in batches.
According to the present disclosure, it is possible to estimate the attention level of sales floors and products based on sales floor video and to determine appropriate product placement based on the estimated attention level. It is also possible to estimate the effect of product displays and POP advertisements by store clerks and others based on the estimated attention level.
 [Analysis Device]
 Next, the analysis device 100 according to one embodiment of the present disclosure will be described with reference to FIG. 2. FIG. 2 is a block diagram showing the functional configuration of the analysis device 100 according to one embodiment of the present disclosure.
 As shown in FIG. 2, the analysis device 100 of this embodiment has an interaction detection unit 110 and an attention level estimation unit 120. The interaction detection unit 110 and the attention level estimation unit 120 are realized by one or more processors executing one or more programs that are installed in the analysis device 100 and stored in one or more memories.
 The interaction detection unit 110 detects human behavior relating to the sales floor based on the sales floor video. Specifically, upon acquiring the sales floor video from the imaging device 20, the interaction detection unit 110 removes, as preprocessing, moving objects such as people and shopping carts from the video. It then detects changes in the sales floor from the preprocessed video and detects interactions of the extracted persons with the sales floor and products. Although in the following description the interaction detection unit 110 detects human behavior relating to the sales floor based on the sales floor video, it may instead detect such behavior based on information other than the video. For example, the interaction detection unit 110 may detect a clerk's interactions with the sales floor and products by tracking a computer terminal that the clerk carries around the store, or may detect a customer's interactions with the sales floor and products by tracking a computer terminal attached to equipment, such as a shopping cart, that the customer uses in the store.
 The interaction detection unit 110 detects human behavior relating to the sales floor, such as interactions with products on the sales floor. Here, the interaction detection unit 110 may use changes in the sales floor detected based on the sales floor video for behavior detection. For example, as one preprocessing step, the interaction detection unit 110 detects moving objects such as people and shopping carts in the sales floor video using a known object detector such as Mask R-CNN (Region-based Convolutional Neural Network). When a moving object is detected in the sales floor video, the interaction detection unit 110 removes it using a known moving-object removal technique and derives both a sales floor video with the moving objects removed and a video of the extracted moving objects.
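 A minimal sketch of this preprocessing in Python, assuming a pretrained torchvision Mask R-CNN as the object detector and using OpenCV inpainting as a simple stand-in for the moving-object removal technique (a production system would more likely composite pixels from temporally adjacent frames), might look like the following:

    import numpy as np
    import torch
    import torchvision
    import cv2  # OpenCV; inpainting stands in here for the removal technique

    # COCO class index 1 is "person"; shopping carts have no COCO class, so a
    # detector fine-tuned on cart images would be needed for them in practice.
    PERSON_CLASS = 1

    detector = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    detector.eval()

    def remove_moving_objects(frame_bgr, score_thresh=0.7):
        """Return the frame with detected people removed, plus the union mask."""
        rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        tensor = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
        with torch.no_grad():
            out = detector([tensor])[0]
        mask = np.zeros(frame_bgr.shape[:2], dtype=np.uint8)
        for label, score, m in zip(out["labels"], out["scores"], out["masks"]):
            if label.item() == PERSON_CLASS and score.item() >= score_thresh:
                mask |= (m[0].numpy() > 0.5).astype(np.uint8)
        # Fill the removed regions from surrounding pixels.
        cleaned = cv2.inpaint(frame_bgr, mask * 255, 3, cv2.INPAINT_TELEA)
        return cleaned, mask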
 The interaction detection unit 110 then calculates differences between frames of the preprocessed sales floor video and determines, based on the calculated differences, whether a change has occurred in the sales floor. For example, the interaction detection unit 110 intermittently extracts frames from the sales floor video at predetermined time intervals and calculates the difference between adjacent extracted frames. Specifically, the interaction detection unit 110 may use the difference between the image data of the adjacent frames as the inter-frame difference. Alternatively, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network, input the adjacent frames to that model, and use the difference between the output feature maps as the inter-frame difference. Comparing feature maps is expected to effectively reduce the influence of lighting changes and vibrations on the sales floor. Alternatively still, the interaction detection unit 110 may use any appropriate machine learning model, such as a convolutional neural network trained to detect portions of two input frames whose difference exceeds a predetermined threshold, input each pair of adjacent frames to that model, and use the detected difference portions as the inter-frame difference. By detecting differences between adjacent frames in this way, the interaction detection unit 110 can identify the position and/or time at which a change occurred in the sales floor.
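 As an illustrative sketch of the feature-map variant, assuming a truncated pretrained ResNet-18 as the convolutional feature extractor (the disclosure only requires some appropriate model):

    import torch
    import torchvision

    # Truncated ResNet-18 used as a generic convolutional feature extractor.
    extractor = torch.nn.Sequential(
        *list(torchvision.models.resnet18(weights="DEFAULT").children())[:-2]
    )
    extractor.eval()

    def change_map(frame_a, frame_b, thresh=1.0):
        """Per-location feature distance between two frames.

        frame_a, frame_b: (3, H, W) float tensors in [0, 1]; `thresh` is an
        assumed example value. High-distance locations mark likely changes.
        """
        with torch.no_grad():
            fa = extractor(frame_a.unsqueeze(0))  # (1, C, H', W')
            fb = extractor(frame_b.unsqueeze(0))
        dist = torch.norm(fa - fb, dim=1).squeeze(0)  # (H', W')
        return dist > thresh  # boolean map of changed locations

 The positions of True entries in the map, together with the timestamps of the two frames, give the position and time of the change.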
 The interaction detection unit 110 may also estimate, from the sales floor video, the placement region of each product type. Specifically, the interaction detection unit 110 performs region segmentation by product type on a frame of the sales floor video and estimates the placement region for each product type. For example, the interaction detection unit 110 may use a trained machine learning model to perform region segmentation based on product type on a frame of the sales floor video from which moving objects have been removed, and estimate the placement region for each product type. That machine learning model may be trained so that, when a frame of the moving-object-removed sales floor video is input, it segments the frame into regions and outputs a product region map indicating the placement region of each product type. For example, the interaction detection unit 110 may input the moving-object-removed frame to the trained model to obtain a product region map indicating the display region of each product type in the sales floor, superimpose the obtained product region map on the input frame, and generate a frame segmented into regions by product type.
 Here, the machine learning model for region estimation may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the placement region of each product type has been attached. Specifically, the model may be an instance segmentation model such as Mask R-CNN, trained to predict, for a plurality of products or product types in a frame, a bounding box for each detection target and a corresponding segmentation mask.
 Alternatively, the machine learning model may be a convolutional neural network trained so that region segmentation is performed by clustering the feature vectors in its feature map; regions with similar feature vectors can be regarded as regions in which products of the same type are displayed. Such a convolutional neural network may be trained by fine-tuning a network pretrained on a separate large-scale image dataset such as ImageNet, or may be trained to predict provisional labels assigned to product regions.
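 A minimal sketch of the clustering variant, reusing the truncated ResNet-18 extractor from the earlier sketch and scikit-learn's k-means with an assumed number of product types:

    import torch
    from sklearn.cluster import KMeans

    def segment_by_clustering(frame, extractor, n_types=8):
        """Cluster per-location feature vectors into product-type regions.

        frame: (3, H, W) float tensor; n_types is an assumed hyperparameter.
        Returns a coarse (H', W') map of cluster ids, where locations with
        similar feature vectors (likely the same product type) share an id.
        """
        with torch.no_grad():
            feat = extractor(frame.unsqueeze(0)).squeeze(0)  # (C, H', W')
        c, h, w = feat.shape
        vectors = feat.permute(1, 2, 0).reshape(-1, c).numpy()  # (H'*W', C)
        labels = KMeans(n_clusters=n_types, n_init=10).fit_predict(vectors)
        return labels.reshape(h, w)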
 However, the present disclosure is not limited to this, and any other appropriate region segmentation technique for each product type may be used.
 After acquiring the sales floor video segmented into regions by product type in this manner, the interaction detection unit 110 may estimate the product names and/or the number of products (including the quantity of products) displayed in each placement region. Specifically, the interaction detection unit 110 uses a trained machine learning model to estimate the product name and/or the number of products for the product group included in the placement region of each product type. The machine learning model is trained so that, when a frame of the moving-object-removed sales floor video is input, it outputs the product name and/or the center position of each product contained in the frame. For example, a product name may be indicated by product identification information such as a product number assigned to it in advance. The center position of each product may be indicated by a symbol (for example, a circle) marking the center of each product in the frame, or by a product-center heat map or the like. Such a machine learning model may be realized, for example, as a neural network, and may be trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the product name of each product type and/or the center of each product has been attached.
 More specifically, when the interaction detection unit 110 uses a trained machine learning model to estimate the product names of a product group displayed in a placement region, the model may be one that identifies, from an input frame, product identification information such as the product number of each product captured in the frame. That is, the model may be realized as a neural network and trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the product identification information of each product in the frame has been attached. Having obtained a model trained in this way, the interaction detection unit 110 can use it to estimate the product name of each product displayed in a frame of the sales floor video. Here, the input frame may or may not be segmented into regions.
 Alternatively, the machine learning model may be a neural network that determines a feature value of each product type from frames of the sales floor video. After estimating the feature value of each product arranged in the frame using the model, the interaction detection unit 110 may identify the product name corresponding to the estimated feature value as that product.
 If a product does not correspond to any existing product type, it may be determined to be unknown. Furthermore, when external information such as store layout information or POS data is available, the products placed in the sales floor under analysis can be narrowed down from that information, and a machine learning model for each product category (for example, vegetables or sweets) suited to the products of the sales floor under analysis (for example, a vegetable section or a confectionery section) can be obtained, improving estimation accuracy.
 Next, when the interaction detection unit 110 uses a trained machine learning model to estimate the number of products in a product group in a placement region, the model may be one that identifies, for example, from an input frame, the center of each product captured in the frame. That is, the model may be realized as a neural network and trained by supervised learning using as training data pairs of a sales floor video frame and an annotated frame to which the center of each product in the frame has been attached.
 Having obtained a model trained in this way, the interaction detection unit 110 uses it to estimate the center of each product displayed in a frame of the sales floor video and, referring to the region-segmented frame, can estimate the number of products displayed in each placement region based on the number of estimated centers within that region. For example, the interaction detection unit 110 may use the product-name model and the product-center model together to generate a frame indicating the product names and product centers of the product groups arranged in each placement region of the region-segmented frame. By counting the number of centers included in each placement region based on that frame, the interaction detection unit 110 can estimate the product name and the number of products for each product type.
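 A minimal counting sketch under these assumptions, where `centers` is a list of estimated (x, y) product centers output by the center model and `region_map` assigns a product-type region id to each pixel:

    from collections import Counter

    def count_products_per_region(centers, region_map):
        """Count estimated product centers falling inside each placement region.

        centers: iterable of (x, y) pixel coordinates from the center model.
        region_map: 2-D array where region_map[y, x] is a product-type region id.
        Returns a mapping {region id: estimated number of displayed products}.
        """
        counts = Counter()
        h, w = region_map.shape
        for x, y in centers:
            if 0 <= int(y) < h and 0 <= int(x) < w:
                counts[int(region_map[int(y), int(x)])] += 1
        return counts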
 Note that the estimation of the number of products according to the present disclosure is not limited to this. For example, instead of product centers, a machine learning model that detects each product by a bounding box indicating its position in the frame may be used; in this case, the interaction detection unit 110 may estimate the number of products by counting the bounding boxes included in each placement region. Alternatively, the product-center heat map may be regarded as a product density, and the number of products may be estimated by integrating the heat map over each placement region. Alternatively still, the interaction detection unit 110 may estimate the number of products in each placement region of the frame using a machine learning model trained to regress the number of products from the feature values of the placement region. When the machine learning model is trained appropriately, the estimation by product density or by regression of the number of products described above can also predict the number of hidden products not captured in the frame. In addition, the size of the region in which products are placed may be calculated back using a machine learning model that recognizes regions in which no products are placed and a machine learning model that recognizes regions in which products can be placed.
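 The heat-map variant can be sketched just as briefly, under the assumption that the model is calibrated so that each product contributes roughly unit mass to the heat map:

    import numpy as np

    def count_by_heatmap(heatmap, region_map, region_id):
        """Estimate the product count of one placement region by integrating
        the product-center heat map (treated as a density) over the region."""
        mask = (region_map == region_id)
        return float(np.round(heatmap[mask].sum()))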
 Meanwhile, the interaction detection unit 110 identifies persons and their movements in the person videos extracted from the preprocessed sales floor video. For example, the interaction detection unit 110 may discriminate between customers and clerks captured in the sales floor video. That is, the interaction detection unit 110 may perform this discrimination using a machine learning model trained so that, when a person video extracted from the sales floor video is input, it determines whether the person is a customer or a clerk. The model may be realized, for example, as a convolutional neural network and trained using annotated image data of clerks and annotated image data of customers as training data. In general, clerks wear prescribed uniforms, name tags, and the like, and the machine learning model is expected to be able to distinguish clerks from customers by detecting these. In this way, the interaction detection unit 110 may discriminate between the customers and clerks captured in the video and use the discrimination result for behavior estimation.
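 A minimal sketch of such a binary classifier, assuming a ResNet-18 backbone fine-tuned on annotated clerk and customer crops (the architecture is an assumption; the disclosure only requires a convolutional neural network):

    import torch
    import torchvision

    # ResNet-18 with a two-way head: class 0 = clerk, class 1 = customer.
    classifier = torchvision.models.resnet18(weights="DEFAULT")
    classifier.fc = torch.nn.Linear(classifier.fc.in_features, 2)
    # ... fine-tune here on annotated clerk/customer person crops ...
    classifier.eval()

    def is_clerk(person_crop):
        """person_crop: (3, H, W) float tensor of a person region cut out of
        the sales floor video, resized/normalized as during training."""
        with torch.no_grad():
            logits = classifier(person_crop.unsqueeze(0))
        return logits.argmax(dim=1).item() == 0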
 The interaction detection unit 110 may also detect the movement of a person from the extracted person video. For example, it may detect predetermined behaviors related to customers' interactions with products and the sales floor, such as a customer walking past the product sales floor, stopping in front of it, picking up a product, or returning a product.
 For example, the interaction detection unit 110 may generate trajectory data by tracking the position of a person using a known tracking technique, and detect walking or dwelling of the person from the generated trajectory data. Here, the trajectory data may, for example, associate positions with the times at which the person was at those positions. As a known tracking technique, a person bounding box may be detected in each frame, and detections in temporally adjacent frames whose corresponding feature values differ little or whose bounding boxes overlap substantially may be regarded as "the same person" and associated, for example by assigning them the same ID; applying this processing to all frames of the target video derives the movement trajectory of each person. Specifically, when a customer's trajectory data indicates that the customer passed in front of the sales floor under analysis, the interaction detection unit 110 may determine that the customer walked past that sales floor. When the trajectory data indicates that the customer stayed in front of the sales floor under analysis for a predetermined threshold time or longer, the interaction detection unit 110 may determine that the customer stopped in front of that sales floor.
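 An illustrative sketch of walk/dwell detection from such trajectory data, where each trajectory is a list of (time, x, y) samples for one tracked ID and the area in front of the sales floor is an assumed rectangular zone:

    def classify_visit(trajectory, zone, min_dwell=5.0):
        """Classify one tracked person's relation to the sales floor front.

        trajectory: list of (t, x, y) samples for one tracked ID.
        zone: (x0, y0, x1, y1) rectangle in front of the analyzed sales floor.
        Returns "stopped" if the person stayed inside the zone for at least
        `min_dwell` seconds (an assumed threshold), "passed" if they entered
        the zone but moved on, and None if they never entered it.
        """
        x0, y0, x1, y1 = zone
        entered_at = None
        visited = False
        for t, x, y in trajectory:
            inside = x0 <= x <= x1 and y0 <= y <= y1
            if inside:
                visited = True
                if entered_at is None:
                    entered_at = t
                elif t - entered_at >= min_dwell:
                    return "stopped"
            else:
                entered_at = None
        return "passed" if visited else None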
 Alternatively, the trajectory data may be configured as time-series trajectory data, and the interaction detection unit 110 may determine whether a customer stopped in front of the sales floor under analysis using a machine learning model, such as any appropriate neural network, trained with training-data pairs of time-series trajectory data and an annotation indicating whether that trajectory merely passed the sales floor or involved an interaction with a product on the sales floor.
 The interaction detection unit 110 may also detect the movement of a body part, such as a person's hand, using a known pose estimation technique and detect the person's interaction with products on the sales floor based on the detection result. As known pose estimation techniques, OpenPose, AlphaPose, and the like may be used. Specifically, the interaction detection unit 110 may detect the position of a customer's hand from the customer video and, when the detected hand remains within a product placement region of the sales floor video for a predetermined threshold time or longer, determine that the customer interacted with that product.
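 A minimal sketch of this hand-in-region check, assuming hand keypoints have already been extracted per frame by a pose estimator such as OpenPose, and assuming a one-second threshold:

    def hand_touch_events(hand_positions, region_map, fps, min_seconds=1.0):
        """Detect product interactions from per-frame hand keypoints.

        hand_positions: list of (x, y) hand coordinates, one per frame
                        (None when no hand was detected in that frame).
        region_map: 2-D array mapping pixels to product-type region ids.
        Returns the region ids whose placement region contained the hand
        for at least `min_seconds` of consecutive frames.
        """
        min_frames = max(1, int(min_seconds * fps))
        events = set()
        run_region, run_len = None, 0
        for pos in hand_positions:
            region = None
            if pos is not None:
                x, y = int(pos[0]), int(pos[1])
                h, w = region_map.shape
                if 0 <= y < h and 0 <= x < w:
                    region = int(region_map[y, x])
            if region is not None and region == run_region:
                run_len += 1
            else:
                run_region, run_len = region, 1 if region is not None else 0
            if region is not None and run_len >= min_frames:
                events.add(region)
        return events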
 Alternatively, the interaction detection unit 110 may determine whether a customer picked up a product or returned a product at the sales floor under analysis using a machine learning model, such as any appropriate neural network, trained with training-data pairs of a hand video extracted by pose estimation and an annotation indicating whether the hand is picking up a product, returning a product, or doing something else.
 The interaction detection unit 110 may also estimate the action of an extracted person using an action estimator. Specifically, the action estimator may be realized as a machine learning model, such as any appropriate neural network, that takes the extracted person video as input and is trained to determine which of predetermined actions the person is performing. The model may be trained with training-data pairs of a person video and a predetermined behavior related to interactions with products or the sales floor, such as the person walking past the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product.
 Note that the interaction detection unit 110 may use the action estimator to detect not only customers' behavior but also clerks' behavior relating to products and the sales floor. In this case, the action estimator may detect a clerk's interactions with products and the sales floor from the clerk video using a machine learning model trained to detect not only the customers' predetermined product interactions described above but also clerks' interactions with products and the sales floor. For example, the interaction detection unit 110 may detect a clerk's interactions with products from the clerk video using a machine learning model, such as a neural network, trained to detect interactions such as display work (tidying, restocking, or replacing products in a placement region) and sales promotion work (such as putting up POP advertisements).
 Having detected changes in the sales floor, product regions, and persons and their movements in this way, the interaction detection unit 110 further detects customers' and clerks' interactions with products based on these detection results. Specifically, the interaction detection unit 110 identifies the position and time at which a change occurred in the sales floor from the change detection results, identifies the names and numbers of products displayed in the sales floor under analysis from the product region detection results, and identifies customers' or clerks' interactions with products from the person detection results. The interaction detection unit 110 can thereby identify, from the sales floor video before and after the change at the identified position and time, increases or decreases in the number of products of each product type, as well as the persons who were at the sales floor at that time and their interactions with products.
 For example, the interaction detection unit 110 might detect, from the sales floor video before and after a change, that a customer picked up two units of product A and the number of units displayed on the sales floor decreased by two. Alternatively, it might detect that a clerk restocked product B and the number of units displayed on the sales floor increased. Even when no change has occurred in the sales floor, the interaction detection unit 110 may still detect from the video that a customer passed or stopped in front of the sales floor. To determine the interaction corresponding to a combination of detection results for sales floor changes, product regions, and persons and their movements, the interaction detection unit 110 may hold in advance a table indicating the correspondence between such combinations of detection results and interactions. Referring to this table, the interaction detection unit 110 may determine the interaction corresponding to each combination of detection results on a rule basis.
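 Such a correspondence table can be sketched as a simple lookup; both the keys and the rules below are illustrative assumptions, since the disclosure does not specify the rule set:

    # Keys: (person type, detected action, sign of the stock change in the
    # region). All entries below are assumed examples.
    INTERACTION_RULES = {
        ("customer", "hand_in_region", -1): "picked_up_product",
        ("customer", "hand_in_region", +1): "returned_product",
        ("customer", "dwell",           0): "stopped_in_front",
        ("customer", "pass",            0): "walked_past",
        ("clerk",    "hand_in_region", +1): "restocked_product",
        ("clerk",    "hand_in_region",  0): "tidied_display",
    }

    def classify_interaction(person_type, action, stock_delta):
        sign = (stock_delta > 0) - (stock_delta < 0)  # -1, 0, or +1
        return INTERACTION_RULES.get((person_type, action, sign), "unknown")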
 That is, the interaction detection unit 110 may use the quantity of products on the sales floor estimated based on the sales floor video for behavior detection. Having detected interactions with products on the sales floor under analysis in this way, the interaction detection unit 110 passes the interaction detection results to the attention level estimation unit 120.
 The attention level estimation unit 120 estimates the degree of attention of a product based on the interaction detection results. Specifically, the attention level estimation unit 120, for example, applies statistical processing such as normalization to the interactions detected within a predetermined period and calculates the degree of attention of the products and/or sales floor that were the target of customers' interactions. For example, the attention level estimation unit 120 may total the number of interactions in which customers stopped at the sales floor, normalize it by the total number of customers during the period or the total number of customers who passed in front of that sales floor, and determine the degree of attention of the sales floor and/or the products displayed there.
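 An illustrative calculation of one such normalized score, assuming simple event counts as input:

    def attention_score(stop_count, pass_count):
        """Fraction of passers-by who stopped in front of the sales floor
        within the analysis period (one possible normalization)."""
        if pass_count == 0:
            return 0.0
        return stop_count / pass_count

    # e.g. 34 of 212 passers-by stopped -> a degree of attention of about 0.16
    score = attention_score(stop_count=34, pass_count=212)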
 The attention level estimation unit 120 may also total the number of interactions in which customers picked up products displayed in each placement region of the sales floor, and determine the degree of attention of the products displayed in each placement region based on the relative number of interactions of each region. For example, a placement region with a relatively high number of interactions can be considered to attract a high degree of attention not only to the displayed products but also to the placement region itself. Accordingly, the estimated degree of attention may be used to display products to be sold actively in placement regions with a high degree of attention.
 The attention level estimation unit 120 may also estimate the relationship between clerks' interactions and the degree of attention. For example, for an interaction in which a clerk performed display work on a product, such as restocking, replacing, or tidying it, the unit may total the detection results of customers' interactions with that product after the clerk's interaction, and estimate, based on the totaled results, how much the clerk's interaction affected the degree of attention of the product. For example, the attention level estimation unit 120 may calculate the increase or decrease in customers' interactions with a product before and after a clerk tidied its sales floor and, when customers' interactions increased significantly, determine that the clerk's tidying contributed to sales of the product.
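 A minimal before/after comparison of this kind, with a naive relative-change measure and an assumed significance threshold:

    def display_work_uplift(before_count, after_count):
        """Relative change in customer interactions with a product, counted
        in equal-length windows before and after a clerk's display work."""
        if before_count == 0:
            return None  # no baseline to compare against
        return (after_count - before_count) / before_count

    # Assumed rule: treat an increase of 20% or more as evidence that the
    # clerk's tidying contributed to the product's degree of attention.
    uplift = display_work_uplift(before_count=25, after_count=34)
    contributed = uplift is not None and uplift >= 0.2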
 The attention level estimation unit 120 may then notify clerks or the department in charge of the estimated degree of attention for use in subsequent display and sales promotion strategies. For example, when the degree of attention of a product is at or above a predetermined threshold, the attention level estimation unit 120 may notify clerks or the department in charge so as to enlarge the product's placement region, increase the order quantity of the product to avoid stockouts, or raise the evaluation of the clerk who displayed the product. Likewise, when the degree of attention of a sales floor is at or above a predetermined threshold, the attention level estimation unit 120 may notify clerks or the department in charge so as to display products targeted for sales promotion in that sales floor or raise the evaluation of the clerk who arranged its display.
 [Analysis Processing]
 Next, analysis processing according to one embodiment of the present disclosure will be described with reference to FIG. 3. The analysis processing is executed by the analysis device 100 described above and can be realized, for example, by one or more processors executing a program stored in one or more memories of the analysis device 100. FIG. 3 is a flowchart showing the analysis processing according to one embodiment of the present disclosure.
 As shown in FIG. 3, in step S101 the analysis device 100 acquires a sales floor video. Specifically, the analysis device 100 acquires the sales floor video from the imaging device 20 installed at the sales floor. Here, the analysis device 100 may execute the subsequent steps on the acquired video in real time, or may temporarily store the acquired video and execute the subsequent steps on the stored video at an appropriate time.
 In step S102, the analysis device 100 preprocesses the sales floor video. Specifically, the analysis device 100 detects moving objects such as people and shopping carts in the video using any known object detector and removes the detected moving objects from the video using any known moving-object removal technique. The analysis device 100 also performs region segmentation by product type on the moving-object-removed video and estimates the placement region of each product type.
 In step S103, the analysis device 100 detects the behavior of clerks and customers with respect to the sales floor and the products on it. Specifically, the analysis device 100 may calculate differences between frames of the moving-object-removed video and determine, based on the calculated differences, whether a change has occurred in the sales floor. The analysis device 100 may also estimate the product names and/or the quantity (number) of products displayed in each placement region. The analysis device 100 may further recognize the persons and their movements in the videos of persons detected as moving objects. For example, the analysis device 100 may discriminate between customers and clerks captured in the video. From a person video, the analysis device 100 may detect predetermined behaviors related to customers' interactions with products and the sales floor, such as a customer walking past the product sales floor, stopping in front of it, looking at a product, picking up a product, or returning a product. The analysis device 100 may also detect clerks' interactions with products and the sales floor (for example, display work such as restocking, replacing, or tidying products, and sales promotion work such as putting up POP advertisements). Detection of these interactions may be performed based on a machine learning model such as a neural network; a machine learning model may be configured for each type of human behavior to be detected, or an end-to-end machine learning model that detects the desired types of behavior from the sales floor video may be configured.
 In step S104, the analysis device 100 may estimate the degree of attention of the sales floor based on the behavior detection results. Specifically, the analysis device 100 estimates the degree of attention of the sales floor based on the detection results of the various behaviors. For example, the analysis device 100 may apply statistical processing to the behaviors detected within a predetermined period and calculate the degree of attention of the products and/or sales floor that were the target of the behaviors of customers and clerks. The analysis device 100 may notify clerks or the department in charge of the estimated degree of attention for use in subsequent display and sales promotion strategies.
 [Hardware Configuration]
 Part or all of the analysis device 100 in the embodiments described above may be configured as hardware, or may be configured as information processing by software (a program) executed by a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or the like. When configured as information processing by software, the software that realizes at least some of the functions of each device in the embodiments described above may be stored on a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc Read-Only Memory), or a USB (Universal Serial Bus) memory, and the information processing by software may be executed by loading the software into a computer. The software may also be downloaded via a communication network. Furthermore, the information processing may be executed by hardware by implementing the software in a circuit such as an ASIC (Application-Specific Integrated Circuit) or FPGA (Field-Programmable Gate Array).
 The type of storage medium that stores the software is not limited. The storage medium is not limited to a removable medium such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or memory. The storage medium may be provided inside or outside the computer.
 FIG. 4 is a block diagram showing an example of the hardware configuration of the analysis device 100 in the embodiments described above. As one example, the analysis device 100 may be realized as a computer 7 including a processor 71, a main storage device 72 (memory), an auxiliary storage device 73 (memory), a network interface 74, and a device interface 75, connected via a bus 76.
 Although the computer 7 of FIG. 4 includes one of each component, it may include a plurality of the same component. Although one computer 7 is shown in FIG. 4, the software may be installed on a plurality of computers, each of which executes the same or a different part of the software's processing. In this case, a form of distributed computing may be adopted in which the computers communicate via the network interface 74 or the like to execute the processing. That is, the analysis device 100 in the embodiments described above may be configured as a system whose functions are realized by one or more computers executing instructions stored in one or more storage devices. The system may also be configured such that information transmitted from a terminal is processed by one or more computers provided on a cloud and the processing results are transmitted to the terminal.
 The various operations of the analysis device 100 in the embodiments described above may be executed in parallel using one or more processors or using a plurality of computers connected via a network. The various operations may also be distributed to a plurality of processing cores within a processor and executed in parallel. Some or all of the processes, means, and the like of the present disclosure may be executed by at least one of a processor and a storage device provided on a cloud capable of communicating with the computer 7 via a network. In this way, the analysis device 100 in the embodiments described above may take the form of parallel computing by one or more computers.
 The processor 71 may be an electronic circuit (a processing circuit or processing circuitry, such as a CPU, GPU, FPGA, or ASIC) including the control device and arithmetic device of the computer. The processor 71 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 71 is not limited to electronic circuits using electronic logic elements and may be realized by an optical circuit using optical logic elements. The processor 71 may also include arithmetic functions based on quantum computing.
 The processor 71 can perform arithmetic processing based on data and software (programs) input from the devices of the internal configuration of the computer 7 and output arithmetic results and control signals to those devices. The processor 71 may control the components of the computer 7 by executing the OS (Operating System) of the computer 7, applications, and the like.
 The analysis device 100 in the embodiments described above may be realized by one or more processors 71. Here, the processor 71 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate by wire or wirelessly.
 The main storage device 72 is a storage device that stores instructions executed by the processor 71, various data, and the like, and the information stored in the main storage device 72 is read by the processor 71. The auxiliary storage device 73 is a storage device other than the main storage device 72. These storage devices mean any electronic components capable of storing electronic information and may be semiconductor memories. A semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the analysis device 100 in the embodiments described above may be realized by the main storage device 72 or the auxiliary storage device 73, or by built-in memory incorporated in the processor 71. For example, the storage unit in the embodiments described above may be realized by the main storage device 72 or the auxiliary storage device 73.
 A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the analysis device 100 in the embodiments described above is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that at least one storage device (memory), a configuration may be included in which at least one of the plurality of processors is connected (coupled) to at least one storage device (memory). This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which a storage device (memory) is integrated with the processor (for example, a cache memory including L1 and L2 caches) may be included.
 The network interface 74 is an interface for connecting to the communication network 8 wirelessly or by wire. Any appropriate interface, such as one conforming to existing communication standards, may be used as the network interface 74. Information may be exchanged via the network interface 74 with an external device 9A connected via the communication network 8. The communication network 8 may be any of a WAN (Wide Area Network), LAN (Local Area Network), PAN (Personal Area Network), or the like, or a combination thereof, as long as information is exchanged between the computer 7 and the external device 9A. Examples of WANs include the Internet, examples of LANs include IEEE 802.11 and Ethernet (registered trademark), and examples of PANs include Bluetooth (registered trademark) and NFC (Near Field Communication).
 The device interface 75 is an interface, such as USB, that connects directly to the external device 9B.
 The external device 9A is a device connected to the computer 7 via a network. The external device 9B is a device connected directly to the computer 7.
 The external device 9A or the external device 9B may, as one example, be an input device. The input device is, for example, a device such as a camera, microphone, motion capture device, various sensors, keyboard, mouse, or touch panel, and provides the acquired information to the computer 7. It may also be a device including an input unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
 The external device 9A or the external device 9B may also, as one example, be an output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), CRT (Cathode Ray Tube), PDP (Plasma Display Panel), or organic EL (Electroluminescence) panel, or may be a speaker or the like that outputs audio. It may also be a device including an output unit, a memory, and a processor, such as a personal computer, tablet terminal, or smartphone.
 The external device 9A or the external device 9B may also be a storage device (memory). For example, the external device 9A may be network storage or the like, and the external device 9B may be storage such as an HDD.
 The external device 9A or the external device 9B may also be a device having some of the functions of the components of each device (such as the analysis device 100 or the user terminal 30) in the embodiments described above. That is, the computer 7 may transmit or receive some or all of the processing results of the external device 9A or the external device 9B.
 In this specification (including the claims), when the expression "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include a plurality of instances of any element, such as a-a, a-b-b, or a-a-b-b-c-c. It further includes the addition of elements other than the listed elements (a, b, and c), such as a-b-c-d, which has d.
 In this specification (including the claims), when expressions such as "with data as input", "based on data", "according to data", or "in accordance with data" (including similar expressions) are used, unless otherwise noted, they include the case where the data itself is used as input and the case where data subjected to some processing (for example, data with noise added, normalized data, or an intermediate representation of the data) is used as input. When it is described that some result is obtained "based on", "according to", or "in accordance with" data, this includes the case where the result is obtained based only on that data, and may also include the case where the result is obtained under the influence of data, factors, conditions, and/or states other than that data. When it is described that "data is output", unless otherwise noted, this includes the case where the data itself is used as output and the case where data subjected to some processing (for example, data with noise added, normalized data, or an intermediate representation of the data) is used as output.
 In this specification (including the claims), when the terms "connected" and "coupled" are used, they are intended as non-limiting terms that include any of direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, but forms of connection/coupling that are not intentionally or naturally excluded should be interpreted non-limitingly as being included in the terms.
 In this specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of element A has a configuration capable of executing operation B and that a permanent or temporary setting/configuration of element A is configured/set to actually execute operation B. For example, when element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing operation B and is configured to actually execute operation B by a permanent or temporary setting of programs (instructions). When element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented to actually execute operation B, regardless of whether control instructions and data are actually attached.
 In this specification (including the claims), terms denoting inclusion or possession (for example, "comprising", "including", and "having") are intended as open-ended terms, covering the case where something other than the object indicated by the object of the term is included or possessed. Where the object of such a term is an expression that does not specify a quantity or that suggests the singular (an expression using the article "a" or "an"), that expression should be interpreted as not being limited to a specific number.
 In this specification (including the claims), even where an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests the singular (an expression using the article "a" or "an") is used in another place, the latter expression is not intended to mean "one". In general, an expression that does not specify a quantity or that suggests the singular should be interpreted as not necessarily being limited to a specific number.
 In this specification, where it is stated that a particular advantage or result is obtained with a particular configuration of an embodiment, it should be understood that, unless there is a particular reason otherwise, the advantage or result can also be obtained with one or more other embodiments having that configuration. It should also be understood, however, that whether the advantage or result is obtained generally depends on various factors, conditions, and/or states, and that the configuration does not always provide the advantage or result. The advantage or result is merely obtained by the configuration described in the embodiment when various factors, conditions, and/or states are satisfied, and is not necessarily obtained in a claimed invention that recites that configuration or a similar configuration.
 In this specification (including the claims), where a term such as "maximize" is used, it includes finding a global maximum, finding an approximation of a global maximum, finding a local maximum, and finding an approximation of a local maximum, and should be interpreted as appropriate according to the context in which the term is used; it also includes finding an approximation of such a maximum probabilistically or heuristically. Similarly, where a term such as "minimize" is used, it includes finding a global minimum, finding an approximation of a global minimum, finding a local minimum, and finding an approximation of a local minimum, and should be interpreted as appropriate according to the context; it also includes finding an approximation of such a minimum probabilistically or heuristically. Similarly, where a term such as "optimize" is used, it includes finding a global optimum, finding an approximation of a global optimum, finding a local optimum, and finding an approximation of a local optimum, and should be interpreted as appropriate according to the context; it also includes finding an approximation of such an optimum probabilistically or heuristically.
 In this specification (including the claims), where a plurality of pieces of hardware perform predetermined processing, the pieces of hardware may cooperate to perform the predetermined processing, or some of the hardware may perform all of the predetermined processing. Alternatively, some of the hardware may perform part of the predetermined processing while other hardware performs the remainder. Where an expression such as "one or more pieces of hardware perform first processing and the one or more pieces of hardware perform second processing" is used, the hardware that performs the first processing and the hardware that performs the second processing may be the same or different; it suffices that both are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, or the like.
 In this specification (including the claims), where a plurality of storage devices (memories) store data, each individual storage device (memory) among the plurality may store only part of the data or may store the whole of the data.
 Although embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, substitutions, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the content defined in the claims and equivalents thereof. For example, in all the embodiments described above, numerical values and formulas used in the description are shown merely as examples, and the present disclosure is not limited to them. Likewise, the order of the operations in the embodiments is shown merely as an example, and the present disclosure is not limited to it.
 This application claims priority based on Japanese Patent Application No. 2021-023665, filed on February 17, 2021, the entire contents of which are incorporated herein by reference.
REFERENCE SIGNS LIST
10 analysis system
20 imaging device
30 user terminal
100 analysis device
110 interaction detection unit
120 attention level estimation unit

Claims (12)

  1.  An analysis device comprising:
      one or more memories; and
      one or more processors,
      wherein the one or more processors estimate an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
  2.  The analysis device according to claim 1, wherein the one or more processors detect the human behavior related to the sales floor based on a sales floor video.
  3.  The analysis device according to claim 2, wherein the one or more processors use, for the detection of the behavior, a change in the sales floor detected based on the sales floor video.
  4.  The analysis device according to claim 2 or 3, wherein the one or more processors use, for the detection of the behavior, a quantity of products on the sales floor estimated based on the sales floor video.
  5.  The analysis device according to any one of claims 2 to 4, wherein the one or more processors discriminate between store visitors and store clerks captured in the sales floor video, and use the discrimination result for the detection of the behavior.
  6.  The analysis device according to any one of claims 2 to 5, wherein the detection result of the behavior includes at least any one of: a store visitor walking in front of a sales floor of a product, a store visitor stopping in front of the sales floor of the product, a store visitor looking at the product, a store visitor picking up the product, or a store visitor returning the product.
  7.  The analysis device according to claim 5, wherein the one or more processors estimate the attention level of the sales floor by counting, for each sales floor, the number of times behavior of a store visitor is detected, based on the detection result of the behavior.
  8.  The analysis device according to claim 7, wherein the one or more processors estimate, based on the detection result of the behavior, an influence of behavior of a store clerk in the sales floor on the attention level of the sales floor.
  9.  The analysis device according to any one of claims 2 to 8, wherein the one or more processors perform the detection of the behavior using a neural network.
  10.  An analysis system comprising:
      the analysis device according to any one of claims 2 to 9; and
      one or more imaging devices that acquire the sales floor video.
  11.  An analysis method comprising:
      estimating, by one or more processors, an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
  12.  A program causing one or more processors to execute at least a step of estimating an attention level of a sales floor based on a detection result of human behavior related to the sales floor.
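
As a purely illustrative aid (not part of the claimed subject matter and not the disclosed implementation), the per-sales-floor counting recited in claims 5 to 7 can be sketched in a few lines of Python. The behavior labels, weights, data layout, and function names below are hypothetical choices made only for illustration; the claims do not prescribe any of them.

# Minimal sketch: aggregate visitor behavior detections per sales floor into
# an attention score (claims 6 and 7), excluding store clerk behavior using
# the visitor/clerk discrimination of claim 5. All names and weights here are
# hypothetical illustration choices, not the claimed implementation.
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the behaviors listed in claim 6, with hypothetical weights.
BEHAVIOR_WEIGHTS = {
    "walk_by": 1.0,   # walking in front of a product's sales floor
    "stop": 2.0,      # stopping in front of the sales floor
    "look": 3.0,      # looking at the product
    "pick_up": 5.0,   # picking up the product
    "put_back": 4.0,  # returning the product
}

@dataclass
class BehaviorEvent:
    sales_floor_id: str
    behavior: str     # one of the keys of BEHAVIOR_WEIGHTS
    is_clerk: bool    # result of the visitor/clerk discrimination (claim 5)

def estimate_attention(events: list[BehaviorEvent]) -> dict[str, float]:
    """Count detected visitor behaviors per sales floor, weighted by type."""
    scores: dict[str, float] = defaultdict(float)
    for event in events:
        if event.is_clerk:
            continue  # clerk behavior is not counted toward visitor attention
        scores[event.sales_floor_id] += BEHAVIOR_WEIGHTS.get(event.behavior, 0.0)
    return dict(scores)

# Usage with hypothetical events: floor_A scores 1.0 + 5.0 = 6.0 (the clerk
# event is excluded), and floor_B scores 3.0.
events = [
    BehaviorEvent("floor_A", "walk_by", False),
    BehaviorEvent("floor_A", "pick_up", False),
    BehaviorEvent("floor_A", "stop", True),
    BehaviorEvent("floor_B", "look", False),
]
print(estimate_attention(events))  # {'floor_A': 6.0, 'floor_B': 3.0}

An unweighted variant (all weights equal to 1.0) corresponds most directly to the plain counting of claim 7; the weights merely illustrate one way such counts could be combined into a score.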
PCT/JP2022/005374 2021-02-17 2022-02-10 Analysis device, analysis system, analysis method, and program WO2022176774A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-023665 2021-02-17
JP2021023665 2021-02-17

Publications (1)

Publication Number Publication Date
WO2022176774A1 true WO2022176774A1 (en) 2022-08-25

Family

ID=82931643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/005374 WO2022176774A1 (en) 2021-02-17 2022-02-10 Analysis device, analysis system, analysis method, and program

Country Status (1)

Country Link
WO (1) WO2022176774A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1048008A (en) * 1996-08-02 1998-02-20 Omron Corp Attention information measuring method, instrument for the method and various system using the instrument
JP2012088878A (en) * 2010-10-19 2012-05-10 Jvc Kenwood Corp Customer special treatment management system
JP2014232362A (en) * 2013-05-28 2014-12-11 Kddi株式会社 System for analyzing and predicting moving object action
JP2018151963A (en) * 2017-03-14 2018-09-27 オムロン株式会社 Personal trend recording apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Human behavior analysis service", C&C User Forum & iEXPO 2014, 20-21 November 2014, NUA (NEC C&C System Users Group), NEC, JP, pages 1-2, XP009539133 *

Similar Documents

Publication Publication Date Title
US20200265494A1 (en) Remote sku on-boarding of products for subsequent video identification and sale
JP4972491B2 (en) Customer movement judgment system
JP6529078B2 (en) Customer behavior analysis system, customer behavior analysis method, customer behavior analysis program and shelf system
US11521248B2 (en) Method and system for tracking objects in an automated-checkout store based on distributed computing
JP5632512B1 (en) Human behavior analysis device, human behavior analysis system, human behavior analysis method, and monitoring device
CN106776619A (en) Method and apparatus for determining the attribute information of destination object
US9299229B2 (en) Detecting primitive events at checkout
US11301684B1 (en) Vision-based event detection
US20180293598A1 (en) Personal behavior analysis device, personal behavior analysis system, and personal behavior analysis method
WO2019005136A1 (en) Automated delivery of temporally limited targeted offers
JP2013144001A (en) Article display shelf, method for investigating action of person, and program for investigating action of person
US20240119500A1 (en) Optimization of Product Presentation
Falcão et al. Faim: Vision and weight sensing fusion framework for autonomous inventory monitoring in convenience stores
WO2022176774A1 (en) Analysis device, analysis system, analysis method, and program
US11615430B1 (en) Method and system for measuring in-store location effectiveness based on shopper response and behavior analysis
WO2022176776A1 (en) Analysis device, analysis system, analysis method, and program
WO2023152893A1 (en) Management device, management system, management method, and program
TWI652638B (en) Smart marketing system and method thereof
WO2021214986A1 (en) Processing device, processing method, and program
JP2021105945A (en) Processor, processing method, and program
US20230112584A1 (en) Multi-camera person re-identification
US11393122B1 (en) Method and system for determining contextual object position
EP4160533A1 (en) Estimation program, estimation method, and estimation device
Nagnath et al. Realtime Customer Merchandise Engagement Detection and Customer Attribute Estimation with Edge Device
EP4231252A1 (en) Information processing program, information processing method, and information processing apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22756094

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22756094

Country of ref document: EP

Kind code of ref document: A1