US20240029273A1 - Information processing apparatus, control method, and program - Google Patents
- Publication number
- US20240029273A1 (application Ser. No. 18/227,848)
- Authority
- US
- United States
- Prior art keywords
- motion
- target region
- analysis target
- person
- hand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T 7/20 — Image analysis; analysis of motion
- G06Q 30/02 — Commerce; marketing, price estimation or determination, fundraising
- G06T 7/248 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
- G06T 7/73 — Determining position or orientation of objects or cameras using feature-based methods
- G06T 7/74 — Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
- G06V 10/255 — Image preprocessing; detecting or recognising potential candidate objects based on visual cues, e.g. shapes
- G06V 20/52 — Scene context; surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V 40/107 — Human bodies and body parts; static hand or arm
- G06V 40/28 — Movements or behaviour; recognition of hand or arm movements, e.g. recognition of deaf sign language
- G08B 13/196 — Burglar, theft or intruder alarms actuated by passive radiation detection systems using image scanning and comparing systems with television cameras
- G06T 2207/10016 — Image acquisition modality: video; image sequence
- G06T 2207/10021 — Image acquisition modality: stereoscopic video; stereoscopic image sequence
- G06T 2207/10028 — Image acquisition modality: range image; depth image; 3D point clouds
- G06T 2207/20081 — Special algorithmic details: training; learning
- G06T 2207/30196 — Subject of image: human being; person
- G06T 2207/30232 — Subject of image: surveillance
Definitions
- the present invention relates to image analysis.
- a customer takes out and purchases a product displayed in a display place (for example, a product shelf).
- the customer may also return a product, once picked up, to the display place.
- techniques for analyzing such customer actions related to displayed products have been developed.
- Patent Document 1 discloses a technique of detecting that an object (a hand of a person) enters a determined region (a shelf) using a depth image obtained from the imaging result of a depth camera and determining a motion of a customer using color images near the entry position before and after the entry. Specifically, a color image including the hand of a person entering the determined region is compared with a color image including the hand of the person leaving the determined region, and the motion of the person is determined as “acquisition of product” in a case where the increase in color exceeds a threshold value, as “return of product” in a case where the decrease in color exceeds a threshold value, and as “contact” in a case where the change in color is less than a threshold value. Further, Patent Document 1 discloses a technique of deciding the increase or decrease in the volume of a subject from information on the size of the subject obtained from the imaging result of the depth camera to distinguish between the acquisition and the return of the product.
- the degree of increase or decrease in color or volume before and after the entry of the hand of the person into the display place is affected by, for example, the size of the product or the pose of the hand of the person. For example, in a case where a small product is taken out from the display place, the increase in color and volume before and after the entry is small. Further, a motion of merely changing the pose of the hand may be erroneously recognized as a motion of acquiring a product.
- the present invention is made in view of the above problems.
- One of the objects of the present invention is to provide a technique of determining a motion of a person with respect to a displayed item with high accuracy.
- the information processing apparatus includes: 1) a detection unit that detects a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding unit that decides an analysis target region in the captured image using the detected reference position; and 3) a determination unit that analyzes the decided analysis target region to determine a motion of the person.
- a control method of the present invention is executed by a computer.
- the control method includes: 1) a detection step of detecting a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding step of deciding an analysis target region in the captured image using the detected reference position; and 3) a determination step of analyzing the decided analysis target region to determine a motion of the person.
- a program of the present invention causes a computer to execute each step of the control method of the present invention.
- FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1.
- FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus according to the example embodiment 1.
- FIG. 3 is a diagram illustrating a computer for forming the information processing apparatus.
- FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
- FIG. 5 is a first diagram illustrating an imaging range of a camera.
- FIG. 6 is a second diagram illustrating the imaging range of the camera.
- FIG. 7 is a diagram illustrating a case where a captured image includes a scene in which a product shelf is imaged from the right side as viewed from the front.
- FIGS. 8 A and 8 B are diagrams illustrating an analysis target region that is decided as a region having a predetermined shape defined with a reference position as a reference.
- FIG. 9 is a diagram illustrating a case where an orientation of the analysis target region is defined based on an orientation of a hand of a customer.
- FIG. 10 is a flowchart illustrating a flow of processing for determining a motion of a customer 20 .
- FIG. 11 is a flowchart illustrating the flow of the processing for determining the motion of the customer 20 .
- FIG. 12 is a diagram illustrating display information in a table format.
- FIG. 13 is a diagram illustrating a depth image generated by the camera.
- FIG. 14 is a diagram illustrating display information indicating a range of a distance from a camera for each stage of the product shelf.
- each block represents a configuration of functional units instead of a configuration of hardware units in each block diagram, unless otherwise described.
- FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1 (information processing apparatus 2000 shown in FIG. 2 and the like described below). Note that FIG. 1 is an illustration for easily understanding the operation of the information processing apparatus 2000 and the operation of the information processing apparatus 2000 is not limited by FIG. 1 .
- the information processing apparatus 2000 analyzes a captured image 12 generated by a camera 10 to determine a motion of a person.
- the camera 10 is a camera that images a display place where an item is displayed.
- the camera 10 repeatedly performs imaging and generates a plurality of captured images 12 .
- the plurality of generated captured images 12 are, for example, a frame group that constitutes video data.
- the plurality of captured images 12 generated by the camera 10 do not necessarily need to constitute the video data and may be handled as individual still image data.
- an item to be imaged by the camera 10 can be any item that is displayed at the display place and that is taken out from the display place by a person or, conversely, placed (returned) in the display place by a person.
- a specific item to be imaged by the camera 10 varies depending on a usage environment of the information processing apparatus 2000 .
- the information processing apparatus 2000 is used to determine the motion of a customer or a store clerk in a store.
- the item to be imaged by the camera 10 is a product sold in the store.
- the display place described above is, for example, a product shelf.
- the information processing apparatus 2000 is used to determine the motion of a customer 20 . Therefore, the person and the item to be imaged by the camera 10 are respectively the customer 20 and a product 40 . Further, the display place is a product shelf 50 .
- the information processing apparatus 2000 is used to determine the motion of a factory worker or the like.
- the person to be imaged by the camera 10 is the worker or the like.
- the item to be imaged by the camera 10 is a material, a tool, or the like which is used in the factory.
- the display place is a shelf installed in, for example, a warehouse of the factory.
- hereinafter, a case where the information processing apparatus 2000 is used to determine the motion of a customer (customer 20 in FIG. 1 ) in a store will be described as an example, unless otherwise noted in this specification. Therefore, it is assumed that the “motion of person” determined by the determination unit 2060 is the “motion of customer”. Further, it is assumed that the “item” to be imaged by the camera is the “product”. Furthermore, it is assumed that the “display place” is the “product shelf”.
- the information processing apparatus 2000 detects a reference position 24 from the captured image 12 .
- the reference position 24 indicates a position of a hand of the person.
- the position of the hand of the person is, for example, a center position of the hand or a position of a fingertip.
- the information processing apparatus 2000 decides a region to be analyzed (analysis target region 30 ) in the captured image 12 using this reference position 24 .
- the information processing apparatus 2000 analyzes the analysis target region 30 to determine the motion of the customer 20 .
- the motion of the customer 20 is a motion of holding the product 40 , a motion of taking out the product 40 from the product shelf 50 , or a motion of placing the product 40 on the product shelf 50 .
- the information processing apparatus 2000 first detects the reference position 24 indicating the position of the hand of the customer 20 in the captured image 12 and decides the analysis target region 30 based on the reference position 24 . That is, the image analysis is performed near the hand of the customer 20 .
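- the following is a minimal sketch, not the patent's actual implementation, of how these three steps could be wired together. The three helper functions are hypothetical stand-ins for the detection unit 2020, the deciding unit 2040, and the determination unit 2060; hedged sketches of each appear later in this section.

```python
# High-level sketch of the pipeline: detection unit (2020) -> deciding
# unit (2040) -> determination unit (2060). The helpers are placeholders.
import cv2

def detect_reference_position(frame):
    """Return the (x, y) reference position 24 of a hand, or None (placeholder)."""
    return None

def decide_analysis_target_region(frame, ref, h=120, w=120):
    """Crop a w x h rectangle centered on the reference position 24."""
    x, y = ref
    return frame[max(0, y - h // 2): y + h // 2, max(0, x - w // 2): x + w // 2]

def determine_motion(region):
    """Classify the customer's motion from the analysis target region (placeholder)."""
    return "unknown"

cap = cv2.VideoCapture("shelf_camera.mp4")  # hypothetical video of the display place
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    ref = detect_reference_position(frame)
    if ref is not None:
        motion = determine_motion(decide_analysis_target_region(frame, ref))
cap.release()
```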
- FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
- the information processing apparatus 2000 has a detection unit 2020 , a deciding unit 2040 , and a determination unit 2060 .
- the detection unit 2020 detects the reference position 24 of the hand of the person included in the captured image 12 from the captured image 12 .
- the deciding unit 2040 decides the analysis target region 30 in the captured image 12 using the detected reference position 24 .
- the determination unit 2060 analyzes the decided analysis target region 30 to determine the motion of the person.
- Each functional configuration unit of the information processing apparatus 2000 may be formed by hardware (for example, a hard-wired electronic circuit) that forms each functional configuration unit or a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the circuit).
- FIG. 3 is a diagram illustrating a computer 1000 for forming the information processing apparatus 2000 .
- the computer 1000 is a variety of computers.
- the computer 1000 is a personal computer (PC), a server machine, a tablet terminal, or a smartphone.
- the computer 1000 may be the camera 10 .
- the computer 1000 may be a dedicated computer designed to form the information processing apparatus 2000 or may be a general-purpose computer.
- the computer 1000 has a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input and output interface 1100 , and a network interface 1120 .
- the bus 1020 is a data transmission path for the processor 1040 , the memory 1060 , the storage device 1080 , the input and output interface 1100 , and the network interface 1120 to mutually transmit and receive data.
- however, the method of mutually connecting the processor 1040 and the like is not limited to the bus connection.
- the processor 1040 is an arithmetic apparatus such as a central processing unit (CPU) or a graphics processing unit (GPU).
- the memory 1060 is a main storage apparatus formed by a random access memory (RAM) or the like.
- the storage device 1080 is an auxiliary storage apparatus formed by a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. However, the storage device 1080 may be configured by hardware similar to the hardware used to configure the main storage apparatus, such as the RAM.
- the input and output interface 1100 is an interface for connecting the computer 1000 to an input and output device.
- the network interface 1120 is an interface for connecting the computer 1000 to a communication network.
- This communication network is, for example, a local area network (LAN) or a wide area network (WAN).
- the method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
- the computer 1000 is connected to the camera 10 in a communicable manner through a network.
- the method of connecting the computer 1000 to the camera 10 in a communicable manner is not limited to connection through the network.
- the computer 1000 does not necessarily need to be connected to the camera 10 in a communicable manner as long as the captured image 12 generated by the camera 10 is acquired.
- the storage device 1080 stores program modules to form the respective functional configuration units (the detection unit 2020 , the deciding unit 2040 , and the determination unit 2060 ) of the information processing apparatus 2000 .
- the processor 1040 reads each of the program modules into the memory 1060 and executes each program module to realize a function corresponding to each program module.
- the camera 10 is any camera that can repeatedly perform the imaging and generate the plurality of captured images 12 .
- the camera 10 may be a video camera that generates the video data or a still camera that generates still image data.
- the captured image 12 is a video frame constituting the video data in the former case.
- the camera 10 may be a two-dimensional camera or a three-dimensional camera (stereo camera or depth camera).
- the captured image 12 may be a depth image in a case where the camera 10 is the depth camera.
- the depth image is an image in which a value of each pixel of the image represents a distance between an imaged item and the camera.
- the camera 10 may be an infrared camera.
- the computer 1000 that forms the information processing apparatus 2000 may be the camera 10 .
- the camera 10 analyzes the captured image 12 generated by itself to determine the motion of the customer 20 .
- for example, an intelligent camera, a network camera, or a camera called an Internet protocol (IP) camera can be used as the camera 10 in this case.
- FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
- the detection unit 2020 acquires the captured image 12 (S 102 ).
- the detection unit 2020 detects the reference position 24 of the hand of the customer 20 from the acquired captured image 12 (S 104 ).
- the deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S 106 ).
- the determination unit 2060 performs the image analysis on the decided analysis target region (S 108 ).
- the determination unit 2060 determines the motion of the customer 20 based on the result of the image analysis of the analysis target region 30 (S 110 ).
- the plurality of captured images 12 may be used to determine the motion of the customer 20 .
- the image analysis is performed on the analysis target regions 30 decided for each of the plurality of captured images 12 (image analysis is performed on a plurality of analysis target regions 30 ) to determine the motion of the customer 20 . That is, the processing of S 102 to S 108 is performed for each of the plurality of captured images 12 , and the processing of S 110 is performed using the result.
- the information processing apparatus 2000 executes a series of pieces of processing shown in FIG. 4 . For example, each time the captured image 12 is generated by the camera 10 , the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 for the captured image 12 .
- the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 at a predetermined time interval (for example, every second). In this case, for example, the information processing apparatus 2000 acquires the latest captured image 12 generated by the camera 10 at the timing of starting the series of pieces of processing shown in FIG. 4 .
- the detection unit 2020 acquires the captured image 12 (S 102 ). Any method of the detection unit 2020 to acquire the captured image 12 may be employed. For example, the detection unit 2020 receives the captured image 12 transmitted from the camera 10 . Further, for example, the detection unit 2020 accesses the camera 10 and acquires the captured image 12 stored in the camera 10 .
- the camera 10 may store the captured image 12 in a storage apparatus provided outside the camera 10 .
- the detection unit 2020 accesses the storage apparatus and acquires the captured image 12 .
- the information processing apparatus 2000 acquires the captured image 12 generated by the information processing apparatus 2000 itself.
- the captured image 12 is stored in, for example, the memory 1060 or the storage device 1080 (refer to FIG. 3 ) inside the information processing apparatus 2000 . Therefore, the detection unit 2020 acquires the captured image 12 from the memory 1060 or the storage device 1080 .
- the captured image 12 (that is, an imaging range of the camera 10 ) includes at least a range in front of the product shelf 50 .
- FIG. 5 is a first diagram illustrating the imaging range of the camera 10 .
- an imaging range 14 of the camera 10 includes a range of a distance d 1 from the front surface of the product shelf 50 to the front side.
- FIG. 6 is a second diagram illustrating the imaging range of the camera 10 .
- the imaging range 14 of the camera 10 includes a range from a position apart from the front surface of the product shelf 50 to the front side by d 2 to a position apart from the front surface of the product shelf 50 to the front side by d 3 .
- the captured images 12 in FIGS. 5 and 6 include scenes in which the product shelf 50 is viewed down from above.
- the camera 10 is installed so as to image the product shelf 50 from above the product shelf 50 .
- the captured image 12 may not include the scene in which the product shelf 50 is viewed down from above.
- the captured image 12 may include a scene in which the product shelf 50 is imaged from the side.
- FIG. 7 is a diagram illustrating a case where the captured image 12 includes a scene in which the product shelf 50 is imaged from the right side as viewed from the front.
- the detection unit 2020 detects the reference position 24 from the captured image 12 (S 104 ).
- the reference position 24 indicates the position of the hand of the customer 20 .
- the position of the hand of the customer 20 is, for example, the center position of the hand or the position of the fingertip.
- the detection unit 2020 performs feature value matching using a feature value of the hand of the person, which is prepared in advance, to detect a region matching the feature value (with high similarity to the feature value) from the captured image 12 .
- the detection unit 2020 then detects a predetermined position (for example, the center position) of the detected region, that is, of the region representing the hand, as the reference position 24 of the hand.
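- as an illustration only (the patent does not fix a concrete matching algorithm), the feature value matching described above could be realized with normalized cross-correlation against a hand template prepared in advance; the threshold value here is an assumed parameter.

```python
# Illustrative stand-in for the feature value matching: normalized
# cross-correlation against a hand template prepared in advance.
import cv2

def detect_hand_by_matching(captured_image, hand_template, threshold=0.7):
    """Return the center of the best-matching hand region as the reference
    position 24, or None when no region is similar enough to the feature."""
    gray = cv2.cvtColor(captured_image, cv2.COLOR_BGR2GRAY)
    tmpl = cv2.cvtColor(hand_template, cv2.COLOR_BGR2GRAY)
    scores = cv2.matchTemplate(gray, tmpl, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)
    if max_val < threshold:
        return None
    th, tw = tmpl.shape
    return (max_loc[0] + tw // 2, max_loc[1] + th // 2)
```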
- the detection unit 2020 may detect the reference position 24 using machine learning.
- the detection unit 2020 is configured as a detector using the machine learning.
- the detection unit 2020 is caused to learn in advance using one or more captured images (a set of a captured image and coordinates of the reference position 24 in the captured image) in which the reference positions 24 are known. With this, the detection unit 2020 can detect the reference position 24 from the acquired captured image 12 .
- various models such as a neural network can be used as a machine learning prediction model.
- the learning of the detection unit 2020 is preferably performed on the hand of the customer 20 in various poses. Specifically, captured images for learning are prepared for the hand of customers 20 in various poses. With this, it is possible to detect the reference position 24 from each captured image 12 with high accuracy even though the pose of the hand of the customer 20 is different for each captured image 12 .
- the detection unit 2020 may detect various parameters relating to the hand of the customer 20 in addition to the reference position 24 .
- the detection unit 2020 detects a width, length, and pose of the hand, and a distance between the reference position 24 and the camera 10 .
- the detection unit 2020 determines the width, length, pose, and the like of the hand from a shape and size of a detected hand region.
- the detection unit 2020 is caused to learn using one or more captured images in which the width, length, and pose of the hand, the distance between the reference position 24 and the camera 10 , and the like are known. With this, it is possible for the detection unit 2020 to detect various parameters such as the hand width in addition to the reference position 24 from the acquired captured image 12 .
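- one concrete (and assumed, since the patent only requires "machine learning" in general) way to build such a detector is to regress the reference position and the hand parameters from image features; the sketch below uses HOG features and a random forest, trained on captured images whose reference positions, hand sizes, and camera distances are known in advance.

```python
# Minimal learned detector sketch: HOG features + multi-output random forest.
import cv2
import numpy as np
from sklearn.ensemble import RandomForestRegressor

hog = cv2.HOGDescriptor()  # default 64x128 detection window

def hand_features(image):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return hog.compute(cv2.resize(gray, (64, 128))).ravel()

def train_detector(images, targets):
    """targets[i] = (x, y, width, length, distance), known for images[i]."""
    X = np.stack([hand_features(img) for img in images])
    return RandomForestRegressor(n_estimators=100).fit(X, np.asarray(targets))

def detect(model, captured_image):
    x, y, width, length, distance = model.predict(
        hand_features(captured_image)[None, :])[0]
    return (int(x), int(y)), width, length, distance
```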
- the deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S 106 ). There are various methods for the deciding unit 2040 to decide the analysis target region 30 .
- for example, the deciding unit 2040 decides, as the analysis target region 30 , a region having a predetermined shape defined with the reference position 24 as a reference among the regions included in the captured image 12 .
- FIGS. 8 A and 8 B are diagrams illustrating the analysis target region 30 that is decided as the region having the predetermined shape defined with the reference position 24 as the reference.
- FIG. 8 A represents a case where the reference position 24 is used as a position representing a predetermined position of the analysis target region 30 .
- the analysis target region in FIG. 8 A is a rectangle with the reference position 24 as the center.
- the analysis target region 30 is a rectangle having a height h and a width w.
- the reference position 24 may be used as a position that defines a position other than the center of the analysis target region 30 such as the upper left end or lower right end of the analysis target region 30 .
- FIG. 8 B represents a case where a predetermined position (center, upper left corner, or the like) of the analysis target region 30 is defined by a position having a predetermined relationship with the reference position 24 .
- the analysis target region 30 in FIG. 8 B is a rectangle with a position moved by a predetermined vector v from the reference position 24 as the center. The size and orientation of the rectangle are the same as the analysis target region in FIG. 8 A .
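- a minimal sketch of this decision, assuming the rectangle size (h, w) and the offset vector v are predetermined constants: v = (0, 0) reproduces the FIG. 8A case, and a nonzero v gives the FIG. 8B case.

```python
def decide_analysis_target_region(captured_image, reference_position,
                                  h=120, w=120, v=(0, 0)):
    """Crop a w x h rectangle whose center is the reference position 24
    shifted by the predetermined vector v."""
    cx, cy = reference_position[0] + v[0], reference_position[1] + v[1]
    top, left = max(0, cy - h // 2), max(0, cx - w // 2)
    return captured_image[top: top + h, left: left + w]
```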
- the orientation of the analysis target region 30 is defined based on an axial direction of the captured image 12 . More specifically, a height direction of the analysis target region 30 is defined as a Y-axis direction of the captured image 12 . However, the orientation of the analysis target region 30 may be defined based on a direction other than the axial direction of the captured image 12 .
- the detection unit 2020 detects the pose of the hand of the customer 20 .
- the orientation of the analysis target region 30 may be defined based on the orientation of the hand.
- FIG. 9 is a diagram illustrating a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 .
- the orientation of the analysis target region 30 is defined as a depth direction of the hand (direction from the wrist to the fingertip).
- the orientation of the analysis target region 30 in each of the plurality of captured images 12 may be different in a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 as described above. Therefore, it is preferable that the deciding unit 2040 performs geometric conversion such that the orientations of the plurality of analysis target regions 30 are aligned. For example, the deciding unit 2040 extracts the analysis target region 30 from each captured image 12 and performs the geometric conversion on each extracted analysis target region 30 such that the depth direction of the hand of the customer 20 faces the Y-axis direction.
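- a hedged sketch of this geometric conversion, assuming the pose detected by the detection unit 2020 is available as the angle (in degrees) of the wrist-to-fingertip direction: rotate the image about the reference position so that the depth direction of the hand aligns with the Y-axis, then crop (the sign convention of the angle is an assumption).

```python
import cv2

def aligned_region(captured_image, reference_position, hand_angle_deg,
                   h=120, w=120):
    rows, cols = captured_image.shape[:2]
    # rotate about the reference position 24 so the depth direction of the
    # hand aligns with the Y-axis direction of the image
    M = cv2.getRotationMatrix2D(
        (float(reference_position[0]), float(reference_position[1])),
        hand_angle_deg - 90.0, 1.0)
    rotated = cv2.warpAffine(captured_image, M, (cols, rows))
    x, y = int(reference_position[0]), int(reference_position[1])
    return rotated[max(0, y - h // 2): y + h // 2,
                   max(0, x - w // 2): x + w // 2]
```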
- the size of the analysis target region 30 may be defined statically or may be decided dynamically. In the latter case, the size of the analysis target region 30 is decided by, for example, the following equation (1).
- the h and w are respectively the height and width of the analysis target region 30 .
- the s b is a reference area defined in advance for the hand region.
- the h b and w b are respectively the height and width of the analysis target region 30 defined in advance in association with the reference area.
- the s r is an area of the hand region detected from the captured image 12 by the detection unit 2020 .
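- note: the body of equation (1) did not survive extraction. From the symbol definitions above, and assuming the region is scaled so that its linear size tracks the detected hand area (a square-root law, since area grows with the square of linear size), a plausible reconstruction is:

$$h = h_b \sqrt{\frac{s_r}{s_b}}, \qquad w = w_b \sqrt{\frac{s_r}{s_b}} \tag{1}$$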
- the size of the analysis target region 30 may be dynamically decided using the following equation (2).
- the h and w are respectively the height and width of the analysis target region 30 .
- the d b is a reference distance value defined in advance.
- the h b and w b are respectively the height and width of the analysis target region 30 associated with the reference distance value.
- the d r is a distance value between the reference position 24 detected from the captured image 12 and the camera 10 .
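- note: the body of equation (2) is likewise missing. Under the standard pinhole-camera assumption that apparent size is inversely proportional to distance, a plausible reconstruction is:

$$h = h_b \frac{d_b}{d_r}, \qquad w = w_b \frac{d_b}{d_r} \tag{2}$$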
- the detection unit 2020 determines the distance value dr based on a pixel value at the reference position 24 in the depth image generated by the depth camera.
- the detection unit 2020 may be configured to detect the distance between the reference position 24 and the camera 10 in addition to the reference position 24 when the detection unit 2020 is configured as the detector using the machine learning.
- each pixel of the analysis target region 30 decided by the above method may be corrected, and the corrected analysis target region 30 may be used for the image analysis by the determination unit 2060 .
- the deciding unit 2040 corrects each pixel in the analysis target region 30 using, for example, the following equation (3).
- the d (x,y)0 is a pixel value before the correction at coordinates (x,y) of the analysis target region 30 in the captured image 12 .
- the d (x,y)1 is a pixel value after the correction at the coordinates (x,y) of the analysis target region 30 in the captured image 12 .
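- note: the body of equation (3) is also missing, and its form is less certain. Given that the correction acts on the pixel values of a depth image and that the distance d r between the reference position 24 and the camera is available, one plausible reading is a normalization of each depth pixel relative to the hand:

$$d_{(x,y)1} = d_{(x,y)0} - d_r \tag{3}$$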
- the determination unit 2060 performs the image analysis on the decided analysis target region 30 to determine the motion of the customer 20 (S 108 and S 110 ).
- the motion of the customer 20 is, for example, any of (1) motion of taking out the product 40 from the product shelf 50 , (2) motion of placing the product 40 on the product shelf 50 , (3) motion of not holding the product 40 both before and after the contact with the product shelf 50 , and (4) motion of holding the product 40 both before and after the contact with the product shelf 50 .
- note that the contact between the product shelf 50 and the customer 20 means that the image region of the product shelf 50 and the image region of the customer 20 overlap at least partially in the captured image 12 ; the product shelf 50 and the customer 20 need not contact each other in the real space.
- a product 40 held by the customer 20 before the contact between the customer 20 and the product shelf 50 may be the same as or different from a product 40 held by the customer 20 after the contact between the customer 20 and the product shelf 50 .
- FIGS. 10 and 11 are flowcharts illustrating the flow of the processing for determining the motion of the customer 20 .
- the determination unit 2060 detects the captured image 12 including a scene in which the reference position 24 moves toward the product shelf 50 (S 202 ).
- the determination unit 2060 computes a distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series. In a case where the distance decreases over time in one or more captured images 12 , the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves toward the product shelf 50 .
- the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 202 (S 204 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 204 ), the processing in FIG. 10 proceeds to S 206 . On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 204 ), the processing in FIG. 10 proceeds to S 216 .
- the determination unit 2060 detects a captured image 12 including a scene in which the reference position 24 moves in a direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S 202 (S 206 ). For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series generated later than the captured image 12 detected in S 202 . In a case where the distance increases over time in one or more captured images 12 , the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50 .
- the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 206 (S 208 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 208 ), the product 40 is held in both a hand moving toward the product shelf 50 and a hand moving in the direction away from the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(4) motion of holding the product 40 both before and after the contact with the product shelf 50 ” (S 210 ).
- on the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 208 ), the product 40 is held in the hand moving toward the product shelf 50 but not in the hand moving away from it. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(2) motion of placing the product 40 on the product shelf 50 ” (S 212 ).
- in S 214 , the determination unit 2060 detects the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S 202 .
- the detection method is the same as the method executed in S 206 .
- the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 214 (S 216 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 216 ), the product 40 is held in the hand moving in the direction away from the product shelf 50 while the product 40 is not held in the hand moving toward the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(1) motion of taking out the product 40 from the product shelf 50 ” (S 218 ).
- on the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 216 ), the determination unit 2060 determines that the motion of the customer 20 is “(3) motion of not holding the product 40 both before and after contact with the product shelf 50 ” (S 220 ).
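- the branch structure of FIGS. 10 and 11 reduces to a truth table over two booleans: whether the product 40 is seen in the analysis target region 30 while the hand approaches the shelf (S 204 ) and while it moves away (S 208 / S 216 ). A sketch follows, with a helper for the approach test of S 202; both are illustrative rather than the patent's implementation.

```python
def approaching(distances):
    """S202-style test: True when the reference-position-to-shelf distance
    decreases over time across consecutive captured images."""
    return all(later < earlier for earlier, later in zip(distances, distances[1:]))

def classify_motion(held_on_approach, held_on_retreat):
    if held_on_approach and held_on_retreat:
        return "(4) holding the product both before and after contact"   # S210
    if held_on_approach:
        return "(2) placing the product on the product shelf"            # S212
    if held_on_retreat:
        return "(1) taking the product out of the product shelf"         # S218
    return "(3) not holding the product before or after contact"         # S220
```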
- the determination unit 2060 first extracts an image region excluding a background region, that is, a foreground region, from the analysis target region 30 decided for each of the plurality of captured images 12 in a time series. Note that an existing technique can be used as a technique of determining the background region for a captured image to be imaged by the camera 10 installed at a predetermined place.
- the determination unit 2060 decides that the product 40 is included in the analysis target region 30 in a case where the foreground region includes a region other than the image region representing the hand of the customer 20 .
- the determination unit 2060 may decide that the product 40 is included in the analysis target region 30 only in a case where a size of the image region excluding the image region representing the hand in the foreground region is equal to or larger than a predetermined size. With this, it is possible to prevent the noise included in the captured image 12 from being erroneously detected as the product 40 .
- the method of deciding whether the product 40 is included in the analysis target region is not limited to the method described above. Various existing methods can be used as the method of deciding whether the product 40 is included in the analysis target region 30 , that is, whether the hand of the person included in the image has the product.
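- a sketch of this presence test using OpenCV background subtraction; the hand mask is assumed to come from the hand detection step, and the minimum area is an assumed parameter that guards against noise, as described above.

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2()

def contains_product(analysis_target_region, hand_mask, min_area=500):
    fg = subtractor.apply(analysis_target_region)
    # drop shadow pixels (MOG2 marks them 127), keep definite foreground
    fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)[1]
    # remove the image region representing the hand from the foreground
    non_hand = cv2.bitwise_and(fg, cv2.bitwise_not(hand_mask))
    return int(np.count_nonzero(non_hand)) >= min_area
```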
- the determination unit 2060 may determine the motion of the customer 20 from one captured image 12 . For example, in this case, the determination unit 2060 determines the motion of the customer 20 as “holding the product 40 ” or “not holding the product 40 ”.
- the determination unit 2060 may determine the taken-out product 40 when the customer takes out the product 40 from the product shelf 50 .
- the determination of the product 40 means, for example, that information for identifying the product 40 from other products 40 (for example, an identifier or a name of the product 40 ) is determined.
- the information for identifying the product 40 is referred to as product identification information.
- the determination unit 2060 determines a place in the product shelf 50 where the customer 20 takes out the product 40 to determine the taken-out product 40 .
- the display place of the product 40 is defined in advance.
- information indicating which product is displayed at each position of the product shelf 50 is referred to as display information.
- the determination unit 2060 determines a place in the product shelf 50 from which the customer 20 takes out a product 40 using the captured image 12 and determines the taken-out product 40 using the determined place and the display information.
- FIG. 12 is a diagram illustrating the display information in a table format.
- a table shown in FIG. 12 is referred to as a table 200 .
- the table 200 is created for each product shelf 50 .
- the table 200 has two columns of a stage 202 and product identification information 204 .
- the product identification information 204 indicates the identifier of the product 40 .
- for example, the record in the first row indicates that the product 40 identified by the identifier i 0001 is displayed in the first stage of the product shelf 50 .
- the determination unit 2060 determines the stage of the product shelf 50 from which the product 40 is taken out, using the captured image 12 .
- the determination unit 2060 acquires the product identification information associated with the stage in display information to determine the product 40 taken out from the product shelf 50 .
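- a sketch of the display information and its lookup; apart from i 0001, which appears in FIG. 12, the stages and identifiers below are hypothetical.

```python
# table 200: stage of the product shelf 50 -> product identification information
DISPLAY_INFO = {
    1: "i0001",   # from FIG. 12; the remaining entries are made up
    2: "i0002",
    3: "i0003",
}

def product_taken_out(stage):
    return DISPLAY_INFO.get(stage)
```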
- several methods of determining the stage of the product shelf 50 from which the product 40 is taken out will be illustrated.
- the captured image 12 includes a scene in which the product shelf 50 is imaged from above (refer to FIG. 5 ).
- the camera 10 images the product shelf 50 from above.
- the depth camera is used as the camera 10 .
- the depth camera generates a depth image in addition to or instead of a common captured image.
- the depth image is an image in which the value of each pixel of the image represents the distance between the imaged item and the camera.
- FIG. 13 is a diagram illustrating a depth image generated by the camera 10 .
- pixels representing an item closer to the camera 10 are closer to white (brighter) and pixels representing an item farther from the camera 10 are closer to black (darker). Note that darker portions are densely drawn with larger black dots and brighter portions are sparsely drawn with smaller black dots in FIG. 13 for convenience of illustration.
- the determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the value of the pixel representing the reference position 24 in the depth image. At this time, a range of a distance from the camera 10 for each stage of the product shelf 50 is defined in advance in the display information.
- FIG. 14 is a diagram illustrating display information indicating the range of the distance from the camera 10 for each stage of the product shelf 50 .
- the table 200 in FIG. 14 indicates that the range of the distance between a first shelf of the product shelf 50 and the camera 10 is equal to or larger than d 1 and less than d 2 . In other words, the distance between the top of the first shelf and the camera 10 is d 1 , and the distance between the top of a second shelf and the camera 10 is d 2 .
- the determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the reference position 24 of the depth image including a scene in which the customer 20 takes out the product 40 and the display information shown in FIG. 14 .
- the determined stage is defined as the stage from which the product 40 is taken out.
- for example, suppose that the pixel at the reference position 24 in the depth image indicates that the distance between the reference position 24 and the camera 10 is a, and that a is equal to or larger than d 1 and less than d 2 .
- the determination unit 2060 determines that the reference position 24 is present on the first shelf of the product shelf 50 based on the display information shown in FIG. 14 . That is, the determination unit 2060 determines that the shelf from which the product 40 is taken out is the first shelf of the product shelf 50 .
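- a sketch of this lookup; the distance boundaries stand in for the d 1, d 2, ... of FIG. 14 and are illustrative values, not taken from the patent.

```python
STAGE_RANGES = [          # (lower, upper, stage): lower <= distance < upper
    (0.50, 0.80, 1),      # first shelf: d1 <= distance < d2
    (0.80, 1.10, 2),
    (1.10, 1.40, 3),
]

def stage_at(depth_image, reference_position):
    """Determine the stage of the product shelf 50 where the reference
    position 24 is present, from the depth value at that pixel."""
    x, y = reference_position
    distance = float(depth_image[y, x])   # pixel value = distance to camera
    for lower, upper, stage in STAGE_RANGES:
        if lower <= distance < upper:
            return stage
    return None
```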
- the captured image 12 includes a scene in which the product shelf 50 is viewed from the side.
- the camera 10 images the product shelf 50 from the lateral direction.
- the determination unit 2060 determines a stage in the product shelf 50 where a position of the reference position 24 in the height direction (Y coordinates) detected from the captured image 12 is present.
- the determined stage is defined as the stage of the product shelf 50 from which the product 40 is taken out.
- the captured image 12 may be a depth image or a common image.
- a plurality of types of products may be displayed on one stage by dividing one stage of the product shelf 50 into a plurality of columns in the horizontal direction.
- the determination unit 2060 respectively determines a position in the horizontal direction and a position in the height direction for the reference position 24 of the hand of the customer 20 who takes out the product 40 from the product shelf 50 to determine the product 40 .
- the product identification information is shown for each combination of stage and column in the display information.
- the position of the reference position 24 in the horizontal direction is determined by the X coordinates of the reference position 24 in the captured image 12 .
- the determination unit 2060 determines the position of the reference position 24 in the horizontal direction using the depth image.
- the method of determining the position of the reference position 24 in the horizontal direction, using the depth image including a scene in which the product shelf 50 is imaged from the lateral direction is the same as the method of determining the position of the reference position 24 in the height direction, using the depth image including the scene in which the product shelf 50 is imaged from above.
- the determination unit 2060 may determine the product 40 placed on the product shelf 50 by a similar method. However, in this case, the determination unit 2060 uses a captured image 12 including a scene in which the product 40 is placed on the product shelf 50 .
- the determination unit 2060 may decide whether the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same based on the method of determining the product 40 described above. For example, the determination unit 2060 determines the product 40 before the contact between the customer 20 and the product shelf by the same method as the method of determining the product 40 to be placed on the product shelf 50 . Furthermore, the determination unit 2060 determines the product 40 after the contact between the customer 20 and the product shelf 50 by the same method as the method of determining the product 40 to be taken out from the product shelf 50 .
- in a case where the product 40 determined before the contact is identical to the product 40 determined after the contact, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same. In this case, it can be said that the motion of the customer 20 is a “motion of reaching for the product shelf 50 to place the product 40 , but not placing the product 40 ”.
- on the other hand, in a case where the determined products 40 are different, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are different from each other. In this case, it can be said that the motion of the customer 20 is a “motion of placing the held product 40 and taking out another product 40 ”.
- the determination unit 2060 computes magnitude of a difference (difference in area or color) between the foreground region of the analysis target region 30 before the contact between the customer 20 and the product shelf 50 and the foreground region of the analysis target region 30 after the contact between the customer 20 and the product shelf and decides that the products 40 before and after the contact are different from each other in a case where the magnitude of the computed difference is equal to or larger than a predetermined value.
- the determination unit 2060 decides that the products 40 before and after the contact are the same in a case where the magnitude of the difference is less than the predetermined value.
- the determination unit 2060 decides whether the products 40 before and after the contact are the same based on the difference in the reference positions 24 before and after the contact between the customer 20 and the product shelf 50 .
- the determination unit 2060 respectively determines, using the display information described above, a stage of the product shelf 50 where the reference position 24 is present before the contact between the customer 20 and the product shelf 50 and a stage of the product shelf 50 where the reference position 24 is present after the contact between the customer 20 and the product shelf 50 .
- in a case where the two determined stages are different from each other, the determination unit 2060 decides that the products 40 before and after the contact are different from each other.
- on the other hand, in a case where the two determined stages are the same, the determination unit 2060 decides that the products 40 before and after the contact are the same.
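- the two decision criteria above can be combined into one sketch; the foreground-difference threshold is an assumed parameter.

```python
import numpy as np

def same_product(stage_before, stage_after, fg_before=None, fg_after=None,
                 diff_threshold=0.2):
    """Decide whether the products 40 held before and after the contact are
    the same: different stages imply different products; otherwise compare
    the foreground areas of the analysis target regions 30."""
    if stage_before != stage_after:
        return False
    if fg_before is not None and fg_after is not None:
        a = float(np.count_nonzero(fg_before))
        b = float(np.count_nonzero(fg_after))
        if max(a, b) > 0 and abs(a - b) / max(a, b) >= diff_threshold:
            return False
    return True
```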
- the motion of the customer 20 determined by the determination unit 2060 can be used to analyze an action performed in front of the product shelf 50 (so-called front-shelf action) by the customer 20 .
- the determination unit 2060 outputs various pieces of information such as a motion performed in front of the product shelf 50 by each customer 20 , a date and time when the motion is performed, and a product 40 subjected to the motion.
- This information is, for example, stored in a storage apparatus connected to the information processing apparatus 2000 or transmitted to a server apparatus connected to the information processing apparatus 2000 in a communicable manner.
- various existing methods can be used as the method of analyzing the front-shelf action based on various motions of the customer 20 performed in front of the product shelf 50 .
- a usage scene of the information processing apparatus 2000 is not limited to the determination of the motion of the customer in the store.
- the information processing apparatus 2000 can be used to determine the motion of a factory worker or the like.
- the motion of each worker determined by the information processing apparatus 2000 is compared with a motion of each worker defined in advance, and thus it is possible to confirm whether the worker correctly performs a predetermined job.
Abstract
An information processing apparatus (2000) analyzes a captured image (12) generated by a camera (10) to determine a motion of a person. The camera (10) is a camera that images a display place where an item is displayed. The information processing apparatus (2000) detects a reference position (24) from the captured image (12). The reference position (24) indicates a position of a hand of the person. The information processing apparatus (2000) decides an analysis target region (30) to be analyzed in the captured image (12) using the reference position (24). The information processing apparatus (2000) analyzes the analysis target region (30) to determine the motion of the person.
Description
- This application is a continuation application of U.S. patent application Ser. No. 17/349,045, filed on Jun. 16, 2021, which is a continuation application of U.S. patent application Ser. No. 16/623,656, filed on Dec. 17, 2019, which is a national stage application of International Application No. PCT/JP2017/022875, filed on Jun. 21, 2017, the disclosure of which is hereby incorporated by reference in its entirety.
- [Patent Document 1] US Patent Application Publication No. 2014/0132728
- According to the present invention, there is provided a technique for determining the motion of a person with respect to a displayed item with high accuracy.
- The objects described above and other objects, features, and advantages will become more apparent from preferred example embodiments described below and the following drawings accompanying the example embodiments.
-
FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to anexample embodiment 1. -
FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus according to theexample embodiment 1. -
FIG. 3 is a diagram illustrating a computer for forming the information processing apparatus. -
FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1. -
FIG. 5 is a first diagram illustrating an imaging range of a camera. -
FIG. 6 is a second diagram illustrating the imaging range of the camera. -
FIG. 7 is a diagram illustrating a case where a captured image includes a scene in which a product shelf is imaged from the right side as viewed from the front. -
FIGS. 8A and 8B are diagrams illustrating an analysis target region that is decided as a region having a predetermined shape defined with a reference position as a reference. -
FIG. 9 is a diagram illustrating a case where an orientation of the analysis target region is defined based on an orientation of a hand of a customer. -
FIG. 10 is a flowchart illustrating a flow of processing for determining a motion of a customer 20. -
FIG. 11 is a flowchart illustrating the flow of the processing for determining the motion of the customer 20. -
FIG. 12 is a diagram illustrating display information in a table format. -
FIG. 13 is a diagram illustrating a depth image generated by the camera. -
FIG. 14 is a diagram illustrating display information indicating a range of a distance from a camera for each stage of the product shelf. - Hereinafter, example embodiments of the present invention will be described with reference to drawings. Note that, in all the drawings, the same reference numeral is assigned to the same component and the description thereof will not be repeated. Further, each block represents a configuration of functional units instead of a configuration of hardware units in each block diagram, unless otherwise described.
- <Outline of Operation of Information Processing Apparatus 2000>
-
FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1 (the information processing apparatus 2000 shown in FIG. 2 and the like described below). Note that FIG. 1 is an illustration intended to ease understanding of the operation of the information processing apparatus 2000, and the operation of the information processing apparatus 2000 is not limited by FIG. 1 . - The
information processing apparatus 2000 analyzes a capturedimage 12 generated by acamera 10 to determine a motion of a person. Thecamera 10 is a camera that images a display place where an item is displayed. Thecamera 10 repeatedly performs imaging and generates a plurality of capturedimages 12. The plurality of generated capturedimages 12 are, for example, a frame group that constitutes video data. However, the plurality of capturedimages 12 generated by thecamera 10 do not necessarily need to constitute the video data and may be handled as individual still image data. - An item to be imaged by the
camera 10 can be any item that is displayed at the display place, and is taken out from the display place by a person or is placed (returned) in the display place by a person on the contrary. A specific item to be imaged by thecamera 10 varies depending on a usage environment of theinformation processing apparatus 2000. - For example, it is assumed that the
information processing apparatus 2000 is used to determine the motion of a customer or a store clerk in a store. In this case, the item to be imaged by thecamera 10 is a product sold in the store. Further, the display place described above is, for example, a product shelf. InFIG. 1 , theinformation processing apparatus 2000 is used to determine the motion of acustomer 20. Therefore, the person and the item to be imaged by thecamera 10 are respectively thecustomer 20 and aproduct 40. Further, the display place is aproduct shelf 50. - In addition, for example, it is assumed that the
information processing apparatus 2000 is used to determine the motion of a factory worker or the like. In this case, the person to be imaged by thecamera 10 is the worker or the like. Further, the item to be imaged by thecamera 10 is a material, a tool, or the like which is used in the factory. Furthermore, the display place is a shelf installed in, for example, a warehouse of the factory. - For ease of explanation, a case where the
information processing apparatus 2000 is used to determine the motion of the customer (customer 20 inFIG. 1 ) in the store will be described as an example, unless otherwise noted in this specification. Therefore, it is assumed that the “motion of person” determined by thedetermination unit 2060 is the “motion of customer”. Further, it is assumed that the “item” to be imaged by the camera is the “product”. Furthermore, it is assumed that the “display place” is the “product shelf”. - The
information processing apparatus 2000 detects a reference position 24 from the captured image 12. The reference position 24 indicates a position of a hand of the person, for example, the center position of the hand or the position of a fingertip. The information processing apparatus 2000 decides a region to be analyzed (the analysis target region 30) in the captured image 12 using this reference position 24, and analyzes the analysis target region 30 to determine the motion of the customer 20. For example, the motion of the customer 20 is a motion of holding the product 40, a motion of taking out the product 40 from the product shelf 50, or a motion of placing the product 40 on the product shelf 50. - If image analysis were performed on the entire captured image 12 to determine the motion of the customer 20, the motion might not be accurately determined when the size of the product 40 is small or when the pose of the hand of the customer 20 varies significantly. In this regard, the information processing apparatus 2000 first detects the reference position 24 indicating the position of the hand of the customer 20 in the captured image 12 and decides the analysis target region 30 based on the reference position 24. That is, the image analysis is performed near the hand of the customer 20. Therefore, even when the size of the product 40 is small or the pose of the hand of the customer 20 varies significantly, a motion performed by the hand of the customer 20, such as acquiring, placing, or holding the product 40, can be determined with high accuracy. - Hereinafter, the
information processing apparatus 2000 according to the present example embodiment will be described in more detail. -
FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 has a detection unit 2020, a deciding unit 2040, and a determination unit 2060. The detection unit 2020 detects the reference position 24 of the hand of the person included in the captured image 12 from the captured image 12. The deciding unit 2040 decides the analysis target region 30 in the captured image 12 using the detected reference position 24. The determination unit 2060 analyzes the decided analysis target region 30 to determine the motion of the person. - Each functional configuration unit of the
information processing apparatus 2000 may be formed by hardware (for example, a hard-wired electronic circuit) that forms each functional configuration unit or a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the circuit). Hereinafter, the case where each functional configuration unit of theinformation processing apparatus 2000 is formed by the combination of hardware and software will be further described. -
FIG. 3 is a diagram illustrating acomputer 1000 for forming theinformation processing apparatus 2000. Thecomputer 1000 is a variety of computers. For example, thecomputer 1000 is a personal computer (PC), a server machine, a tablet terminal, or a smartphone. In addition, for example, thecomputer 1000 may be thecamera 10. Thecomputer 1000 may be a dedicated computer designed to form theinformation processing apparatus 2000 or may be a general-purpose computer. - The
computer 1000 has abus 1020, aprocessor 1040, amemory 1060, astorage device 1080, an input andoutput interface 1100, and anetwork interface 1120. Thebus 1020 is a data transmission path for theprocessor 1040, thememory 1060, thestorage device 1080, the input andoutput interface 1100, and thenetwork interface 1120 to mutually transmit and receive data. However, a method of mutually connecting theprocessors 1040 and the like is not limited to the bus connection. Theprocessor 1040 is an arithmetic apparatus such as a central processing unit (CPU) or a graphics processing unit (GPU). Thememory 1060 is a main storage apparatus formed by a random access memory (RAM) or the like. Thestorage device 1080 is an auxiliary storage apparatus formed by a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. However, thestorage device 1080 may be configured by hardware similar to the hardware used to configure the main storage apparatus, such as the RAM. - The input and
output interface 1100 is an interface for connecting thecomputer 1000 to an input and output device. Thenetwork interface 1120 is an interface for connecting thecomputer 1000 to a communication network. This communication network is, for example, a local area network (LAN) or a wide area network (WAN). The method of connecting thenetwork interface 1120 to the communication network may be a wireless connection or a wired connection. - For example, the
computer 1000 is connected to the camera 10 in a communicable manner through a network. However, the method of communicably connecting the computer 1000 to the camera 10 is not limited to connection through a network. Note that the computer 1000 does not necessarily need to be communicably connected to the camera 10 at all, as long as the captured image 12 generated by the camera 10 can be acquired. - The
storage device 1080 stores program modules to form the respective functional configuration units (thedetection unit 2020, the decidingunit 2040, and the determination unit 2060) of theinformation processing apparatus 2000. Theprocessor 1040 reads each of the program modules into thememory 1060 and executes each program module to realize a function corresponding to each program module. - <
About Camera 10> - The
camera 10 is any camera that can repeatedly perform the imaging and generate the plurality of capturedimages 12. Thecamera 10 may be a video camera that generates the video data or a still camera that generates still image data. Note that the capturedimage 12 is a video frame constituting the video data in the former case. - The
camera 10 may be a two-dimensional camera or a three-dimensional camera (stereo camera or depth camera). Note that the capturedimage 12 may be a depth image in a case where thecamera 10 is the depth camera. The depth image is an image in which a value of each pixel of the image represents a distance between an imaged item and the camera. Furthermore, thecamera 10 may be an infrared camera. - As described above, the
computer 1000 that forms theinformation processing apparatus 2000 may be thecamera 10. In this case, thecamera 10 analyzes the capturedimage 12 generated by itself to determine the motion of thecustomer 20. As thecamera 10 having such a function, for example, an intelligent camera, a network camera, or a camera called an Internet protocol (IP) camera can be used. - <Flow of Processing>
-
FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1. The detection unit 2020 acquires the captured image 12 (S102). The detection unit 2020 detects the reference position 24 of the hand of the customer 20 from the acquired captured image 12 (S104). The deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S106). The determination unit 2060 performs the image analysis on the decided analysis target region 30 (S108). The determination unit 2060 determines the motion of the customer 20 based on the result of the image analysis of the analysis target region 30 (S110). - Here, a plurality of captured images 12 may be used to determine the motion of the customer 20. In this case, the image analysis is performed on the analysis target regions 30 decided for each of the plurality of captured images 12 (that is, on a plurality of analysis target regions 30) to determine the motion of the customer 20. That is, the processing of S102 to S108 is performed for each of the plurality of captured images 12, and the processing of S110 is performed using the results.
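As a non-limiting sketch, the flow of S102 to S110 can be expressed as follows in Python. The three helper callables are hypothetical stand-ins for the detection unit 2020, the deciding unit 2040, and the determination unit 2060; they are not names defined in this specification.

```python
# A minimal sketch of the S102-S110 loop over a series of captured images.
# The three callables are hypothetical stand-ins for the detection unit 2020,
# the deciding unit 2040, and the determination unit 2060.

def process_captured_images(frames, detect_reference_position,
                            decide_analysis_target_region, determine_motion):
    regions = []
    for image in frames:                                        # S102
        reference_position = detect_reference_position(image)   # S104
        if reference_position is None:
            continue  # no hand detected in this captured image
        region = decide_analysis_target_region(image, reference_position)  # S106
        regions.append(region)                                  # S108: analyzed below
    return determine_motion(regions)                            # S110
```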
- <Timing when Information Processing Apparatus 2000 Executes Processing> -
information processing apparatus 2000 executes a series of pieces of processing shown inFIG. 4 . For example, each time the capturedimage 12 is generated by thecamera 10, theinformation processing apparatus 2000 executes the series of pieces of processing shown inFIG. 4 for the capturedimage 12. - In addition, for example, the
information processing apparatus 2000 executes the series of pieces of processing shown inFIG. 4 at a predetermined time interval (for example, every second). In this case, for example, theinformation processing apparatus 2000 acquires the latest capturedimage 12 generated by thecamera 10 at the timing of starting the series of pieces of processing shown inFIG. 4 . - <Acquisition of Captured Image 12: S102>
- The
detection unit 2020 acquires the captured image 12 (S102). Any method of thedetection unit 2020 to acquire the capturedimage 12 may be employed. For example, thedetection unit 2020 receives the capturedimage 12 transmitted from thecamera 10. Further, for example, thedetection unit 2020 accesses thecamera 10 and acquires the capturedimage 12 stored in thecamera 10. - Note that the
camera 10 may store the capturedimage 12 in a storage apparatus provided outside thecamera 10. In this case, thedetection unit 2020 accesses the storage apparatus and acquires the capturedimage 12. - In a case where the
information processing apparatus 2000 is formed by thecamera 10, theinformation processing apparatus 2000 acquires the capturedimage 12 generated by theinformation processing apparatus 2000 itself. In this case, the capturedimage 12 is stored in, for example, thememory 1060 or the storage device 1080 (refer toFIG. 3 ) inside theinformation processing apparatus 2000. Therefore, thedetection unit 2020 acquires the capturedimage 12 from thememory 1060 or thestorage device 1080. - The captured image 12 (that is, an imaging range of the camera 10) includes at least a range in front of the
product shelf 50.FIG. 5 is a first diagram illustrating the imaging range of thecamera 10. InFIG. 5 , animaging range 14 of thecamera 10 includes a range of a distance d1 from the front surface of theproduct shelf 50 to the front side. - Note that the imaging range of the
camera 10 may not include the product shelf 50. FIG. 6 is a second diagram illustrating the imaging range of the camera 10. In FIG. 6 , the imaging range 14 of the camera 10 includes a range from a position d2 in front of the front surface of the product 40 to a position d3 in front of the front surface of the product 40. - Further, the captured
images 12 inFIGS. 5 and 6 include scenes in which theproduct shelf 50 is viewed down from above. In other words, thecamera 10 is installed so as to image theproduct shelf 50 from above theproduct shelf 50. However, the capturedimage 12 may not include the scene in which theproduct shelf 50 is viewed down from above. For example, the capturedimage 12 may include a scene in which theproduct shelf 50 is imaged from the side.FIG. 7 is a diagram illustrating a case where the capturedimage 12 includes a scene in which theproduct shelf 50 is imaged from the right side as viewed from the front. - <Detection of Reference Position 24: S104>
- The
detection unit 2020 detects thereference position 24 from the captured image 12 (S104). As described above, thereference position 24 indicates the position of the hand of thecustomer 20. As described above, the position of the hand of thecustomer 20 is, for example, the center position of the hand or the position of the fingertip. There are various methods for thedetection unit 2020 to detect thereference position 24 from the capturedimage 12. For example, thedetection unit 2020 performs feature value matching using a feature value of the hand of the person, which is prepared in advance, to detect a region matching the feature value (with high similarity to the feature value) from the capturedimage 12. Thedetection unit 2020 detects a predetermined position (for example, center position) of the detected region, that is, a region representing the hand as thereference position 24 of the hand. - In addition, for example, the
- In addition, for example, the detection unit 2020 may detect the reference position 24 using machine learning. Specifically, the detection unit 2020 is configured as a detector using machine learning. In this case, the detection unit 2020 is trained in advance using one or more captured images in which the reference positions 24 are known (sets of a captured image and the coordinates of the reference position 24 in that captured image). With this, the detection unit 2020 can detect the reference position 24 from the acquired captured image 12. Note that various models such as a neural network can be used as the machine learning prediction model. - Here, the learning of the
detection unit 2020 is preferably performed on the hand of thecustomer 20 in various poses. Specifically, captured images for learning are prepared for the hand ofcustomers 20 in various poses. With this, it is possible to detect thereference position 24 from each capturedimage 12 with high accuracy even though the pose of the hand of thecustomer 20 is different for each capturedimage 12. - Here, the
detection unit 2020 may detect various parameters relating to the hand of thecustomer 20 in addition to thereference position 24. For example, thedetection unit 2020 detects a width, length, and pose of the hand, and a distance between thereference position 24 and thecamera 10. In a case where the feature value matching is used, thedetection unit 2020 determines the width, length, pose, and the like of the hand from a shape and size of a detected hand region. In a case where the machine learning is used, thedetection unit 2020 is caused to learn using one or more captured images in which the width, length, and pose of the hand, the distance between thereference position 24 and thecamera 10, and the like are known. With this, it is possible for thedetection unit 2020 to detect various parameters such as the hand width in addition to thereference position 24 from the acquired capturedimage 12. - <Decision of Analysis Target Region 30: S106>
- The deciding
unit 2040 decides theanalysis target region 30 using the detected reference position 24 (S106). There are various methods for the decidingunit 2040 to decide theanalysis target region 30. For example, the decidingunit 2040 is a region having a predetermined shape defined with thereference position 24 as a reference among the regions included in the capturedimage 12. -
FIGS. 8A and 8B are diagrams illustrating the analysis target region 30 that is decided as the region having the predetermined shape defined with the reference position 24 as the reference. FIG. 8A represents a case where the reference position 24 is used as a predetermined position of the analysis target region 30. Specifically, the analysis target region 30 in FIG. 8A is a rectangle of height h and width w with the reference position 24 as its center. Note that the reference position 24 may instead define a position other than the center of the analysis target region 30, such as the upper left end or the lower right end of the analysis target region 30. -
FIG. 8B represents a case where a predetermined position (the center, the upper left corner, or the like) of the analysis target region 30 is defined by a position having a predetermined relationship with the reference position 24. Specifically, the analysis target region 30 in FIG. 8B is a rectangle whose center is the position moved by a predetermined vector v from the reference position 24. The size and orientation of the rectangle are the same as those of the analysis target region in FIG. 8A .
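Both variants reduce to simple coordinate arithmetic. A minimal sketch follows, assuming image coordinates with the origin at the upper left; the function name is a hypothetical stand-in for the deciding unit 2040.

```python
# A sketch of deciding the analysis target region 30 as a w-by-h rectangle.
# With v=(0, 0) the reference position 24 itself is the center (FIG. 8A);
# a nonzero predetermined vector v gives the FIG. 8B variant.

def decide_analysis_target_region(reference_position, w, h, v=(0, 0)):
    cx = reference_position[0] + v[0]
    cy = reference_position[1] + v[1]
    left, top = int(cx - w / 2), int(cy - h / 2)
    return (left, top, left + int(w), top + int(h))  # (x1, y1, x2, y2)
```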
- In the examples of FIGS. 8A and 8B , the orientation of the analysis target region 30 is defined based on an axial direction of the captured image 12. More specifically, the height direction of the analysis target region 30 is defined as the Y-axis direction of the captured image 12. However, the orientation of the analysis target region 30 may be defined based on a direction other than the axial direction of the captured image 12. - For example, it is assumed that the
detection unit 2020 detects the pose of the hand of thecustomer 20. In this case, the orientation of theanalysis target region 30 may be defined based on the orientation of the hand.FIG. 9 is a diagram illustrating a case where the orientation of theanalysis target region 30 is defined based on the orientation of the hand of thecustomer 20. InFIG. 9 , the orientation of theanalysis target region 30 is defined as a depth direction of the hand (direction from the wrist to the fingertip). - Note that the orientation of the
analysis target region 30 in each of the plurality of captured images 12 may differ in a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 as described above. Therefore, it is preferable that the deciding unit 2040 performs a geometric conversion such that the orientations of the plurality of analysis target regions 30 are aligned. For example, the deciding unit 2040 extracts the analysis target region 30 from each captured image 12 and performs the geometric conversion on each extracted analysis target region 30 such that the depth direction of the hand of the customer 20 faces the Y-axis direction.
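A minimal sketch of such a conversion, assuming the hand direction has already been obtained as an angle in degrees measured from the Y axis (this angle representation is an assumption of the example, not something the specification fixes):

```python
import cv2

# A sketch of the geometric conversion: rotate the captured image about the
# reference position 24 so that the depth direction of the hand points along
# the Y axis, then crop the analysis target region.

def extract_aligned_region(image, reference_position, hand_angle_deg, w, h):
    rows, cols = image.shape[:2]
    rotation = cv2.getRotationMatrix2D(reference_position, -hand_angle_deg, 1.0)
    rotated = cv2.warpAffine(image, rotation, (cols, rows))
    x = int(reference_position[0] - w / 2)
    y = int(reference_position[1] - h / 2)
    return rotated[max(y, 0):y + h, max(x, 0):x + w]
```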
- The size of the analysis target region 30 may be defined statically or may be decided dynamically. In the latter case, the size of the analysis target region 30 is decided by, for example, the following equation (1). -
[Equation 1] -
h = hb√(sr/sb), w = wb√(sr/sb) (1) -
analysis target region 30. The sb is a reference area defined in advance for the hand region. The hb and wb are respectively the height and width of theanalysis target region 30 defined in advance in association with the reference area. The sr is an area of the hand region detected from the capturedimage 12 by thedetection unit 2020. - In addition, for example, the size of the
- In addition, for example, the size of the analysis target region 30 may be dynamically decided using the following equation (2). -
[Equation 2] -
h = hb(db/dr), w = wb(db/dr) (2) -
analysis target region 30. The db is a reference distance value defined in advance. The hb and wb are respectively the height and width of theanalysis target region 30 associated with the reference distance value. The dr is a distance value between thereference position 24 detected from the capturedimage 12 and thecamera 10. - There are various methods of determining the distance value dr. For example, the
detection unit 2020 determines the distance value dr based on the pixel value at the reference position 24 in the depth image generated by the depth camera. In addition, for example, the detection unit 2020 may be configured to detect the distance between the reference position 24 and the camera 10 in addition to the reference position 24 when the detection unit 2020 is configured as a detector using machine learning.
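A minimal sketch combining equation (2) with the depth-image method of obtaining dr; the inverse-distance form of equation (2) above is the reconstruction assumed here.

```python
# A sketch of equation (2) combined with reading dr from a depth image, in
# which each pixel stores the distance between the imaged item and the camera.

def region_size_from_distance(hb, wb, db, depth_image, reference_position):
    x, y = reference_position
    dr = float(depth_image[y, x])   # distance between the hand and the camera
    scale = db / dr                 # a nearer hand appears larger in the image
    return hb * scale, wb * scale   # (h, w)
```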
- Here, each pixel of the analysis target region 30 decided by the above method may be corrected, and the corrected analysis target region 30 may be used for the image analysis by the determination unit 2060. The deciding unit 2040 corrects each pixel in the analysis target region 30 using, for example, the following equation (3). -
[Equation 3] -
d(x,y)1 = d(x,y)0 + (dr − db) (3) - The d(x,y)0 is the pixel value before the correction at coordinates (x,y) of the
analysis target region 30 in the captured image 12. The d(x,y)1 is the pixel value after the correction at the coordinates (x,y) of the analysis target region 30 in the captured image 12.
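A minimal sketch of this per-pixel correction, assuming the region has been cropped from a depth image as a NumPy array:

```python
import numpy as np

# A sketch of equation (3): offset every depth pixel in the analysis target
# region 30 by (dr - db) so that regions captured at different hand-to-camera
# distances become directly comparable.

def correct_depth_region(region_depth, dr, db):
    return region_depth.astype(np.float32) + (dr - db)
```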
- <Determination of Motion of Customer 20: S108, S110> -
determination unit 2060 performs the image analysis on the decidedanalysis target region 30 to determine the motion of the customer 20 (S108 and S110). The motion of thecustomer 20 is, for example, any of (1) motion of taking out theproduct 40 from theproduct shelf 50, (2) motion of placing theproduct 40 on theproduct shelf 50, (3) motion of not holding theproduct 40 both before and after the contact with theproduct shelf 50, and (4) motion of holding theproduct 40 both before and after the contact with theproduct shelf 50. - Here, “the contact between the
product shelf 50 and thecustomer 20” means that the image region of theproduct shelf 50 and the image region of thecustomer 20 overlap at least partially in the capturedimage 12, and there is no need for theproduct shelf 50 and the customer to contact each other in the real space. Further, in (4) described above, aproduct 40 held by thecustomer 20 before the contact between thecustomer 20 and theproduct shelf 50 may be the same as or different from aproduct 40 held by thecustomer 20 after the contact between thecustomer 20 and theproduct shelf 50. - A flow of processing of discriminating the four motions described above is, for example, a flow shown in
FIG. 10 . FIGS. 10 and 11 are flowcharts illustrating the flow of the processing for determining the motion of the customer 20. First, the determination unit 2060 detects a captured image 12 including a scene in which the reference position 24 moves toward the product shelf 50 (S202). For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of a plurality of captured images 12 in a time series. In a case where the distance decreases over time across one or more captured images 12, those captured images 12 are detected as captured images 12 including the scene in which the reference position 24 moves toward the product shelf 50. - Furthermore, the
determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S202 (S204). In a case where the product 40 is included in the analysis target region 30 (YES in S204), the processing in FIG. 10 proceeds to S206. On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S204), the processing in FIG. 10 proceeds to S214. - In S206, the
determination unit 2060 detects a capturedimage 12 including a scene in which thereference position 24 moves in a direction away from theproduct shelf 50 from among the capturedimages 12 generated later than the capturedimage 12 detected in S202 (S206). For example, thedetermination unit 2060 computes the distance between thereference position 24 and theproduct shelf 50 for each of the plurality of capturedimages 12 in a time series generated later than the capturedimage 12 detected in S202. In a case where the distance increases over time in one or more capturedimages 12, the capturedimages 12 are detected as the capturedimage 12 including the scene in which thereference position 24 moves in the direction away from theproduct shelf 50. - Furthermore, the
determination unit 2060 decides whether theproduct 40 is included in theanalysis target region 30 in the capturedimage 12 detected in S206 (S208). In a case where theproduct 40 is included in the analysis target region 30 (YES in S208), theproduct 40 is held in both a hand moving toward theproduct shelf 50 and a hand moving in the direction away from theproduct shelf 50. Therefore, thedetermination unit 2060 determines that the motion of thecustomer 20 is “(4) motion of holding theproduct 40 both before and after the contact with theproduct shelf 50” (S210). - On the other hand, in a case where the
product 40 is not included in the analysis target region 30 (No in S208), theproduct 40 is not held in the hand moving in the direction away from theproduct shelf 50 while theproduct 40 is held in the hand moving toward theproduct shelf 50. Therefore, thedetermination unit 2060 determines that the motion of thecustomer 20 is “(2) motion of placing theproduct 40 on theproduct shelf 50” (S212). - In S214, the
determination unit 2060 detects the capturedimage 12 including the scene in which thereference position 24 moves in the direction away from theproduct shelf 50 from among the capturedimages 12 generated later than the capturedimage 12 detected in S202. The detection method is the same as the method executed in S206. - Furthermore, the
determination unit 2060 decides whether theproduct 40 is included in theanalysis target region 30 in the capturedimage 12 detected in S214 (S216). In a case where theproduct 40 is included in the analysis target region 30 (YES in S216), theproduct 40 is held in the hand moving in the direction away from theproduct shelf 50 while theproduct 40 is not held in the hand moving toward theproduct shelf 50. Therefore, thedetermination unit 2060 determines that the motion of thecustomer 20 is “(1) motion of taking out theproduct 40 from theproduct shelf 50” (S218). - On the other hand, in a case where the
product 40 is not included in the analysis target region 30 (NO in S216), the product 40 is held in neither the hand moving toward the product shelf 50 nor the hand moving in the direction away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(3) motion of not holding the product 40 both before and after contact with the product shelf 50” (S220).
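Taken together, S204 and S208/S216 form a four-way decision. A minimal sketch follows, assuming a hypothetical helper product_in_region() that implements the foreground-based check described next; region_toward and region_away are the analysis target regions 30 from the captured images detected in S202 and in S206/S214, respectively.

```python
# A sketch of the decision flow of FIGS. 10 and 11.

def classify_motion(region_toward, region_away, product_in_region):
    held_before = product_in_region(region_toward)  # S204
    held_after = product_in_region(region_away)     # S208 / S216
    if held_before and held_after:
        return "(4) holding the product before and after contact"  # S210
    if held_before:
        return "(2) placing the product on the shelf"               # S212
    if held_after:
        return "(1) taking out the product from the shelf"          # S218
    return "(3) not holding the product before or after contact"    # S220
```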
- Here, for example, the following method can be used to detect whether the product 40 is included in the analysis target region 30. The determination unit 2060 first extracts, from the analysis target region 30 decided for each of the plurality of captured images 12 in a time series, the image region excluding the background region, that is, the foreground region. Note that an existing technique can be used to determine the background region for a captured image captured by the camera 10 installed at a predetermined place. - The
determination unit 2060 decides that the product 40 is included in the analysis target region 30 in a case where the foreground region includes a region other than the image region representing the hand of the customer 20. However, the determination unit 2060 may decide that the product 40 is included in the analysis target region 30 only in a case where the size of the foreground region excluding the image region representing the hand is equal to or larger than a predetermined size. With this, noise included in the captured image 12 can be prevented from being erroneously detected as the product 40.
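A minimal sketch of this check, assuming a background image of the analysis target region prepared in advance and a Boolean hand mask; the two threshold values are illustrative assumptions, not values given in this specification.

```python
import cv2
import numpy as np

# A sketch of the product-presence check: subtract a background image prepared
# in advance, exclude the hand region, and decide that a product 40 is present
# when the remaining foreground is large enough.

def product_in_region(region_bgr, background_bgr, hand_mask,
                      diff_threshold=30, min_product_pixels=200):
    diff = cv2.absdiff(region_bgr, background_bgr)
    foreground = diff.max(axis=2) > diff_threshold  # pixels unlike the background
    product_pixels = np.count_nonzero(foreground & ~hand_mask)
    return product_pixels >= min_product_pixels
```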
- The method of deciding whether the product 40 is included in the analysis target region 30 is not limited to the method described above. Various existing methods can be used to decide whether the product 40 is included in the analysis target region 30, that is, whether the hand of the person included in the image holds a product. - Note that the
determination unit 2060 may determine the motion of thecustomer 20 from one capturedimage 12. For example, in this case, thedetermination unit 2060 determines the motion of thecustomer 20 as “holding theproduct 40” or “not holding theproduct 40”. - <Determination of
Product 40> - The
determination unit 2060 may determine the taken-outproduct 40 when the customer takes out theproduct 40 from theproduct shelf 50. The determination of theproduct 40 means, for example, that information for identifying theproduct 40 from other products 40 (for example, an identifier or a name of the product 40) is determined. Hereinafter, the information for identifying theproduct 40 is referred to as product identification information. - The
determination unit 2060 determines the place in the product shelf 50 from which the customer 20 takes out the product 40, and thereby determines the taken-out product 40. As a premise, it is assumed that the display place of each product 40 is defined in advance. Here, information indicating which product is displayed at each position of the product shelf 50 is referred to as display information. The determination unit 2060 determines the place in the product shelf 50 from which the customer 20 takes out a product 40 using the captured image 12 and determines the taken-out product 40 using the determined place and the display information. - For example, it is assumed that a
determined product 40 is displayed in each stage of the product shelf 50. In this case, the display information indicates the product identification information in association with the stage of the product shelf 50. FIG. 12 is a diagram illustrating the display information in a table format. The table shown in FIG. 12 is referred to as a table 200. The table 200 is created for each product shelf 50 and has two columns: a stage 202 and product identification information 204. In FIG. 12 , the product identification information 204 indicates the identifier of the product 40. For example, the record in the first row of the table 200 representing the display information of the product shelf 50 identified by an identifier s0001 indicates that a product 40 identified by an identifier i0001 is displayed in the first stage of the product shelf 50. - The
determination unit 2060 determines the stage of the product shelf 50 from which the product 40 is taken out, using the captured image 12. The determination unit 2060 then acquires the product identification information associated with that stage in the display information to determine the product 40 taken out from the product shelf 50. Hereinafter, several methods of determining the stage of the product shelf 50 from which the product 40 is taken out will be illustrated.
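Once the stage is determined, the lookup itself is a simple keyed access. A minimal sketch of the table 200 of FIG. 12 follows; every entry other than the s0001/i0001 example from the text is an illustrative assumption.

```python
# A sketch of the display information of FIG. 12, modeled as one dictionary
# per product shelf keyed by stage 202, mapping to product identification
# information 204.

DISPLAY_INFO = {
    "s0001": {1: "i0001", 2: "i0002", 3: "i0003"},  # entries beyond row 1 assumed
}

def determine_product(shelf_id, stage):
    return DISPLAY_INFO[shelf_id][stage]

# e.g. determine_product("s0001", 1) -> "i0001"
```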
- <<First Method>> -
image 12 includes a scene in which theproduct shelf 50 is imaged from above (refer toFIG. 5 ). In other words, it is assumed that thecamera 10 images theproduct shelf 50 from above. In this case, the depth camera is used as thecamera 10. The depth camera generates a depth image in addition to or instead of a common captured image. As described above, the depth image is an image in which the value of each pixel of the image represents the distance between the imaged item and the camera.FIG. 13 is a diagram illustrating a depth image generated by thecamera 10. In the depth image inFIG. 13 , pixels representing an item closer to thecamera 10 are closer to white (brighter) and pixels representing an item farther from thecamera 10 are closer to black (darker). Note that darker portions are densely drawn with larger black dots and brighter portions are sparsely drawn with smaller black dots inFIG. 13 for convenience of illustration. - The
determination unit 2060 determines a stage of theproduct shelf 50 where thereference position 24 is present, based on the value of the pixel representing thereference position 24 in the depth image. At this time, a range of a distance from thecamera 10 for each stage of theproduct shelf 50 is defined in advance in the display information.FIG. 14 is a diagram illustrating display information indicating the range of the distance from thecamera 10 for each stage of theproduct shelf 50. For example, the table 200 inFIG. 14 indicates that the range of the distance between a first shelf of theproduct shelf 50 and thecamera 10 is equal to or larger than d1 and less than d2. In other words, the distance between the top of the first shelf and thecamera 10 is d1, and the distance between the top of a second shelf and thecamera 10 is d2. - The
determination unit 2060 determines the stage of the product shelf 50 where the reference position 24 is present, based on the reference position 24 in the depth image including a scene in which the customer 20 takes out the product 40 and on the display information shown in FIG. 14 . The determined stage is defined as the stage from which the product 40 is taken out. For example, it is assumed that the pixel at the reference position 24 in the depth image indicates that the distance between the reference position 24 and the camera 10 is a, and that a is equal to or larger than d1 and less than d2. In this case, the determination unit 2060 determines, based on the display information shown in FIG. 14 , that the reference position 24 is present on the first shelf of the product shelf 50. That is, the determination unit 2060 determines that the shelf from which the product 40 is taken out is the first shelf of the product shelf 50.
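A minimal sketch of this mapping, assuming the per-stage distance ranges of FIG. 14 are supplied as a list; the exact representation of the ranges is an illustrative assumption.

```python
# A sketch of the first method: map the depth value a at the reference
# position 24 to a shelf stage via per-stage distance ranges.

def stage_from_depth(depth_image, reference_position, stage_ranges):
    x, y = reference_position
    a = float(depth_image[y, x])  # distance between the hand and the camera 10
    for stage, lower, upper in stage_ranges:  # e.g. [(1, d1, d2), (2, d2, d3)]
        if lower <= a < upper:
            return stage
    return None  # the reference position is not within any stage's range
```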
- <<Second Method>> -
image 12 includes a scene in which theproduct shelf 50 is viewed from the side. In other words, it is assumed that thecamera 10 images theproduct shelf 50 from the lateral direction. In this case, thedetermination unit 2060 determines a stage in theproduct shelf 50 where a position of thereference position 24 in the height direction (Y coordinates) detected from the capturedimage 12 is present. The determined stage is defined as the stage of theproduct shelf 50 from which theproduct 40 is taken out. In this case, the capturedimage 12 may be a depth image or a common image. - <<About Case where Plurality of Types of
Products 40 are Displayed in One Stage>> - A plurality of types of products may be displayed on one stage by dividing one stage of the
product shelf 50 into a plurality of columns in the horizontal direction. In this case, thedetermination unit 2060 respectively determines a position in the horizontal direction and a position in the height direction for thereference position 24 of the hand of thecustomer 20 who takes out theproduct 40 from theproduct shelf 50 to determine theproduct 40. In this case, the product identification information is shown for each combination of stage and column in the display information. Hereinafter, a method of determining the position of thereference position 24 in the horizontal direction will be described. - It is assumed that the
camera 10 images theproduct shelf 50 from above. In this case, the position of thereference position 24 in the horizontal direction is determined by the X coordinates of thereference position 24 in the capturedimage 12. - On the other hand, it is assumed that the
camera 10 images theproduct shelf 50 from the lateral direction. In this case, thedetermination unit 2060 determines the position of thereference position 24 in the horizontal direction using the depth image. Here, the method of determining the position of thereference position 24 in the horizontal direction, using the depth image including a scene in which theproduct shelf 50 is imaged from the lateral direction, is the same as the method of determining the position of thereference position 24 in the height direction, using the depth image including the scene in which theproduct shelf 50 is imaged from above. - Note that the method of determining the
product 40 taken out from the product shelf 50 has been described above, but the determination unit 2060 may determine the product 40 placed on the product shelf 50 by a similar method. However, in this case, the determination unit 2060 uses a captured image 12 including a scene in which the product 40 is placed on the product shelf 50. - Here, it is assumed that “(4) motion of holding the
product 40 both before and after the contact with theproduct shelf 50” is determined as the motion of thecustomer 20. In this case, thedetermination unit 2060 may decide whether theproducts 40 held by thecustomer 20 before and after the contact between thecustomer 20 and theproduct shelf 50 are the same based on the method of determining theproduct 40 described above. For example, thedetermination unit 2060 determines theproduct 40 before the contact between thecustomer 20 and the product shelf by the same method as the method of determining theproduct 40 to be placed on theproduct shelf 50. Furthermore, thedetermination unit 2060 determines theproduct 40 after the contact between thecustomer 20 and theproduct shelf 50 by the same method as the method of determining theproduct 40 to be taken out from theproduct shelf 50. In a case where the twodetermined products 40 are the same, thedetermination unit 2060 decides that theproducts 40 held by thecustomer 20 before and after the contact between thecustomer 20 and theproduct shelf 50 are the same. In this case, it can be said that the motion of thecustomer 20 is a “motion of reaching for theproduct shelf 50 to place theproduct 40, but not placing theproduct 40”. On the other hand, in a case where the twodetermined products 40 are different from each other, thedetermination unit 2060 decides that theproducts 40 held by thecustomer 20 before and after the contact between thecustomer 20 and theproduct shelf 50 are different from each other. In this case, it can be said that the motion of thecustomer 20 is a “motion of placing the heldproduct 40 and taking out anotherproduct 40”. - However, the above determination may be performed without specifically determining the
product 40. For example, the determination unit 2060 computes the magnitude of the difference (a difference in area or color) between the foreground region of the analysis target region 30 before the contact between the customer 20 and the product shelf 50 and the foreground region of the analysis target region 30 after the contact, and decides that the products 40 before and after the contact are different from each other in a case where the magnitude of the computed difference is equal to or larger than a predetermined value. On the other hand, the determination unit 2060 decides that the products 40 before and after the contact are the same in a case where the magnitude of the difference is less than the predetermined value.
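A minimal sketch of such a comparison by area and mean color; the two threshold values are illustrative assumptions, not values from this specification.

```python
import cv2
import numpy as np

# A sketch of the comparison that skips product identification: compare the
# foreground regions before and after contact by area and by mean color.

def same_product(region_before, mask_before, region_after, mask_after,
                 area_threshold=0.3, color_threshold=40.0):
    area_before = np.count_nonzero(mask_before)
    area_after = np.count_nonzero(mask_after)
    if area_before == 0 or area_after == 0:
        return False  # no foreground to compare on one side
    area_diff = abs(area_after - area_before) / max(area_after, area_before)
    mean_before = cv2.mean(region_before, mask=mask_before.astype(np.uint8))[:3]
    mean_after = cv2.mean(region_after, mask=mask_after.astype(np.uint8))[:3]
    color_diff = float(np.linalg.norm(np.subtract(mean_after, mean_before)))
    return area_diff < area_threshold and color_diff < color_threshold
```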
- In addition, for example, the determination unit 2060 decides whether the products 40 before and after the contact are the same based on the difference between the reference positions 24 before and after the contact between the customer 20 and the product shelf 50. In this case, the determination unit 2060 determines, using the display information described above, the stage of the product shelf 50 where the reference position 24 is present before the contact between the customer 20 and the product shelf 50 and the stage of the product shelf 50 where the reference position 24 is present after the contact. In a case where the stages of the product shelf 50 determined before and after the contact are different from each other, the determination unit 2060 decides that the products 40 before and after the contact are different from each other. On the other hand, in a case where the determined stages are the same, the determination unit 2060 decides that the products 40 before and after the contact are the same. - <Utilization Method of Motion of
Customer 20 Determined by Determination Unit 2060> -
customer 20 determined by thedetermination unit 2060 can be used to analyze an action performed in front of the product shelf 50 (so-called front-shelf action) by thecustomer 20. For this reason, thedetermination unit 2060 outputs various pieces of information such as a motion performed in front of theproduct shelf 50 by eachcustomer 20, a date and time when the motion is performed, and aproduct 40 subjected to the motion. This information is, for example, stored in a storage apparatus connected to theinformation processing apparatus 2000 or transmitted to a server apparatus connected to theinformation processing apparatus 2000 in a communicable manner. Here, various existing methods can be used as the method of analyzing the front-shelf action based on various motions of thecustomer 20 performed in front of theproduct shelf 50. - Note that a usage scene of the
information processing apparatus 2000 is not limited to the determination of the motion of the customer in the store. For example, as described above, theinformation processing apparatus 2000 can be used to determine the motion of a factory worker or the like. In this case, for example, the motion of each worker determined by theinformation processing apparatus 2000 is compared with a motion of each worker defined in advance, and thus it is possible to confirm whether the worker correctly performs a predetermined job. - The example embodiments of the present invention are described with reference to the drawings. However, the example embodiments are only examples of the present invention, and various configurations other than the above can be employed.
Claims (20)
1. An information processing apparatus comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to:
detect a pose of the hand of the person from the captured image;
set an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determine a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
2. The information processing apparatus according to claim 1 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person holding the item by analyzing the analysis target region.
3. The information processing apparatus according to claim 1 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
4. The information processing apparatus according to claim 1 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person not holding the item by analyzing the analysis target region.
5. The information processing apparatus according to claim 4 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
6. The information processing apparatus according to claim 1 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
7. The information processing apparatus according to claim 2 ,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
8. A control method executed by a computer, the method comprising:
detecting a pose of the hand of the person from the captured image;
setting an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determining a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
9. The control method according to claim 8 ,
wherein the method comprises determining a motion of the person holding the item by analyzing the analysis target region.
10. The control method according to claim 8 ,
wherein the method comprises determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
11. The control method according to claim 8 ,
wherein the method comprises determining a motion of the person not holding the item by analyzing the analysis target region.
12. The control method according to claim 11 ,
wherein the method comprises determining a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
13. The control method according to claim 8 ,
wherein the method comprises determining a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
14. The control method according to claim 9 ,
wherein the method comprises determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
15. A non-transitory computer readable medium storing a program for causing a computer to perform operations, the operations comprising:
detecting a pose of the hand of the person from the captured image;
setting an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determining a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
16. The non-transitory computer readable medium according to claim 15 ,
wherein the operations comprise determining a motion of the person holding the item by analyzing the analysis target region.
17. The non-transitory computer readable medium according to claim 15 ,
wherein the operations comprise determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
18. The non-transitory computer readable medium according to claim 15 ,
wherein the operations comprise determining a motion of the person not holding the item by analyzing the analysis target region.
19. The non-transitory computer readable medium according to claim 18 ,
wherein the operations comprise determining a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
20. The non-transitory computer readable medium according to claim 15 ,
wherein the operations comprise determining a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/227,848 US20240029273A1 (en) | 2017-06-21 | 2023-07-28 | Information processing apparatus, control method, and program |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2017/022875 WO2018235198A1 (en) | 2017-06-21 | 2017-06-21 | Information processing device, control method, and program |
US201916623656A | 2019-12-17 | 2019-12-17 | |
US17/349,045 US11763463B2 (en) | 2017-06-21 | 2021-06-16 | Information processing apparatus, control method, and program |
US18/227,848 US20240029273A1 (en) | 2017-06-21 | 2023-07-28 | Information processing apparatus, control method, and program |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/349,045 Continuation US11763463B2 (en) | 2017-06-21 | 2021-06-16 | Information processing apparatus, control method, and program |
Publications (1)
Publication Number | Publication Date |
---|---|
US20240029273A1 true US20240029273A1 (en) | 2024-01-25 |
Family
ID=64737005
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/623,656 Abandoned US20210142490A1 (en) | 2017-06-21 | 2017-06-21 | Information processing apparatus, control method, and program |
US17/349,045 Active 2037-08-17 US11763463B2 (en) | 2017-06-21 | 2021-06-16 | Information processing apparatus, control method, and program |
US18/227,848 Pending US20240029273A1 (en) | 2017-06-21 | 2023-07-28 | Information processing apparatus, control method, and program |
US18/227,850 Pending US20230410321A1 (en) | 2017-06-21 | 2023-07-28 | Information processing apparatus, control method, and program |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/623,656 Abandoned US20210142490A1 (en) | 2017-06-21 | 2017-06-21 | Information processing apparatus, control method, and program |
US17/349,045 Active 2037-08-17 US11763463B2 (en) | 2017-06-21 | 2021-06-16 | Information processing apparatus, control method, and program |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/227,850 Pending US20230410321A1 (en) | 2017-06-21 | 2023-07-28 | Information processing apparatus, control method, and program |
Country Status (3)
Country | Link |
---|---|
US (4) | US20210142490A1 (en) |
JP (2) | JP7197171B2 (en) |
WO (1) | WO2018235198A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507125A (en) * | 2019-01-30 | 2020-08-07 | 佳能株式会社 | Detection device and method, image processing device and system |
JP6982259B2 (en) * | 2019-09-19 | 2021-12-17 | キヤノンマーケティングジャパン株式会社 | Information processing equipment, information processing methods, programs |
US20230080815A1 (en) * | 2020-02-28 | 2023-03-16 | Nec Corporation | Customer analysis apparatus, customer analysis method, and non-transitory storage medium |
KR20220144889A (en) * | 2020-03-20 | 2022-10-27 | 후아웨이 테크놀러지 컴퍼니 리미티드 | Method and system for hand gesture-based control of a device |
WO2021255894A1 (en) * | 2020-06-18 | 2021-12-23 | 日本電気株式会社 | Control device, control method, and program |
US11881045B2 (en) * | 2020-07-07 | 2024-01-23 | Rakuten Group, Inc. | Region extraction device, region extraction method, and region extraction program |
US20240127303A1 (en) * | 2021-09-29 | 2024-04-18 | Nec Corporation | Reporting system, method, and recording medium |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000331170A (en) * | 1999-05-21 | 2000-11-30 | Atr Media Integration & Communications Res Lab | Hand motion recognizing device |
US7227526B2 (en) * | 2000-07-24 | 2007-06-05 | Gesturetek, Inc. | Video-based image control system |
WO2010004719A1 (en) | 2008-07-08 | 2010-01-14 | パナソニック株式会社 | Article estimation device, article position estimation device, article estimation method, and article estimation program |
US9904845B2 (en) * | 2009-02-25 | 2018-02-27 | Honda Motor Co., Ltd. | Body feature detection and human pose estimation using inner distance shape contexts |
JP5361530B2 (en) * | 2009-05-20 | 2013-12-04 | Canon Inc. | Image recognition apparatus, imaging apparatus, and image recognition method |
WO2011007390A1 (en) * | 2009-07-15 | 2011-01-20 | Toshiba Corporation | Image-processing device and interface device |
JP5272213B2 (en) * | 2010-04-30 | 2013-08-28 | Nippon Telegraph and Telephone Corporation | Advertisement effect measurement device, advertisement effect measurement method, and program |
JP5573379B2 (en) * | 2010-06-07 | 2014-08-20 | Sony Corporation | Information display device and display image control method |
JP5561124B2 (en) | 2010-11-26 | 2014-07-30 | Fujitsu Limited | Information processing apparatus and pricing method |
JP2013164834A (en) * | 2012-01-13 | 2013-08-22 | Sony Corp | Image processing device, method thereof, and program |
US10049281B2 (en) | 2012-11-12 | 2018-08-14 | Shopperception, Inc. | Methods and systems for measuring human interaction |
US10268983B2 (en) * | 2013-06-26 | 2019-04-23 | Amazon Technologies, Inc. | Detecting item interaction and movement |
JP6134607B2 (en) * | 2013-08-21 | 2017-05-24 | NTT Docomo, Inc. | User observation system |
JP6194777B2 (en) | 2013-11-29 | 2017-09-13 | Fujitsu Limited | Operation determination method, operation determination apparatus, and operation determination program |
JP2016162072A (en) * | 2015-02-27 | 2016-09-05 | Toshiba Corporation | Feature quantity extraction apparatus |
JP6618301B2 (en) * | 2015-08-31 | 2019-12-11 | Canon Inc. | Information processing apparatus, control method therefor, program, and storage medium |
2017
- 2017-06-21: US application US16/623,656 filed (published as US20210142490A1); abandoned
- 2017-06-21: JP application JP2019524778 filed (granted as JP7197171B2); active
- 2017-06-21: PCT application PCT/JP2017/022875 filed (published as WO2018235198A1)
2021
- 2021-06-16: US application US17/349,045 filed (granted as US11763463B2); active
- 2021-07-07: JP application JP2021112713 filed (granted as JP7332183B2); active
2023
- 2023-07-28: US application US18/227,848 filed (published as US20240029273A1); pending
- 2023-07-28: US application US18/227,850 filed (published as US20230410321A1); pending
Also Published As
Publication number | Publication date |
---|---|
WO2018235198A1 (en) | 2018-12-27 |
JP2021177399A (en) | 2021-11-11 |
US20210142490A1 (en) | 2021-05-13 |
JPWO2018235198A1 (en) | 2020-04-09 |
JP7197171B2 (en) | 2022-12-27 |
US20230410321A1 (en) | 2023-12-21 |
JP7332183B2 (en) | 2023-08-23 |
US11763463B2 (en) | 2023-09-19 |
US20210343026A1 (en) | 2021-11-04 |
Similar Documents
Publication | Title |
---|---|
US11763463B2 (en) | Information processing apparatus, control method, and program |
CN107358149B (en) | Human body posture detection method and device |
US9866820B1 (en) | Online calibration of cameras |
JP6176388B2 (en) | Image identification device, image sensor, and image identification method |
US20160117824A1 (en) | Posture estimation method and robot |
US10063843B2 (en) | Image processing apparatus and image processing method for estimating three-dimensional position of object in image |
CN108573471B (en) | Image processing apparatus, image processing method, and recording medium |
US10515459B2 (en) | Image processing apparatus for processing images captured by a plurality of imaging units, image processing method, and storage medium storing program therefor |
US10496874B2 (en) | Facial detection device, facial detection system provided with same, and facial detection method |
US20050139782A1 (en) | Face image detecting method, face image detecting system and face image detecting program |
US20220012514A1 (en) | Identification information assignment apparatus, identification information assignment method, and program |
US20240104769A1 (en) | Information processing apparatus, control method, and non-transitory storage medium |
JP2007241477A (en) | Image processor |
US9639763B2 (en) | Image target detecting apparatus and method |
US20210042576A1 (en) | Image processing system |
US11812131B2 (en) | Determination of appropriate image suitable for feature extraction of object from among captured images in which object is detected |
CN106406507B (en) | Image processing method and electronic device |
JP5217917B2 (en) | Object detection and tracking device, object detection and tracking method, and object detection and tracking program |
JP6772059B2 (en) | Electronic control devices, electronic control systems and electronic control methods |
US11521330B2 (en) | Image processing apparatus, image processing method, and storage medium |
JP2013250604A (en) | Object detecting device and object detecting method |
KR20220083347A (en) | Method, apparatus, and computer program for measuring volume of objects by using image |
US20170200383A1 (en) | Automated review of forms through augmented reality |
CN113344904A (en) | Surface defect detection method and related product |
CN111047644A (en) | Method for identifying object position and orientation relation between objects in picture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |