US20240029273A1 - Information processing apparatus, control method, and program - Google Patents

Information processing apparatus, control method, and program

Info

Publication number
US20240029273A1
Authority
US
United States
Prior art keywords
motion
target region
analysis target
person
hand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/227,848
Inventor
Soma Shiraishi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Application filed by NEC Corp
Priority to US18/227,848
Publication of US20240029273A1
Legal status: Pending

Classifications

    • G06T 7/20: Analysis of motion
    • G06T 7/248: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments, involving reference images or patches
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06T 7/74: Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G06Q 30/02: Marketing; Price estimation or determination; Fundraising
    • G06V 10/255: Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/107: Static hand or arm
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G08B 13/196: Actuation by interference with heat, light, or radiation of shorter wavelength using passive radiation detection systems using image scanning and comparing systems using television cameras
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/10021: Stereoscopic video; Stereoscopic image sequence
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/20081: Training; Learning
    • G06T 2207/30196: Human being; Person
    • G06T 2207/30232: Surveillance

Definitions

  • the present invention relates to image analysis.
  • a customer takes out and purchases a product displayed in a display place (for example, a product shelf).
  • the customer may return the product once picked up to the display place.
  • Techniques of analyzing such an action of the customer related to the product displayed are developed.
  • Patent Document 1 discloses a technique of detecting that an item (a hand of a person) enters a determined region (shelf) using a depth image obtained from an imaging result by a depth camera and determining a motion of a customer using a color image near an entry position before and after the entry. Specifically, a color image including a hand of a person entering the determined region is compared with a color image including the hand of the person leaving the determined region to respectively determine the motions of the person as “acquisition of product” in a case where an increase in a color exceeds a threshold value, “return of product” in a case where a decrease in the color exceeds a threshold value, and “contact” in a case where a change in the color is less than a threshold value. Further, Patent Document 1 discloses a technique of deciding the increase or decrease in a volume of a subject from information on a size of the subject obtained from the imaging result of the depth camera to distinguish between the acquisition and the return of the product.
  • a degree of increase or decrease in color or volume before and after the entry of the hand of the person into the display place is affected by, for example, changes in a size of the product or a pose of the hand of the person. For example, in a case where a small product is taken out from the display place, the increase in color and volume before and after that is small. Further, a motion of changing the pose of the hand may be erroneously recognized as the motion of acquiring the product.
  • the present invention is made in view of the above problems.
  • One of the objects of the present invention is to provide a technique of determining a motion of a person with respect to a displayed item with high accuracy.
  • the information processing apparatus includes: 1) a detection unit that detects a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding unit that decides an analysis target region, which is a region to be analyzed, in the captured image using the detected reference position; and 3) a determination unit that analyzes the decided analysis target region to determine a motion of the person.
  • a control method of the present invention is executed by a computer.
  • the control method includes: 1) a detection step of detecting a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding step of deciding an analysis target region, which is a region to be analyzed, in the captured image using the detected reference position; and 3) a determination step of analyzing the decided analysis target region to determine a motion of the person.
  • a program of the present invention causes a computer to execute each step of the control method of the present invention.
  • FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1.
  • FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus according to the example embodiment 1.
  • FIG. 3 is a diagram illustrating a computer for forming the information processing apparatus.
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
  • FIG. 5 is a first diagram illustrating an imaging range of a camera.
  • FIG. 6 is a second diagram illustrating the imaging range of the camera.
  • FIG. 7 is a diagram illustrating a case where a captured image includes a scene in which a product shelf is imaged from the right side as viewed from the front.
  • FIGS. 8 A and 8 B are diagrams illustrating an analysis target region that is decided as a region having a predetermined shape defined with a reference position as a reference.
  • FIG. 9 is a diagram illustrating a case where an orientation of the analysis target region is defined based on an orientation of a hand of a customer.
  • FIG. 10 is a flowchart illustrating a flow of processing for determining a motion of a customer 20 .
  • FIG. 11 is a flowchart illustrating the flow of the processing for determining the motion of the customer 20 .
  • FIG. 12 is a diagram illustrating display information in a table format.
  • FIG. 13 is a diagram illustrating a depth image generated by the camera.
  • FIG. 14 is a diagram illustrating display information indicating a range of a distance from a camera for each stage of the product shelf.
  • each block represents a configuration of functional units instead of a configuration of hardware units in each block diagram, unless otherwise described.
  • FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1 (information processing apparatus 2000 shown in FIG. 2 and the like described below). Note that FIG. 1 is an illustration for easily understanding the operation of the information processing apparatus 2000 and the operation of the information processing apparatus 2000 is not limited by FIG. 1 .
  • the information processing apparatus 2000 analyzes a captured image 12 generated by a camera 10 to determine a motion of a person.
  • the camera 10 is a camera that images a display place where an item is displayed.
  • the camera 10 repeatedly performs imaging and generates a plurality of captured images 12 .
  • the plurality of generated captured images 12 are, for example, a frame group that constitutes video data.
  • the plurality of captured images 12 generated by the camera 10 do not necessarily need to constitute the video data and may be handled as individual still image data.
  • An item to be imaged by the camera 10 can be any item that is displayed at the display place, and is taken out from the display place by a person or is placed (returned) in the display place by a person on the contrary.
  • a specific item to be imaged by the camera 10 varies depending on a usage environment of the information processing apparatus 2000 .
  • the information processing apparatus 2000 is used to determine the motion of a customer or a store clerk in a store.
  • the item to be imaged by the camera 10 is a product sold in the store.
  • the display place described above is, for example, a product shelf.
  • the information processing apparatus 2000 is used to determine the motion of a customer 20 . Therefore, the person and the item to be imaged by the camera 10 are respectively the customer 20 and a product 40 . Further, the display place is a product shelf 50 .
  • the information processing apparatus 2000 is used to determine the motion of a factory worker or the like.
  • the person to be imaged by the camera 10 is the worker or the like.
  • the item to be imaged by the camera 10 is a material, a tool, or the like which is used in the factory.
  • the display place is a shelf installed in, for example, a warehouse of the factory.
  • A case where the information processing apparatus 2000 is used to determine the motion of the customer (customer 20 in FIG. 1 ) in the store will be described as an example, unless otherwise noted in this specification. Therefore, it is assumed that the “motion of person” determined by the determination unit 2060 is the “motion of customer”. Further, it is assumed that the “item” to be imaged by the camera is the “product”. Furthermore, it is assumed that the “display place” is the “product shelf”.
  • the information processing apparatus 2000 detects a reference position 24 from the captured image 12 .
  • the reference position 24 indicates a position of a hand of the person.
  • the position of the hand of the person is, for example, a center position of the hand or a position of a fingertip.
  • the information processing apparatus 2000 decides a region to be analyzed (analysis target region 30 ) in the captured image 12 using this reference position 24 .
  • the information processing apparatus 2000 analyzes the analysis target region 30 to determine the motion of the customer 20 .
  • the motion of the customer 20 is a motion of holding the product 40 , a motion of taking out the product 40 from the product shelf 50 , or a motion of placing the product 40 on the product shelf 50 .
  • the information processing apparatus 2000 first detects the reference position 24 indicating the position of the hand of the customer 20 in the captured image 12 and decides the analysis target region 30 based on the reference position 24 . That is, the image analysis is performed near the hand of the customer 20 .
  • FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 2000 according to the example embodiment 1.
  • the information processing apparatus 2000 has a detection unit 2020 , a deciding unit 2040 , and a determination unit 2060 .
  • the detection unit 2020 detects the reference position 24 of the hand of the person included in the captured image 12 from the captured image 12 .
  • the deciding unit 2040 decides the analysis target region 30 in the captured image 12 using the detected reference position 24 .
  • the determination unit 2060 analyzes the decided analysis target region 30 to determine the motion of the person.
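  • As a rough illustration of how these three units fit together, the Python sketch below mirrors the detection unit 2020 , the deciding unit 2040 , and the determination unit 2060 as methods of one class. All names, types, and the use of OpenCV-style image arrays are assumptions made for illustration; the method bodies are filled in by the sketches in later sections.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class AnalysisTargetRegion:
    """Axis-aligned crop of the captured image around the detected hand."""
    x: int
    y: int
    width: int
    height: int


class InformationProcessingApparatus:
    """Illustrative skeleton mirroring the detection, deciding, and determination units."""

    def detect_reference_position(self, image: np.ndarray) -> Optional[Tuple[int, int]]:
        # Detection unit: return the (x, y) reference position of the hand, or None if no hand is found.
        raise NotImplementedError

    def decide_analysis_target_region(
        self, image: np.ndarray, reference_position: Tuple[int, int]
    ) -> AnalysisTargetRegion:
        # Deciding unit: build a region of a predetermined shape around the reference position.
        raise NotImplementedError

    def determine_motion(self, image: np.ndarray, region: AnalysisTargetRegion) -> str:
        # Determination unit: analyze the region and return a motion label.
        raise NotImplementedError
```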
  • Each functional configuration unit of the information processing apparatus 2000 may be formed by hardware (for example, a hard-wired electronic circuit) that forms each functional configuration unit or a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the circuit).
  • FIG. 3 is a diagram illustrating a computer 1000 for forming the information processing apparatus 2000 .
  • the computer 1000 is a variety of computers.
  • the computer 1000 is a personal computer (PC), a server machine, a tablet terminal, or a smartphone.
  • the computer 1000 may be the camera 10 .
  • the computer 1000 may be a dedicated computer designed to form the information processing apparatus 2000 or may be a general-purpose computer.
  • the computer 1000 has a bus 1020 , a processor 1040 , a memory 1060 , a storage device 1080 , an input and output interface 1100 , and a network interface 1120 .
  • the bus 1020 is a data transmission path for the processor 1040 , the memory 1060 , the storage device 1080 , the input and output interface 1100 , and the network interface 1120 to mutually transmit and receive data.
  • a method of mutually connecting the processors 1040 and the like is not limited to the bus connection.
  • the processor 1040 is an arithmetic apparatus such as a central processing unit (CPU) or a graphics processing unit (GPU).
  • the memory 1060 is a main storage apparatus formed by a random access memory (RAM) or the like.
  • the storage device 1080 is an auxiliary storage apparatus formed by a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. However, the storage device 1080 may be configured by hardware similar to the hardware used to configure the main storage apparatus, such as the RAM.
  • the input and output interface 1100 is an interface for connecting the computer 1000 to an input and output device.
  • the network interface 1120 is an interface for connecting the computer 1000 to a communication network.
  • This communication network is, for example, a local area network (LAN) or a wide area network (WAN).
  • the method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
  • the computer 1000 is connected to the camera 10 in a communicable manner through a network.
  • the method of connecting the computer 1000 to the camera 10 in a communicable manner is not limited to connection through the network.
  • the computer 1000 does not necessarily need to be connected to the camera 10 in a communicable manner as long as the captured image 12 generated by the camera 10 is acquired.
  • the storage device 1080 stores program modules to form the respective functional configuration units (the detection unit 2020 , the deciding unit 2040 , and the determination unit 2060 ) of the information processing apparatus 2000 .
  • the processor 1040 reads each of the program modules into the memory 1060 and executes each program module to realize a function corresponding to each program module.
  • the camera 10 is any camera that can repeatedly perform the imaging and generate the plurality of captured images 12 .
  • the camera 10 may be a video camera that generates the video data or a still camera that generates still image data.
  • the captured image 12 is a video frame constituting the video data in the former case.
  • the camera 10 may be a two-dimensional camera or a three-dimensional camera (stereo camera or depth camera).
  • the captured image 12 may be a depth image in a case where the camera 10 is the depth camera.
  • the depth image is an image in which a value of each pixel of the image represents a distance between an imaged item and the camera.
  • the camera 10 may be an infrared camera.
  • the computer 1000 that forms the information processing apparatus 2000 may be the camera 10 .
  • the camera 10 analyzes the captured image 12 generated by itself to determine the motion of the customer 20 .
  • For example, an intelligent camera, a network camera, or a camera called an Internet protocol (IP) camera can be used as the camera 10 .
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1.
  • the detection unit 2020 acquires the captured image 12 (S 102 ).
  • the detection unit 2020 detects the reference position 24 of the hand of the customer 20 from the acquired captured image 12 (S 104 ).
  • the deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S 106 ).
  • the determination unit 2060 performs the image analysis on the decided analysis target region (S 108 ).
  • the determination unit 2060 determines the motion of the customer 20 based on the result of the image analysis of the analysis target region 30 (S 110 ).
  • the plurality of captured images 12 may be used to determine the motion of the customer 20 .
  • the image analysis is performed on the analysis target regions 30 decided for each of the plurality of captured images 12 (image analysis is performed on a plurality of analysis target regions 30 ) to determine the motion of the customer 20 . That is, the processing of S 102 to S 108 is performed for each of the plurality of captured images 12 , and the processing of S 110 is performed using the result.
  • the information processing apparatus 2000 executes a series of pieces of processing shown in FIG. 4 . For example, each time the captured image 12 is generated by the camera 10 , the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 for the captured image 12 .
  • the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 at a predetermined time interval (for example, every second). In this case, for example, the information processing apparatus 2000 acquires the latest captured image 12 generated by the camera 10 at the timing of starting the series of pieces of processing shown in FIG. 4 .
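  • A minimal per-frame loop matching S 102 to S 110 might look like the following sketch. It assumes the skeleton class sketched earlier and a camera stream readable by OpenCV; the stream URL is a placeholder, and the variant that runs at a fixed interval would simply sleep between iterations.

```python
import cv2


def run(apparatus, source: str = "rtsp://camera-10.example/stream") -> None:
    """Run detection, region decision, and motion determination on every frame of a stream."""
    capture = cv2.VideoCapture(source)  # the URL is a placeholder for the actual camera 10
    try:
        while True:
            ok, frame = capture.read()                                        # S102: acquire the captured image
            if not ok:
                break
            reference_position = apparatus.detect_reference_position(frame)   # S104
            if reference_position is None:
                continue                                                      # no hand in this frame
            region = apparatus.decide_analysis_target_region(frame, reference_position)  # S106
            motion = apparatus.determine_motion(frame, region)                            # S108/S110
            print(motion)
    finally:
        capture.release()
```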
  • the detection unit 2020 acquires the captured image 12 (S 102 ). Any method of the detection unit 2020 to acquire the captured image 12 may be employed. For example, the detection unit 2020 receives the captured image 12 transmitted from the camera 10 . Further, for example, the detection unit 2020 accesses the camera 10 and acquires the captured image 12 stored in the camera 10 .
  • the camera 10 may store the captured image 12 in a storage apparatus provided outside the camera 10 .
  • the detection unit 2020 accesses the storage apparatus and acquires the captured image 12 .
  • the information processing apparatus 2000 acquires the captured image 12 generated by the information processing apparatus 2000 itself.
  • the captured image 12 is stored in, for example, the memory 1060 or the storage device 1080 (refer to FIG. 3 ) inside the information processing apparatus 2000 . Therefore, the detection unit 2020 acquires the captured image 12 from the memory 1060 or the storage device 1080 .
  • the captured image 12 (that is, an imaging range of the camera 10 ) includes at least a range in front of the product shelf 50 .
  • FIG. 5 is a first diagram illustrating the imaging range of the camera 10 .
  • an imaging range 14 of the camera 10 includes a range of a distance d 1 from the front surface of the product shelf 50 to the front side.
  • FIG. 6 is a second diagram illustrating the imaging range of the camera 10 .
  • the imaging range 14 of the camera 10 includes a range from a position apart from the front surface of the product 40 to the front side by d 2 to a position apart from the front side of the product 40 to the front side by d 3 .
  • the captured images 12 in FIGS. 5 and 6 include scenes in which the product shelf 50 is viewed down from above.
  • the camera 10 is installed so as to image the product shelf 50 from above the product shelf 50 .
  • the captured image 12 may not include the scene in which the product shelf 50 is viewed down from above.
  • the captured image 12 may include a scene in which the product shelf 50 is imaged from the side.
  • FIG. 7 is a diagram illustrating a case where the captured image 12 includes a scene in which the product shelf 50 is imaged from the right side as viewed from the front.
  • the detection unit 2020 detects the reference position 24 from the captured image 12 (S 104 ).
  • the reference position 24 indicates the position of the hand of the customer 20 .
  • the position of the hand of the customer 20 is, for example, the center position of the hand or the position of the fingertip.
  • the detection unit 2020 performs feature value matching using a feature value of the hand of the person, which is prepared in advance, to detect a region matching the feature value (with high similarity to the feature value) from the captured image 12 .
  • the detection unit 2020 detects a predetermined position (for example, the center position) of the detected region, that is, the region representing the hand, as the reference position 24 of the hand.
  • the detection unit 2020 may detect the reference position 24 using machine learning.
  • the detection unit 2020 is configured as a detector using the machine learning.
  • the detection unit 2020 is caused to learn in advance using one or more captured images (a set of a captured image and coordinates of the reference position 24 in the captured image) in which the reference positions 24 are known. With this, the detection unit 2020 can detect the reference position 24 from the acquired captured image 12 .
  • various models such as a neural network can be used as a machine learning prediction model.
  • the learning of the detection unit 2020 is preferably performed on the hand of the customer 20 in various poses. Specifically, captured images for learning are prepared for the hand of customers 20 in various poses. With this, it is possible to detect the reference position 24 from each captured image 12 with high accuracy even though the pose of the hand of the customer 20 is different for each captured image 12 .
  • the detection unit 2020 may detect various parameters relating to the hand of the customer 20 in addition to the reference position 24 .
  • the detection unit 2020 detects a width, length, and pose of the hand, and a distance between the reference position 24 and the camera 10 .
  • the detection unit 2020 determines the width, length, pose, and the like of the hand from a shape and size of a detected hand region.
  • the detection unit 2020 is caused to learn using one or more captured images in which the width, length, and pose of the hand, the distance between the reference position 24 and the camera 10 , and the like are known. With this, it is possible for the detection unit 2020 to detect various parameters such as the hand width in addition to the reference position 24 from the acquired captured image 12 .
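  • The sketch below is a deliberately crude stand-in for the detection unit 2020 : it segments skin-colored pixels and takes the centroid of the largest blob as the reference position 24 . A real system would use the feature-value matching or the learned detector described above; the color range and morphology settings here are illustrative assumptions only.

```python
from typing import Optional, Tuple

import cv2
import numpy as np


def detect_reference_position(frame: np.ndarray) -> Optional[Tuple[int, int]]:
    """Very rough hand localization: skin-color segmentation followed by a centroid."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Rough skin-tone range in HSV; a fixed color range stands in for the feature-value
    # matching or learned detector described in the text and is an assumption.
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)   # assume the largest skin-colored blob is the hand
    moments = cv2.moments(hand)
    if moments["m00"] == 0:
        return None
    return int(moments["m10"] / moments["m00"]), int(moments["m01"] / moments["m00"])
```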
  • the deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S 106 ). There are various methods for the deciding unit 2040 to decide the analysis target region 30 .
  • the deciding unit 2040 decides, as the analysis target region 30 , a region having a predetermined shape defined with the reference position 24 as a reference among the regions included in the captured image 12 .
  • FIGS. 8 A and 8 B are diagrams illustrating the analysis target region 30 that is decided as the region having the predetermined shape defined with the reference position 24 as the reference.
  • FIG. 8 A represents a case where the reference position 24 is used as a position representing a predetermined position of the analysis target region 30 .
  • the analysis target region in FIG. 8 A is a rectangle with the reference position 24 as the center.
  • the analysis target region 30 is a rectangle having a height h and a width w.
  • the reference position 24 may be used as a position that defines a position other than the center of the analysis target region 30 such as the upper left end or lower right end of the analysis target region 30 .
  • FIG. 8 B represents a case where a predetermined position (center, upper left corner, or the like) of the analysis target region 30 is defined by a position having a predetermined relationship with the reference position 24 .
  • the analysis target region 30 in FIG. 8 B is a rectangle with a position moved by a predetermined vector v from the reference position 24 as the center. The size and orientation of the rectangle are the same as the analysis target region in FIG. 8 A .
  • the orientation of the analysis target region 30 is defined based on an axial direction of the captured image 12 . More specifically, a height direction of the analysis target region 30 is defined as a Y-axis direction of the captured image 12 . However, the orientation of the analysis target region 30 may be defined based on a direction other than the axial direction of the captured image 12 .
  • the detection unit 2020 detects the pose of the hand of the customer 20 .
  • the orientation of the analysis target region 30 may be defined based on the orientation of the hand.
  • FIG. 9 is a diagram illustrating a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 .
  • the orientation of the analysis target region 30 is defined as a depth direction of the hand (direction from the wrist to the fingertip).
  • the orientation of the analysis target region 30 in each of the plurality of captured images 12 may be different in a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 as described above. Therefore, it is preferable that the deciding unit 2040 performs geometric conversion such that the orientations of the plurality of analysis target regions 30 are aligned. For example, the deciding unit 2040 extracts the analysis target region 30 from each captured image 12 and performs the geometric conversion on each extracted analysis target region 30 such that the depth direction of the hand of the customer 20 faces the Y-axis direction.
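  • Combining the rectangle of FIGS. 8 A and 8 B with the orientation handling of FIG. 9 , one possible implementation crops a fixed-size window around the reference position 24 (optionally shifted by the vector v) and, when the hand orientation is known, rotates the image first so that the crop is aligned with the depth direction of the hand. The default sizes, the offset, and the angle convention below are assumptions, not values from the patent.

```python
from typing import Optional, Tuple

import cv2
import numpy as np


def decide_analysis_target_region(
    frame: np.ndarray,
    reference_position: Tuple[int, int],
    width: int = 160,
    height: int = 160,
    offset: Tuple[int, int] = (0, 0),
    hand_angle_deg: Optional[float] = None,
) -> np.ndarray:
    """Crop a width x height region centered on (reference position + offset vector).

    If the hand orientation is known, the frame is first rotated about the reference
    position so that the wrist-to-fingertip direction points along the image Y axis,
    which plays the role of the geometric conversion mentioned above.
    """
    cx = reference_position[0] + offset[0]
    cy = reference_position[1] + offset[1]
    if hand_angle_deg is not None:
        rotation = cv2.getRotationMatrix2D((float(cx), float(cy)), hand_angle_deg, 1.0)
        frame = cv2.warpAffine(frame, rotation, (frame.shape[1], frame.shape[0]))
    x0 = max(cx - width // 2, 0)
    y0 = max(cy - height // 2, 0)
    return frame[y0:y0 + height, x0:x0 + width]
```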
  • the size of the analysis target region 30 may be defined statically or may be decided dynamically. In the latter case, the size of the analysis target region 30 is decided by, for example, the following equation (1).
  • the h and w are respectively the height and width of the analysis target region 30 .
  • the s b is a reference area defined in advance for the hand region.
  • the h b and w b are respectively the height and width of the analysis target region 30 defined in advance in association with the reference area.
  • the s r is an area of the hand region detected from the captured image 12 by the detection unit 2020 .
  • the size of the analysis target region 30 may be dynamically decided using the following equation (2).
  • the h and w are respectively the height and width of the analysis target region 30 .
  • the d b is a reference distance value defined in advance.
  • the h b and w b are respectively the height and width of the analysis target region 30 associated with the reference distance value.
  • the d r is a distance value between the reference position 24 detected from the captured image 12 and the camera 10 .
  • the detection unit 2020 determines the distance value dr based on a pixel value at the reference position 24 in the depth image generated by the depth camera.
  • the detection unit 2020 may be configured to detect the distance between the reference position 24 and the camera 10 in addition to the reference position 24 when the detection unit 2020 is configured as the detector using the machine learning.
  • each pixel of the analysis target region 30 decided by the above method may be corrected, and the corrected analysis target region 30 may be used for the image analysis by the determination unit 2060 .
  • the deciding unit 2040 corrects each pixel in the analysis target region 30 using, for example, the following equation (3).
  • the d (x,y)0 is a pixel value before the correction at coordinates (x,y) of the analysis target region 30 in the captured image 12 .
  • the d (x,y)1 is a pixel value after the correction at the coordinates (x,y) of the analysis target region 30 in the captured image 12 .
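  • Equations (1) to (3) themselves are not reproduced in this text, so the helpers below only encode one plausible reading of the variable definitions above: the region size scales with the square root of the hand-area ratio (equation (1)) or inversely with the distance ratio (equation (2)), and each depth pixel is expressed relative to the depth at the reference position 24 (equation (3)). All three forms are assumptions rather than the patent's actual formulas.

```python
import numpy as np


def region_size_from_hand_area(s_r: float, s_b: float, h_b: int, w_b: int) -> tuple:
    """Assumed reading of equation (1): scale the predefined size by the square root of the area ratio."""
    scale = (s_r / s_b) ** 0.5
    return int(round(h_b * scale)), int(round(w_b * scale))


def region_size_from_distance(d_r: float, d_b: float, h_b: int, w_b: int) -> tuple:
    """Assumed reading of equation (2): a hand farther from the camera appears smaller."""
    scale = d_b / d_r
    return int(round(h_b * scale)), int(round(w_b * scale))


def correct_depth_pixels(region: np.ndarray, d_r: float) -> np.ndarray:
    """Assumed reading of equation (3): express each depth pixel relative to the reference-position depth."""
    return region.astype(np.float32) - np.float32(d_r)
```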
  • the determination unit 2060 performs the image analysis on the decided analysis target region 30 to determine the motion of the customer 20 (S 108 and S 110 ).
  • the motion of the customer 20 is, for example, any of (1) motion of taking out the product 40 from the product shelf 50 , (2) motion of placing the product 40 on the product shelf 50 , (3) motion of not holding the product 40 both before and after the contact with the product shelf 50 , and (4) motion of holding the product 40 both before and after the contact with the product shelf 50 .
  • the contact between the product shelf 50 and the customer 20 means that the image region of the product shelf 50 and the image region of the customer 20 overlap at least partially in the captured image 12 , and there is no need for the product shelf 50 and the customer to contact each other in the real space.
  • a product 40 held by the customer 20 before the contact between the customer 20 and the product shelf 50 may be the same as or different from a product 40 held by the customer 20 after the contact between the customer 20 and the product shelf 50 .
  • FIGS. 10 and 11 are flowcharts illustrating the flow of the processing for determining the motion of the customer 20 .
  • the determination unit 2060 detects the captured image 12 including a scene in which the reference position 24 moves toward the product shelf 50 (S 202 ).
  • the determination unit 2060 computes a distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series. In a case where the distance decreases over time in one or more captured images 12 , the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves toward the product shelf 50 .
  • the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 202 (S 204 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 204 ), the processing in FIG. 10 proceeds to S 206 . On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 204 ), the processing in FIG. 10 proceeds to S 214 .
  • the determination unit 2060 detects a captured image 12 including a scene in which the reference position 24 moves in a direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S 202 (S 206 ). For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series generated later than the captured image 12 detected in S 202 . In a case where the distance increases over time in one or more captured images 12 , the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50 .
  • the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 206 (S 208 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 208 ), the product 40 is held in both a hand moving toward the product shelf 50 and a hand moving in the direction away from the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(4) motion of holding the product 40 both before and after the contact with the product shelf 50 ” (S 210 ).
  • On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 208 ), the product 40 is held in the hand moving toward the product shelf 50 while the product 40 is not held in the hand moving in the direction away from the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(2) motion of placing the product 40 on the product shelf 50 ” (S 212 ).
  • the determination unit 2060 detects the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S 202 (S 214 ).
  • the detection method is the same as the method executed in S 206 .
  • the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S 214 (S 216 ). In a case where the product 40 is included in the analysis target region 30 (YES in S 216 ), the product 40 is held in the hand moving in the direction away from the product shelf 50 while the product 40 is not held in the hand moving toward the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(1) motion of taking out the product 40 from the product shelf 50 ” (S 218 ).
  • On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S 216 ), the product 40 is held neither in the hand moving toward the product shelf 50 nor in the hand moving in the direction away from the product shelf 50 . Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(3) motion of not holding the product 40 both before and after contact with the product shelf 50 ” (S 220 ).
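  • The branching of FIGS. 10 and 11 reduces to two decisions: whether the product 40 is in the analysis target region 30 while the hand approaches the product shelf 50 , and whether it is there while the hand moves away. A compact rendering of that decision table, with illustrative labels, is sketched below.

```python
def classify_motion(product_before: bool, product_after: bool) -> str:
    """Map the two 'is the product in the analysis target region?' decisions to the four motions."""
    if product_before and product_after:
        return "holding the product both before and after contact"   # (4), S210
    if product_before:
        return "placing the product on the shelf"                     # (2), S212
    if product_after:
        return "taking the product out of the shelf"                  # (1), S218
    return "not holding the product before or after contact"          # (3), S220
```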
  • the determination unit 2060 first extracts an image region excluding a background region, that is, a foreground region, from the analysis target region 30 decided for each of the plurality of captured images 12 in a time series. Note that an existing technique can be used as a technique of determining the background region for a captured image to be imaged by the camera 10 installed at a predetermined place.
  • the determination unit 2060 decides that the product 40 is included in the analysis target region 30 in a case where the foreground region includes a region other than the image region representing the hand of the customer 20 .
  • the determination unit 2060 may decide that the product 40 is included in the analysis target region 30 only in a case where a size of the image region excluding the image region representing the hand in the foreground region is equal to or larger than a predetermined size. With this, it is possible to prevent the noise included in the captured image 12 from being erroneously detected as the product 40 .
  • the method of deciding whether the product 40 is included in the analysis target region is not limited to the method described above. Various existing methods can be used as the method of deciding whether the product 40 is included in the analysis target region 30 , that is, whether the hand of the person included in the image has the product.
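  • One way to realize the “is the product 40 in the analysis target region 30 ?” decision is background subtraction restricted to the cropped region, with the hand pixels masked out and a minimum-area threshold to reject noise, as described above. The threshold values below are illustrative, and the hand mask is assumed to come from the detection unit 2020 .

```python
import cv2
import numpy as np


def region_contains_product(
    region: np.ndarray,
    background_region: np.ndarray,
    hand_mask: np.ndarray,
    diff_threshold: int = 30,
    min_product_area: int = 500,
) -> bool:
    """Decide whether the foreground of the region, excluding the hand, is large enough to be a product."""
    difference = cv2.absdiff(region, background_region)
    gray = cv2.cvtColor(difference, cv2.COLOR_BGR2GRAY)
    _, foreground = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)
    foreground[hand_mask > 0] = 0   # exclude the image region representing the hand
    return int(cv2.countNonZero(foreground)) >= min_product_area
```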
  • the determination unit 2060 may determine the motion of the customer 20 from one captured image 12 . For example, in this case, the determination unit 2060 determines the motion of the customer 20 as “holding the product 40 ” or “not holding the product 40 ”.
  • the determination unit 2060 may determine the taken-out product 40 when the customer takes out the product 40 from the product shelf 50 .
  • the determination of the product 40 means, for example, that information for identifying the product 40 from other products 40 (for example, an identifier or a name of the product 40 ) is determined.
  • the information for identifying the product 40 is referred to as product identification information.
  • the determination unit 2060 determines a place in the product shelf 50 where the customer 20 takes out the product 40 to determine the taken-out product 40 .
  • the display place of the product 40 is defined in advance.
  • information indicating which product is displayed at each position of the product shelf 50 is referred to as display information.
  • the determination unit 2060 determines a place in the product shelf 50 from which the customer 20 takes out a product 40 using the captured image 12 and determines the taken-out product 40 using the determined place and the display information.
  • FIG. 12 is a diagram illustrating the display information in a table format.
  • a table shown in FIG. 12 is referred to as a table 200 .
  • the table 200 is created for each product shelf 50 .
  • the table 200 has two columns of a stage 202 and product identification information 204 .
  • the product identification information 204 indicates the identifier of the product 40 .
  • a record in the first row indicates that a product 40 identified by the identifier i 0001 is displayed in the first stage of the product shelf 50 .
  • the determination unit 2060 determines the stage of the product shelf 50 from which the product 40 is taken out, using the captured image 12 .
  • the determination unit 2060 acquires the product identification information associated with the stage in display information to determine the product 40 taken out from the product shelf 50 .
  • several methods of determining the stage of the product shelf 50 from which the product 40 is taken out will be illustrated.
  • the captured image 12 includes a scene in which the product shelf 50 is imaged from above (refer to FIG. 5 ).
  • the camera 10 images the product shelf 50 from above.
  • the depth camera is used as the camera 10 .
  • the depth camera generates a depth image in addition to or instead of a common captured image.
  • the depth image is an image in which the value of each pixel of the image represents the distance between the imaged item and the camera.
  • FIG. 13 is a diagram illustrating a depth image generated by the camera 10 .
  • pixels representing an item closer to the camera 10 are closer to white (brighter) and pixels representing an item farther from the camera 10 are closer to black (darker). Note that darker portions are densely drawn with larger black dots and brighter portions are sparsely drawn with smaller black dots in FIG. 13 for convenience of illustration.
  • the determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the value of the pixel representing the reference position 24 in the depth image. At this time, a range of a distance from the camera 10 for each stage of the product shelf 50 is defined in advance in the display information.
  • FIG. 14 is a diagram illustrating display information indicating the range of the distance from the camera 10 for each stage of the product shelf 50 .
  • the table 200 in FIG. 14 indicates that the range of the distance between a first shelf of the product shelf 50 and the camera 10 is equal to or larger than d 1 and less than d 2 . In other words, the distance between the top of the first shelf and the camera 10 is d 1 , and the distance between the top of a second shelf and the camera 10 is d 2 .
  • the determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the reference position 24 of the depth image including a scene in which the customer 20 takes out the product 40 and the display information shown in FIG. 14 .
  • the determined stage is defined as the stage from which the product 40 is taken out.
  • the pixel at the reference position 24 in the depth image indicates that the distance between the reference position 24 and the camera 10 is a.
  • a is equal to or larger than d 1 and equal to or less than d 2 .
  • the determination unit 2060 determines that the reference position 24 is present on the first shelf of the product shelf 50 based on the display information shown in FIG. 14 . That is, the determination unit 2060 determines that the shelf from which the product 40 is taken out is the first shelf of the product shelf 50 .
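  • The lookup sketched below combines the display information of FIGS. 12 and 14 : each stage of the product shelf 50 is stored together with its product identifier and its range of distances from the overhead depth camera, and the stage whose range contains the depth at the reference position 24 names the taken-out product 40 . The concrete distance ranges and most identifiers are placeholders.

```python
from typing import List, Optional, Tuple

# Placeholder display information: (stage, product identifier, minimum and maximum
# distance from the overhead camera in meters). Only the identifier of the first
# stage is taken from the text; the rest are illustrative values.
DISPLAY_INFORMATION: List[Tuple[int, str, float, float]] = [
    (1, "i0001", 0.30, 0.60),
    (2, "i0005", 0.60, 0.90),
    (3, "i0002", 0.90, 1.20),
]


def product_taken_from_shelf(depth_at_reference_position: float) -> Optional[str]:
    """Return the identifier of the product on the stage whose distance range contains the hand."""
    for stage, product_id, d_min, d_max in DISPLAY_INFORMATION:
        if d_min <= depth_at_reference_position < d_max:
            return product_id
    return None
```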
  • the captured image 12 includes a scene in which the product shelf 50 is viewed from the side.
  • the camera 10 images the product shelf 50 from the lateral direction.
  • the determination unit 2060 determines a stage in the product shelf 50 where a position of the reference position 24 in the height direction (Y coordinates) detected from the captured image 12 is present.
  • the determined stage is defined as the stage of the product shelf 50 from which the product 40 is taken out.
  • the captured image 12 may be a depth image or a common image.
  • a plurality of types of products may be displayed on one stage by dividing one stage of the product shelf 50 into a plurality of columns in the horizontal direction.
  • the determination unit 2060 respectively determines a position in the horizontal direction and a position in the height direction for the reference position 24 of the hand of the customer 20 who takes out the product 40 from the product shelf 50 to determine the product 40 .
  • the product identification information is shown for each combination of stage and column in the display information.
  • the position of the reference position 24 in the horizontal direction is determined by the X coordinates of the reference position 24 in the captured image 12 .
  • the determination unit 2060 determines the position of the reference position 24 in the horizontal direction using the depth image.
  • the method of determining the position of the reference position 24 in the horizontal direction, using the depth image including a scene in which the product shelf 50 is imaged from the lateral direction is the same as the method of determining the position of the reference position 24 in the height direction, using the depth image including the scene in which the product shelf 50 is imaged from above.
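  • For the side-view arrangement, the same idea applies with image coordinates instead of depth values: the Y coordinate of the reference position 24 selects the stage and, when a stage is divided into columns, the X coordinate (or the depth value, as noted above) selects the column. The boundary lists in the sketch below are placeholders that would come from the display information.

```python
from bisect import bisect_right
from typing import List, Tuple


def stage_and_column(
    reference_position: Tuple[int, int],
    stage_y_boundaries: List[int],    # Y coordinates separating the stages, top to bottom (placeholders)
    column_x_boundaries: List[int],   # X coordinates separating the columns, left to right (placeholders)
) -> Tuple[int, int]:
    """Return (stage, column), both 1-based, for a side-view captured image."""
    x, y = reference_position
    stage = bisect_right(stage_y_boundaries, y) + 1
    column = bisect_right(column_x_boundaries, x) + 1
    return stage, column
```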
  • the determination unit 2060 may determine the product 40 to be placed on the product shelf 50 by a similar method. However, in this case, the determination unit 2060 uses a captured image 12 including a scene in which the product 40 is placed on the product shelf 50 .
  • the determination unit 2060 may decide whether the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same based on the method of determining the product 40 described above. For example, the determination unit 2060 determines the product 40 before the contact between the customer 20 and the product shelf by the same method as the method of determining the product 40 to be placed on the product shelf 50 . Furthermore, the determination unit 2060 determines the product 40 after the contact between the customer 20 and the product shelf 50 by the same method as the method of determining the product 40 to be taken out from the product shelf 50 .
  • In a case where the two determined products 40 are the same, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same. In this case, it can be said that the motion of the customer 20 is a “motion of reaching for the product shelf 50 to place the product 40 , but not placing the product 40 ”.
  • On the other hand, in a case where the two determined products 40 are different from each other, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are different from each other. In this case, it can be said that the motion of the customer 20 is a “motion of placing the held product 40 and taking out another product 40 ”.
  • the determination unit 2060 computes the magnitude of a difference (a difference in area or color) between the foreground region of the analysis target region 30 before the contact between the customer 20 and the product shelf 50 and the foreground region of the analysis target region 30 after the contact between the customer 20 and the product shelf 50 . In a case where the magnitude of the computed difference is equal to or larger than a predetermined value, the determination unit 2060 decides that the products 40 before and after the contact are different from each other.
  • the determination unit 2060 decides that the products 40 before and after the contact are the same in a case where the magnitude of the difference is less than the predetermined value.
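  • The area-or-color comparison described above can be approximated by differencing simple statistics of the two foreground regions, for example the foreground pixel count and a hue histogram, and thresholding the result. The distance measure and both thresholds in the sketch below are illustrative assumptions.

```python
import cv2
import numpy as np


def products_look_different(
    region_before: np.ndarray,
    region_after: np.ndarray,
    mask_before: np.ndarray,
    mask_after: np.ndarray,
    area_ratio_threshold: float = 0.3,
    color_distance_threshold: float = 0.5,
) -> bool:
    """Compare foreground area and hue histograms of the regions before and after the contact."""
    area_before = max(int(cv2.countNonZero(mask_before)), 1)
    area_after = max(int(cv2.countNonZero(mask_after)), 1)
    area_change = abs(area_after - area_before) / float(max(area_before, area_after))

    def hue_histogram(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
        hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
        return cv2.calcHist([hsv], [0], mask, [32], [0, 180])

    color_distance = cv2.compareHist(
        hue_histogram(region_before, mask_before),
        hue_histogram(region_after, mask_after),
        cv2.HISTCMP_BHATTACHARYYA,
    )
    return area_change >= area_ratio_threshold or color_distance >= color_distance_threshold
```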
  • the determination unit 2060 decides whether the products 40 before and after the contact are the same based on the difference in the reference positions 24 before and after the contact between the customer 20 and the product shelf 50 .
  • the determination unit 2060 respectively determines, using the display information described above, a stage of the product shelf 50 where the reference position 24 is present before the contact between the customer 20 and the product shelf 50 and a stage of the product shelf 50 where the reference position 24 is present after the contact between the customer 20 and the product shelf 50 .
  • In a case where the two determined stages are different from each other, the determination unit 2060 decides that the products 40 before and after the contact are different from each other.
  • On the other hand, in a case where the two determined stages are the same, the determination unit 2060 decides that the products 40 before and after the contact are the same.
  • the motion of the customer 20 determined by the determination unit 2060 can be used to analyze an action performed in front of the product shelf 50 (so-called front-shelf action) by the customer 20 .
  • the determination unit 2060 outputs various pieces of information such as a motion performed in front of the product shelf 50 by each customer 20 , a date and time when the motion is performed, and a product 40 subjected to the motion.
  • This information is, for example, stored in a storage apparatus connected to the information processing apparatus 2000 or transmitted to a server apparatus connected to the information processing apparatus 2000 in a communicable manner.
  • various existing methods can be used as the method of analyzing the front-shelf action based on various motions of the customer 20 performed in front of the product shelf 50 .
  • a usage scene of the information processing apparatus 2000 is not limited to the determination of the motion of the customer in the store.
  • the information processing apparatus 2000 can be used to determine the motion of a factory worker or the like.
  • the motion of each worker determined by the information processing apparatus 2000 is compared with a motion of each worker defined in advance, and thus it is possible to confirm whether the worker correctly performs a predetermined job.

Abstract

An information processing apparatus (2000) analyzes a captured image (12) generated by a camera (10) to determine a motion of a person. The camera (10) is a camera that images a display place where an item is displayed. The information processing apparatus (2000) detects a reference position (24) from the captured image (12). The reference position (24) indicates a position of a hand of the person. The information processing apparatus (2000) decides an analysis target region (30) to be analyzed in the captured image (12) using the reference position (24). The information processing apparatus (2000) analyzes the analysis target region (30) to determine the motion of the person.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation application of U.S. patent application Ser. No. 17/349,045, filed on Jun. 16, 2021, which is a continuation application of U.S. patent application Ser. No. 16/623,656, filed on Dec. 17, 2019, which is a national stage application of International Application No. PCT/JP2017/022875, filed on Jun. 21, 2017, the disclosure of which is hereby incorporated by reference in its entirety.
  • TECHNICAL FIELD
  • The present invention relates to image analysis.
  • Background Art
  • In a store, a customer takes out and purchases a product displayed in a display place (for example, a product shelf). The customer may return the product once picked up to the display place. Techniques of analyzing such an action of the customer related to the product displayed are developed.
  • For example, Patent Document 1 discloses a technique of detecting that an item (a hand of a person) enters a determined region (shelf) using a depth image obtained from an imaging result by a depth camera and determining a motion of a customer using a color image near an entry position before and after the entry. Specifically, a color image including a hand of a person entering the determined region is compared with a color image including the hand of the person leaving the determined region to respectively determine the motions of the person as “acquisition of product” in a case where an increase in a color exceeds a threshold value, “return of product” in a case where a decrease in the color exceeds a threshold value, and “contact” in a case where a change in the color is less than a threshold value. Further, Patent Document 1 discloses a technique of deciding the increase or decrease in a volume of a subject from information on a size of the subject obtained from the imaging result of the depth camera to distinguish between the acquisition and the return of the product.
  • RELATED DOCUMENT
  • Patent Document
    • [Patent Document 1] US Patent Application No. 2014/0132728
  • SUMMARY OF THE INVENTION
  • Technical Problem
  • A degree of increase or decrease in color or volume before and after the entry of the hand of the person into the display place is affected by, for example, changes in a size of the product or a pose of the hand of the person. For example, in a case where a small product is taken out from the display place, the increase in color and volume before and after that is small. Further, a motion of changing the pose of the hand may be erroneously recognized as the motion of acquiring the product.
  • The present invention is made in view of the above problems. One of the objects of the present invention is to provide a technique of determining a motion of a person with respect to a displayed item with high accuracy.
  • Solution to Problem
  • The information processing apparatus according to the present invention includes: 1) a detection unit that detects a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding unit that decides an analysis target region, which is a region to be analyzed, in the captured image using the detected reference position; and 3) a determination unit that analyzes the decided analysis target region to determine a motion of the person.
  • A control method of the present invention is executed by a computer. The control method includes: 1) a detection step of detecting a reference position indicating a position of a hand of a person included in a captured image from the captured image in which a display place of an item is imaged; 2) a deciding step of deciding an analysis target region, which is a region to be analyzed, in the captured image using the detected reference position; and 3) a determination step of analyzing the decided analysis target region to determine a motion of the person.
  • A program of the present invention causes a computer to execute each step of the control method of the present invention.
  • Advantageous Effects of Invention
  • According to this invention, there is provided the technique of determining the motion of the person with respect to the displayed item with high accuracy.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The objects described above and other objects, features, and advantages will become more apparent from preferred example embodiments described below and the following drawings accompanying the example embodiments.
  • FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1.
  • FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus according to the example embodiment 1.
  • FIG. 3 is a diagram illustrating a computer for forming the information processing apparatus.
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus according to the example embodiment 1.
  • FIG. 5 is a first diagram illustrating an imaging range of a camera.
  • FIG. 6 is a second diagram illustrating the imaging range of the camera.
  • FIG. 7 is a diagram illustrating a case where a captured image includes a scene in which a product shelf is imaged from the right side as viewed from the front.
  • FIGS. 8A and 8B are diagrams illustrating an analysis target region that is decided as a region having a predetermined shape defined with a reference position as a reference.
  • FIG. 9 is a diagram illustrating a case where an orientation of the analysis target region is defined based on an orientation of a hand of a customer.
  • FIG. 10 is a flowchart illustrating a flow of processing for determining a motion of a customer 20.
  • FIG. 11 is a flowchart illustrating the flow of the processing for determining the motion of the customer 20.
  • FIG. 12 is a diagram illustrating display information in a table format.
  • FIG. 13 is a diagram illustrating a depth image generated by the camera.
  • FIG. 14 is a diagram illustrating display information indicating a range of a distance from a camera for each stage of the product shelf.
  • DESCRIPTION OF EMBODIMENTS
  • Hereinafter, example embodiments of the present invention will be described with reference to drawings. Note that, in all the drawings, the same reference numeral is assigned to the same component and the description thereof will not be repeated. Further, each block represents a configuration of functional units instead of a configuration of hardware units in each block diagram, unless otherwise described.
  • Example Embodiment 1
  • <Outline of Operation of Information Processing Apparatus 2000>
  • FIG. 1 is a diagram conceptually illustrating an operation of an information processing apparatus according to an example embodiment 1 (the information processing apparatus 2000 shown in FIG. 2 and the like described below). Note that FIG. 1 is an illustration intended to facilitate understanding of the operation of the information processing apparatus 2000, and the operation of the information processing apparatus 2000 is not limited by FIG. 1.
  • The information processing apparatus 2000 analyzes a captured image 12 generated by a camera 10 to determine a motion of a person. The camera 10 is a camera that images a display place where an item is displayed. The camera 10 repeatedly performs imaging and generates a plurality of captured images 12. The plurality of generated captured images 12 are, for example, a frame group that constitutes video data. However, the plurality of captured images 12 generated by the camera 10 do not necessarily need to constitute the video data and may be handled as individual still image data.
  • An item to be imaged by the camera 10 can be any item that is displayed at the display place and is taken out from the display place by a person or, conversely, is placed (returned) in the display place by a person. A specific item to be imaged by the camera 10 varies depending on a usage environment of the information processing apparatus 2000.
  • For example, it is assumed that the information processing apparatus 2000 is used to determine the motion of a customer or a store clerk in a store. In this case, the item to be imaged by the camera 10 is a product sold in the store. Further, the display place described above is, for example, a product shelf. In FIG. 1 , the information processing apparatus 2000 is used to determine the motion of a customer 20. Therefore, the person and the item to be imaged by the camera 10 are respectively the customer 20 and a product 40. Further, the display place is a product shelf 50.
  • In addition, for example, it is assumed that the information processing apparatus 2000 is used to determine the motion of a factory worker or the like. In this case, the person to be imaged by the camera 10 is the worker or the like. Further, the item to be imaged by the camera 10 is a material, a tool, or the like which is used in the factory. Furthermore, the display place is a shelf installed in, for example, a warehouse of the factory.
  • For ease of explanation, a case where the information processing apparatus 2000 is used to determine the motion of the customer (customer 20 in FIG. 1 ) in the store will be described as an example, unless otherwise noted in this specification. Therefore, it is assumed that the “motion of person” determined by the determination unit 2060 is the “motion of customer”. Further, it is assumed that the “item” to be imaged by the camera is the “product”. Furthermore, it is assumed that the “display place” is the “product shelf”.
  • The information processing apparatus 2000 detects a reference position 24 from the captured image 12. The reference position 24 indicates a position of a hand of the person. The position of the hand of the person is, for example, a center position of the hand or a position of a fingertip. The information processing apparatus 2000 decides a region to be analyzed (analysis target region 30) in the captured image 12 using this reference position 24. The information processing apparatus 2000 analyzes the analysis target region 30 to determine the motion of the customer 20. For example, the motion of the customer 20 is a motion of holding the product 40, a motion of taking out the product 40 from the product shelf 50, or a motion of placing the product 40 on the product shelf 50.
  • Advantageous Effect
  • In a case where it is intended that image analysis is performed on the entire captured image 12 to determine the motion of the customer 20, the motion may not be accurately determined when a size of the product 40 is small or when a pose of the hand of the customer 20 varies significantly. In this regard, the information processing apparatus 2000 first detects the reference position 24 indicating the position of the hand of the customer 20 in the captured image 12 and decides the analysis target region 30 based on the reference position 24. That is, the image analysis is performed near the hand of the customer 20. Therefore, even when the size of the product 40 is small or when the pose of the hand of the customer 20 varies significantly, it is possible to determine the motion by the hand of the customer 20 such as acquiring the product 40, placing the product 40, or holding the product 40 with high accuracy.
  • Hereinafter, the information processing apparatus 2000 according to the present example embodiment will be described in more detail.
  • Example of Functional Configuration of Information Processing Apparatus 2000
  • FIG. 2 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 2000 according to the example embodiment 1. The information processing apparatus 2000 has a detection unit 2020, a deciding unit 2040, and a determination unit 2060. The detection unit 2020 detects the reference position 24 of the hand of the person included in the captured image 12 from the captured image 12. The deciding unit 2040 decides the analysis target region 30 in the captured image 12 using the detected reference position 24. The determination unit 2060 analyzes the decided analysis target region 30 to determine the motion of the person.
  • Example of Hardware Configuration of Information Processing Apparatus 2000
  • Each functional configuration unit of the information processing apparatus 2000 may be formed by hardware (for example, a hard-wired electronic circuit) that forms each functional configuration unit or a combination of hardware and software (for example, a combination of an electronic circuit and a program that controls the circuit). Hereinafter, the case where each functional configuration unit of the information processing apparatus 2000 is formed by the combination of hardware and software will be further described.
  • FIG. 3 is a diagram illustrating a computer 1000 for forming the information processing apparatus 2000. The computer 1000 may be any of a variety of computers. For example, the computer 1000 is a personal computer (PC), a server machine, a tablet terminal, or a smartphone. In addition, for example, the computer 1000 may be the camera 10. The computer 1000 may be a dedicated computer designed to form the information processing apparatus 2000 or may be a general-purpose computer.
  • The computer 1000 has a bus 1020, a processor 1040, a memory 1060, a storage device 1080, an input and output interface 1100, and a network interface 1120. The bus 1020 is a data transmission path for the processor 1040, the memory 1060, the storage device 1080, the input and output interface 1100, and the network interface 1120 to mutually transmit and receive data. However, a method of mutually connecting the processor 1040 and the like is not limited to the bus connection. The processor 1040 is an arithmetic apparatus such as a central processing unit (CPU) or a graphics processing unit (GPU). The memory 1060 is a main storage apparatus formed by a random access memory (RAM) or the like. The storage device 1080 is an auxiliary storage apparatus formed by a hard disk, a solid state drive (SSD), a memory card, a read only memory (ROM), or the like. However, the storage device 1080 may be configured by hardware similar to the hardware used to configure the main storage apparatus, such as the RAM.
  • The input and output interface 1100 is an interface for connecting the computer 1000 to an input and output device. The network interface 1120 is an interface for connecting the computer 1000 to a communication network. This communication network is, for example, a local area network (LAN) or a wide area network (WAN). The method of connecting the network interface 1120 to the communication network may be a wireless connection or a wired connection.
  • For example, the computer 1000 is connected to the camera 10 in a communicable manner through a network. However, the method of connecting the computer 1000 to the camera 10 in a communicable manner is not limited to connection through the network. Note that the computer 1000 does not necessarily need to be connected to the camera 10 in a communicable manner as long as the captured image 12 generated by the camera 10 can be acquired.
  • The storage device 1080 stores program modules to form the respective functional configuration units (the detection unit 2020, the deciding unit 2040, and the determination unit 2060) of the information processing apparatus 2000. The processor 1040 reads each of the program modules into the memory 1060 and executes each program module to realize a function corresponding to each program module.
  • <About Camera 10>
  • The camera 10 is any camera that can repeatedly perform the imaging and generate the plurality of captured images 12. The camera 10 may be a video camera that generates the video data or a still camera that generates still image data. Note that the captured image 12 is a video frame constituting the video data in the former case.
  • The camera 10 may be a two-dimensional camera or a three-dimensional camera (stereo camera or depth camera). Note that the captured image 12 may be a depth image in a case where the camera 10 is the depth camera. The depth image is an image in which a value of each pixel of the image represents a distance between an imaged item and the camera. Furthermore, the camera 10 may be an infrared camera.
  • As described above, the computer 1000 that forms the information processing apparatus 2000 may be the camera 10. In this case, the camera 10 analyzes the captured image 12 generated by itself to determine the motion of the customer 20. As the camera 10 having such a function, for example, an intelligent camera, a network camera, or a camera called an Internet protocol (IP) camera can be used.
  • <Flow of Processing>
  • FIG. 4 is a flowchart illustrating a flow of processing executed by the information processing apparatus 2000 according to the example embodiment 1. The detection unit 2020 acquires the captured image 12 (S102). The detection unit 2020 detects the reference position 24 of the hand of the customer 20 from the acquired captured image 12 (S104). The deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S106). The determination unit 2060 performs the image analysis on the decided analysis target region 30 (S108). The determination unit 2060 determines the motion of the customer 20 based on the result of the image analysis of the analysis target region 30 (S110).
  • Here, the plurality of captured images 12 may be used to determine the motion of the customer 20. In this case, the image analysis is performed on the analysis target regions 30 decided for each of the plurality of captured images 12 (image analysis is performed on a plurality of analysis target regions 30) to determine the motion of the customer 20. That is, the processing of S102 to S108 is performed for each of the plurality of captured images 12, and the processing of S110 is performed using the result.
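  • As a non-limiting illustration only (not part of the disclosure), the flow of S102 to S110 can be sketched in Python as follows; the helper functions detect_reference_position, decide_analysis_target_region, and determine_motion are hypothetical placeholders that stand in for the processing of the detection unit 2020, the deciding unit 2040, and the determination unit 2060 described below.

```python
# Minimal sketch of the S102-S110 flow (illustrative only).  The three helper
# functions are hypothetical placeholders for the detection unit 2020, the
# deciding unit 2040, and the determination unit 2060.
from typing import List, Optional, Tuple

import numpy as np


def detect_reference_position(image: np.ndarray) -> Optional[Tuple[int, int]]:
    """S104: return (x, y) of the hand, or None when no hand is visible."""
    raise NotImplementedError  # see the matching-based sketch below


def decide_analysis_target_region(image: np.ndarray,
                                  reference_position: Tuple[int, int]) -> np.ndarray:
    """S106: crop a region around the reference position."""
    raise NotImplementedError  # see the rectangle-cropping sketch below


def determine_motion(regions: List[np.ndarray]) -> str:
    """S108/S110: classify the motion from the per-frame regions."""
    raise NotImplementedError  # see the decision-tree sketch below


def process_frames(frames: List[np.ndarray]) -> Optional[str]:
    regions = []
    for frame in frames:                                   # S102
        ref = detect_reference_position(frame)             # S104
        if ref is None:
            continue
        regions.append(decide_analysis_target_region(frame, ref))  # S106
    return determine_motion(regions) if regions else None  # S108, S110
```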
  • <Timing when Information Processing Apparatus 2000 Executes Processing>
  • There are various timings when the information processing apparatus 2000 executes a series of pieces of processing shown in FIG. 4 . For example, each time the captured image 12 is generated by the camera 10, the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 for the captured image 12.
  • In addition, for example, the information processing apparatus 2000 executes the series of pieces of processing shown in FIG. 4 at a predetermined time interval (for example, every second). In this case, for example, the information processing apparatus 2000 acquires the latest captured image 12 generated by the camera 10 at the timing of starting the series of pieces of processing shown in FIG. 4 .
  • <Acquisition of Captured Image 12: S102>
  • The detection unit 2020 acquires the captured image 12 (S102). Any method may be employed for the detection unit 2020 to acquire the captured image 12. For example, the detection unit 2020 receives the captured image 12 transmitted from the camera 10. Further, for example, the detection unit 2020 accesses the camera 10 and acquires the captured image 12 stored in the camera 10.
  • Note that the camera 10 may store the captured image 12 in a storage apparatus provided outside the camera 10. In this case, the detection unit 2020 accesses the storage apparatus and acquires the captured image 12.
  • In a case where the information processing apparatus 2000 is formed by the camera 10, the information processing apparatus 2000 acquires the captured image 12 generated by the information processing apparatus 2000 itself. In this case, the captured image 12 is stored in, for example, the memory 1060 or the storage device 1080 (refer to FIG. 3 ) inside the information processing apparatus 2000. Therefore, the detection unit 2020 acquires the captured image 12 from the memory 1060 or the storage device 1080.
  • The captured image 12 (that is, an imaging range of the camera 10) includes at least a range in front of the product shelf 50. FIG. 5 is a first diagram illustrating the imaging range of the camera 10. In FIG. 5 , an imaging range 14 of the camera 10 includes a range of a distance d1 from the front surface of the product shelf 50 to the front side.
  • Note that the imaging range of the camera 10 may not include the product shelf 50. FIG. 6 is a second diagram illustrating the imaging range of the camera 10. In FIG. 6, the imaging range 14 of the camera 10 includes a range from a position apart from the front surface of the product shelf 50 to the front side by d2 to a position apart from the front surface of the product shelf 50 to the front side by d3.
  • Further, the captured images 12 in FIGS. 5 and 6 include scenes in which the product shelf 50 is viewed down from above. In other words, the camera 10 is installed so as to image the product shelf 50 from above the product shelf 50. However, the captured image 12 may not include the scene in which the product shelf 50 is viewed down from above. For example, the captured image 12 may include a scene in which the product shelf 50 is imaged from the side. FIG. 7 is a diagram illustrating a case where the captured image 12 includes a scene in which the product shelf 50 is imaged from the right side as viewed from the front.
  • <Detection of Reference Position 24: S104>
  • The detection unit 2020 detects the reference position 24 from the captured image 12 (S104). As described above, the reference position 24 indicates the position of the hand of the customer 20. As described above, the position of the hand of the customer 20 is, for example, the center position of the hand or the position of the fingertip. There are various methods for the detection unit 2020 to detect the reference position 24 from the captured image 12. For example, the detection unit 2020 performs feature value matching using a feature value of the hand of the person, which is prepared in advance, to detect a region matching the feature value (with high similarity to the feature value) from the captured image 12. The detection unit 2020 detects a predetermined position (for example, the center position) of the detected region, that is, the region representing the hand, as the reference position 24 of the hand.
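  • A minimal sketch of the matching-based detection, under the assumption that OpenCV template matching is used in place of generic feature value matching and that a hand template image prepared in advance is available; the function name, the matching method, and the threshold are illustrative assumptions.

```python
import cv2
import numpy as np


def detect_reference_position(image_bgr: np.ndarray,
                              hand_template_bgr: np.ndarray,
                              threshold: float = 0.7):
    """Return the center of the best-matching hand region as the reference
    position 24, or None when the similarity is below the threshold."""
    scores = cv2.matchTemplate(image_bgr, hand_template_bgr, cv2.TM_CCOEFF_NORMED)
    _, max_score, _, top_left = cv2.minMaxLoc(scores)
    if max_score < threshold:
        return None
    t_h, t_w = hand_template_bgr.shape[:2]
    # Use the center of the detected hand region as the reference position.
    return (top_left[0] + t_w // 2, top_left[1] + t_h // 2)
```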
  • In addition, for example, the detection unit 2020 may detect the reference position 24 using machine learning. Specifically, the detection unit 2020 is configured as a detector using the machine learning. In this case, the detection unit 2020 is caused to learn in advance using one or more captured images (a set of a captured image and coordinates of the reference position 24 in the captured image) in which the reference positions 24 are known. With this, the detection unit 2020 can detect the reference position 24 from the acquired captured image 12. Note that various models such as a neural network can be used as a machine learning prediction model.
  • Here, the learning of the detection unit 2020 is preferably performed on the hand of the customer 20 in various poses. Specifically, captured images for learning are prepared for the hand of customers 20 in various poses. With this, it is possible to detect the reference position 24 from each captured image 12 with high accuracy even though the pose of the hand of the customer 20 is different for each captured image 12.
  • Here, the detection unit 2020 may detect various parameters relating to the hand of the customer 20 in addition to the reference position 24. For example, the detection unit 2020 detects a width, length, and pose of the hand, and a distance between the reference position 24 and the camera 10. In a case where the feature value matching is used, the detection unit 2020 determines the width, length, pose, and the like of the hand from a shape and size of a detected hand region. In a case where the machine learning is used, the detection unit 2020 is caused to learn using one or more captured images in which the width, length, and pose of the hand, the distance between the reference position 24 and the camera 10, and the like are known. With this, it is possible for the detection unit 2020 to detect various parameters such as the hand width in addition to the reference position 24 from the acquired captured image 12.
  • <Decision of Analysis Target Region 30: S106>
  • The deciding unit 2040 decides the analysis target region 30 using the detected reference position 24 (S106). There are various methods for the deciding unit 2040 to decide the analysis target region 30. For example, the deciding unit 2040 decides, as the analysis target region 30, a region having a predetermined shape defined with the reference position 24 as a reference among the regions included in the captured image 12.
  • FIGS. 8A and 8B are diagrams illustrating the analysis target region 30 that is decided as the region having the predetermined shape defined with the reference position 24 as the reference. FIG. 8A represents a case where the reference position 24 is used as a position representing a predetermined position of the analysis target region 30. Specifically, the analysis target region 30 in FIG. 8A is a rectangle with the reference position 24 as the center. The analysis target region 30 is a rectangle having a height h and a width w. Note that the reference position 24 may be used as a position that defines a position other than the center of the analysis target region 30, such as the upper left end or the lower right end of the analysis target region 30.
  • FIG. 8B represents a case where a predetermined position (the center, the upper left corner, or the like) of the analysis target region 30 is defined by a position having a predetermined relationship with the reference position 24. Specifically, the analysis target region 30 in FIG. 8B is a rectangle with a position moved by a predetermined vector v from the reference position 24 as the center. The size and orientation of the rectangle are the same as those of the analysis target region 30 in FIG. 8A.
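  • The rectangle-based decision of FIGS. 8A and 8B can be sketched as a simple crop around the reference position 24; the offset vector (for FIG. 8B) and the clipping to the image boundary are implementation assumptions, not requirements of the disclosure.

```python
import numpy as np


def decide_analysis_target_region(image: np.ndarray,
                                  reference_position: tuple,
                                  h: int,
                                  w: int,
                                  offset: tuple = (0, 0)) -> np.ndarray:
    """Crop an h-by-w rectangle whose center is the reference position 24
    (FIG. 8A) or the reference position moved by a vector v (FIG. 8B)."""
    cx = reference_position[0] + offset[0]
    cy = reference_position[1] + offset[1]
    x0 = max(cx - w // 2, 0)
    y0 = max(cy - h // 2, 0)
    x1 = min(x0 + w, image.shape[1])
    y1 = min(y0 + h, image.shape[0])
    return image[y0:y1, x0:x1]
```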
  • In the examples of FIGS. 8A and 8B, the orientation of the analysis target region 30 is defined based on an axial direction of the captured image 12. More specifically, a height direction of the analysis target region 30 is defined as a Y-axis direction of the captured image 12. However, the orientation of the analysis target region 30 may be defined based on a direction other than the axial direction of the captured image 12.
  • For example, it is assumed that the detection unit 2020 detects the pose of the hand of the customer 20. In this case, the orientation of the analysis target region 30 may be defined based on the orientation of the hand. FIG. 9 is a diagram illustrating a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20. In FIG. 9 , the orientation of the analysis target region 30 is defined as a depth direction of the hand (direction from the wrist to the fingertip).
  • Note that the orientation of the analysis target region 30 in each of the plurality of captured images 12 may be different in a case where the orientation of the analysis target region 30 is defined based on the orientation of the hand of the customer 20 as described above. Therefore, it is preferable that the deciding unit 2040 performs geometric conversion such that the orientations of the plurality of analysis target regions 30 are aligned. For example, the deciding unit 2040 extracts the analysis target region 30 from each captured image 12 and performs the geometric conversion on each extracted analysis target region 30 such that the depth direction of the hand of the customer 20 faces the Y-axis direction.
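  • A minimal sketch of the geometric conversion, assuming the pose of the hand is summarized by a single angle between the wrist-to-fingertip direction and the Y-axis (an assumed convention) and that an OpenCV rotation is sufficient to align the orientations.

```python
import cv2
import numpy as np


def align_hand_direction(image: np.ndarray,
                         reference_position: tuple,
                         hand_angle_deg: float) -> np.ndarray:
    """Rotate the captured image about the reference position 24 so that the
    wrist-to-fingertip direction of the hand faces the Y-axis direction.
    hand_angle_deg is the deviation of that direction from the Y-axis
    (counter-clockwise positive); the sign convention is an assumption."""
    rotation = cv2.getRotationMatrix2D(
        (float(reference_position[0]), float(reference_position[1])),
        hand_angle_deg, 1.0)
    height, width = image.shape[:2]
    return cv2.warpAffine(image, rotation, (width, height))
```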
  • The size of the analysis target region 30 may be defined statically or may be decided dynamically. In the latter case, the size of the analysis target region 30 is decided by, for example, the following equation (1).
  • [Equation 1]

    $$h = h_b \times \frac{s_r}{s_b}, \qquad w = w_b \times \frac{s_r}{s_b} \tag{1}$$
  • Here, $h$ and $w$ are respectively the height and width of the analysis target region 30. $s_b$ is a reference area defined in advance for the hand region. $h_b$ and $w_b$ are respectively the height and width of the analysis target region 30 defined in advance in association with the reference area. $s_r$ is the area of the hand region detected from the captured image 12 by the detection unit 2020.
  • In addition, for example, the size of the analysis target region 30 may be dynamically decided using the following equation (2).
  • [Equation 2]

    $$h = h_b \times \frac{d_b}{d_r}, \qquad w = w_b \times \frac{d_b}{d_r} \tag{2}$$
  • Here, $h$ and $w$ are respectively the height and width of the analysis target region 30. $d_b$ is a reference distance value defined in advance. $h_b$ and $w_b$ are respectively the height and width of the analysis target region 30 associated with the reference distance value. $d_r$ is the distance value between the reference position 24 detected from the captured image 12 and the camera 10.
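  • A minimal sketch of the dynamic sizing of equations (1) and (2); the reference values h_b, w_b, s_b, and d_b are assumed to be calibrated in advance.

```python
def size_from_hand_area(h_b: float, w_b: float, s_b: float, s_r: float):
    """Equation (1): scale the reference size by the ratio of the detected
    hand-region area s_r to the reference area s_b."""
    scale = s_r / s_b
    return h_b * scale, w_b * scale


def size_from_distance(h_b: float, w_b: float, d_b: float, d_r: float):
    """Equation (2): scale the reference size by the ratio of the reference
    distance d_b to the detected distance d_r (a farther hand appears smaller,
    so the region shrinks)."""
    scale = d_b / d_r
    return h_b * scale, w_b * scale
```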
  • There are various methods of determining the distance value $d_r$. For example, the detection unit 2020 determines the distance value $d_r$ based on a pixel value at the reference position 24 in the depth image generated by the depth camera. In addition, for example, the detection unit 2020 may be configured to detect the distance between the reference position 24 and the camera 10 in addition to the reference position 24 when the detection unit 2020 is configured as the detector using the machine learning.
  • Here, each pixel of the analysis target region 30 decided by the above method may be corrected, and the corrected analysis target region 30 may be used for the image analysis by the determination unit 2060. The deciding unit 2040 corrects each pixel in the analysis target region 30 using, for example, the following equation (3).

  • [Equation 3]

    $$d_{(x,y)}^{1} = d_{(x,y)}^{0} + (d_r - d_b) \tag{3}$$
  • The d(x,y)0 is a pixel value before the correction at coordinates (x,y) of the analysis target region 30 in the captured image 12. The d(x,y)1 is a pixel value after the correction at the coordinates (x,y) of the analysis target region 30 in the captured image 12.
  • <Determination of Motion of Customer 20: S108, S110>
  • The determination unit 2060 performs the image analysis on the decided analysis target region 30 to determine the motion of the customer 20 (S108 and S110). The motion of the customer 20 is, for example, any of (1) motion of taking out the product 40 from the product shelf 50, (2) motion of placing the product 40 on the product shelf 50, (3) motion of not holding the product 40 both before and after the contact with the product shelf 50, and (4) motion of holding the product 40 both before and after the contact with the product shelf 50.
  • Here, "the contact between the product shelf 50 and the customer 20" means that the image region of the product shelf 50 and the image region of the customer 20 overlap at least partially in the captured image 12, and there is no need for the product shelf 50 and the customer 20 to contact each other in the real space. Further, in (4) described above, a product 40 held by the customer 20 before the contact between the customer 20 and the product shelf 50 may be the same as or different from a product 40 held by the customer 20 after the contact between the customer 20 and the product shelf 50.
  • A flow of processing of discriminating the four motions described above is, for example, a flow shown in FIG. 10 . FIGS. 10 and 11 are flowcharts illustrating the flow of the processing for determining the motion of the customer 20. First, the determination unit 2060 detects the captured image 12 including a scene in which the reference position 24 moves toward the product shelf 50 (S202). For example, the determination unit 2060 computes a distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series. In a case where the distance decreases over time in one or more captured images 12, the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves toward the product shelf 50.
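  • The detection of captured images 12 in which the reference position 24 approaches the product shelf 50 can be sketched as a search for a decreasing run in the distance series; representing each frame by an index and a precomputed hand-to-shelf distance, and the minimum run length, are assumptions of this sketch.

```python
from typing import List, Tuple


def find_approach_frames(distances: List[Tuple[int, float]],
                         min_length: int = 3) -> List[int]:
    """Return the frame indices of the longest run in which the distance
    between the reference position 24 and the product shelf 50 decreases
    over time (S202); the run-length threshold is an assumption."""
    best: List[int] = []
    run: List[int] = []
    for (index, dist), (_, prev_dist) in zip(distances[1:], distances[:-1]):
        if dist < prev_dist:
            run.append(index)
        else:
            if len(run) > len(best):
                best = run
            run = []
    if len(run) > len(best):
        best = run
    return best if len(best) >= min_length else []
```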
  • Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S202 (S204). In a case where the product 40 is included in the analysis target region 30 (YES in S204), the processing in FIG. 10 proceeds to S206. On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S204), the processing proceeds to S214.
  • In S206, the determination unit 2060 detects a captured image 12 including a scene in which the reference position 24 moves in a direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S202 (S206). For example, the determination unit 2060 computes the distance between the reference position 24 and the product shelf 50 for each of the plurality of captured images 12 in a time series generated later than the captured image 12 detected in S202. In a case where the distance increases over time in one or more captured images 12, the captured images 12 are detected as the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50.
  • Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S206 (S208). In a case where the product 40 is included in the analysis target region 30 (YES in S208), the product 40 is held in both a hand moving toward the product shelf 50 and a hand moving in the direction away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(4) motion of holding the product 40 both before and after the contact with the product shelf 50” (S210).
  • On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S208), the product 40 is not held in the hand moving in the direction away from the product shelf 50 while the product 40 is held in the hand moving toward the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(2) motion of placing the product 40 on the product shelf 50” (S212).
  • In S214, the determination unit 2060 detects the captured image 12 including the scene in which the reference position 24 moves in the direction away from the product shelf 50 from among the captured images 12 generated later than the captured image 12 detected in S202. The detection method is the same as the method executed in S206.
  • Furthermore, the determination unit 2060 decides whether the product 40 is included in the analysis target region 30 in the captured image 12 detected in S214 (S216). In a case where the product 40 is included in the analysis target region 30 (YES in S216), the product 40 is held in the hand moving in the direction away from the product shelf 50 while the product 40 is not held in the hand moving toward the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(1) motion of taking out the product 40 from the product shelf 50” (S218).
  • On the other hand, in a case where the product 40 is not included in the analysis target region 30 (NO in S216), the product 40 is not held in both the hand moving toward the product shelf 50 and the hand moving in the direction away from the product shelf 50. Therefore, the determination unit 2060 determines that the motion of the customer 20 is “(3) motion of not holding the product 40 both before and after contact with the product shelf 50” (S220).
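  • The branching of S204 to S220 reduces to two Boolean observations: whether the product 40 is in the analysis target region 30 while the hand approaches the product shelf 50 and whether it is while the hand moves away. A minimal sketch of this decision:

```python
def classify_motion(product_when_approaching: bool,
                    product_when_leaving: bool) -> str:
    """Discriminate the four motions of the customer 20 (S204 to S220)."""
    if product_when_approaching and product_when_leaving:
        return "(4) holding the product both before and after the contact"
    if product_when_approaching:
        return "(2) placing the product on the product shelf"
    if product_when_leaving:
        return "(1) taking out the product from the product shelf"
    return "(3) not holding the product both before and after the contact"
```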
  • Here, for example, there is the following method as the method of detecting whether the product 40 is included in the analysis target region 30. The determination unit 2060 first extracts an image region excluding a background region, that is, a foreground region, from the analysis target region 30 decided for each of the plurality of captured images 12 in a time series. Note that an existing technique can be used as a technique of determining the background region for a captured image to be imaged by the camera 10 installed at a predetermined place.
  • The determination unit 2060 decides that the product 40 is included in the analysis target region 30 in a case where the foreground region includes a region other than the image region representing the hand of the customer 20. However, the determination unit 2060 may decide that the product 40 is included in the analysis target region 30 only in a case where a size of the image region excluding the image region representing the hand in the foreground region is equal to or larger than a predetermined size. With this, it is possible to prevent the noise included in the captured image 12 from being erroneously detected as the product 40.
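  • A minimal sketch of the foreground-based decision, assuming a background image of the analysis target region 30 without the customer 20 is available and that a mask of the hand region has already been obtained; the fixed difference threshold and minimum area are assumptions.

```python
import cv2
import numpy as np


def product_in_region(region_bgr: np.ndarray,
                      background_bgr: np.ndarray,
                      hand_mask: np.ndarray,
                      diff_threshold: int = 30,
                      min_area: int = 200) -> bool:
    """Decide that the product 40 is included in the analysis target region 30
    when the foreground, excluding the hand region, covers at least min_area
    pixels (the thresholds are illustrative)."""
    difference = cv2.absdiff(region_bgr, background_bgr)
    gray = cv2.cvtColor(difference, cv2.COLOR_BGR2GRAY)
    _, foreground = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)
    foreground[hand_mask > 0] = 0   # remove the image region of the hand
    return int(cv2.countNonZero(foreground)) >= min_area
```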
  • The method of deciding whether the product 40 is included in the analysis target region 30 is not limited to the method described above. Various existing methods can be used as the method of deciding whether the product 40 is included in the analysis target region 30, that is, whether the hand of the person included in the image holds the product.
  • Note that the determination unit 2060 may determine the motion of the customer 20 from one captured image 12. For example, in this case, the determination unit 2060 determines the motion of the customer 20 as “holding the product 40” or “not holding the product 40”.
  • <Determination of Product 40>
  • The determination unit 2060 may determine the taken-out product 40 when the customer 20 takes out the product 40 from the product shelf 50. The determination of the product 40 means, for example, that information for identifying the product 40 from other products 40 (for example, an identifier or a name of the product 40) is determined. Hereinafter, the information for identifying the product 40 is referred to as product identification information.
  • The determination unit 2060 determines a place in the product shelf 50 from which the customer 20 takes out the product 40 to determine the taken-out product 40. As a premise, it is assumed that the display place of the product 40 is defined in advance. Here, information indicating which product is displayed at each position of the product shelf 50 is referred to as display information. The determination unit 2060 determines a place in the product shelf 50 from which the customer 20 takes out a product 40 using the captured image 12 and determines the taken-out product 40 using the determined place and the display information.
  • For example, it is assumed that a specific product 40 is displayed in each stage of the product shelf 50. In this case, the display information indicates the product identification information in association with the stage of the product shelf 50. FIG. 12 is a diagram illustrating the display information in a table format. A table shown in FIG. 12 is referred to as a table 200. The table 200 is created for each product shelf 50. The table 200 has two columns of a stage 202 and product identification information 204. In FIG. 12, the product identification information 204 indicates the identifier of the product 40. For example, in the table 200 representing the display information of the product shelf 50 identified by an identifier s0001, the record in the first row indicates that the product 40 identified by an identifier i0001 is displayed in the first stage of the product shelf 50.
  • The determination unit 2060 determines the stage of the product shelf 50 from which the product 40 is taken out, using the captured image 12. The determination unit 2060 acquires the product identification information associated with the stage in display information to determine the product 40 taken out from the product shelf 50. Hereinafter, several methods of determining the stage of the product shelf 50 from which the product 40 is taken out will be illustrated.
  • <<First Method>>
  • As a premise, it is assumed that the captured image 12 includes a scene in which the product shelf 50 is imaged from above (refer to FIG. 5 ). In other words, it is assumed that the camera 10 images the product shelf 50 from above. In this case, the depth camera is used as the camera 10. The depth camera generates a depth image in addition to or instead of a common captured image. As described above, the depth image is an image in which the value of each pixel of the image represents the distance between the imaged item and the camera. FIG. 13 is a diagram illustrating a depth image generated by the camera 10. In the depth image in FIG. 13 , pixels representing an item closer to the camera 10 are closer to white (brighter) and pixels representing an item farther from the camera 10 are closer to black (darker). Note that darker portions are densely drawn with larger black dots and brighter portions are sparsely drawn with smaller black dots in FIG. 13 for convenience of illustration.
  • The determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the value of the pixel representing the reference position 24 in the depth image. At this time, a range of a distance from the camera 10 for each stage of the product shelf 50 is defined in advance in the display information. FIG. 14 is a diagram illustrating display information indicating the range of the distance from the camera 10 for each stage of the product shelf 50. For example, the table 200 in FIG. 14 indicates that the range of the distance between a first shelf of the product shelf 50 and the camera 10 is equal to or larger than d1 and less than d2. In other words, the distance between the top of the first shelf and the camera 10 is d1, and the distance between the top of a second shelf and the camera 10 is d2.
  • The determination unit 2060 determines a stage of the product shelf 50 where the reference position 24 is present, based on the reference position 24 of the depth image including a scene in which the customer 20 takes out the product 40 and the display information shown in FIG. 14. The determined stage is defined as the stage from which the product 40 is taken out. For example, it is assumed that the pixel at the reference position 24 in the depth image indicates that the distance between the reference position 24 and the camera 10 is a, where a is equal to or larger than d1 and less than d2. In this case, the determination unit 2060 determines that the reference position 24 is present on the first shelf of the product shelf 50 based on the display information shown in FIG. 14. That is, the determination unit 2060 determines that the shelf from which the product 40 is taken out is the first shelf of the product shelf 50.
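  • Combining the display information of FIGS. 12 and 14, the first method reduces to two table lookups; the concrete stage identifiers, product identifiers, and distance ranges below are illustrative assumptions only.

```python
# Hypothetical display information for one product shelf; the identifiers and
# the distance ranges (in meters from the camera 10) are illustrative only.
STAGE_DISTANCE_RANGES = {1: (1.0, 1.5), 2: (1.5, 2.0), 3: (2.0, 2.5)}
STAGE_PRODUCTS = {1: "i0001", 2: "i0002", 3: "i0003"}


def stage_from_depth(depth_at_reference: float):
    """Return the stage whose range contains the depth value of the pixel at
    the reference position 24 (lower bound inclusive, upper bound exclusive)."""
    for stage, (lower, upper) in STAGE_DISTANCE_RANGES.items():
        if lower <= depth_at_reference < upper:
            return stage
    return None


def product_taken_out(depth_at_reference: float):
    """Look up the product identification information for the determined stage."""
    stage = stage_from_depth(depth_at_reference)
    return STAGE_PRODUCTS.get(stage) if stage is not None else None
```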
  • <<Second Method>>
  • As a premise, it is assumed that the captured image 12 includes a scene in which the product shelf 50 is viewed from the side. In other words, it is assumed that the camera 10 images the product shelf 50 from the lateral direction. In this case, the determination unit 2060 determines a stage in the product shelf 50 where a position of the reference position 24 in the height direction (Y coordinates) detected from the captured image 12 is present. The determined stage is defined as the stage of the product shelf 50 from which the product 40 is taken out. In this case, the captured image 12 may be a depth image or a common image.
  • <<About Case where Plurality of Types of Products 40 are Displayed in One Stage>>
  • A plurality of types of products may be displayed on one stage by dividing one stage of the product shelf 50 into a plurality of columns in the horizontal direction. In this case, the determination unit 2060 respectively determines a position in the horizontal direction and a position in the height direction for the reference position 24 of the hand of the customer 20 who takes out the product 40 from the product shelf 50 to determine the product 40. In this case, the product identification information is shown for each combination of stage and column in the display information. Hereinafter, a method of determining the position of the reference position 24 in the horizontal direction will be described.
  • It is assumed that the camera 10 images the product shelf 50 from above. In this case, the position of the reference position 24 in the horizontal direction is determined by the X coordinates of the reference position 24 in the captured image 12.
  • On the other hand, it is assumed that the camera 10 images the product shelf 50 from the lateral direction. In this case, the determination unit 2060 determines the position of the reference position 24 in the horizontal direction using the depth image. Here, the method of determining the position of the reference position 24 in the horizontal direction, using the depth image including a scene in which the product shelf 50 is imaged from the lateral direction, is the same as the method of determining the position of the reference position 24 in the height direction, using the depth image including the scene in which the product shelf 50 is imaged from above.
  • Note that the method of determining the product 40 taken out from the product shelf 50 is described above, but the determination unit 2060 may also determine the product 40 placed on the product shelf 50 by a similar method. However, in this case, the determination unit 2060 uses a captured image 12 including a scene in which the product 40 is placed on the product shelf 50.
  • Here, it is assumed that "(4) motion of holding the product 40 both before and after the contact with the product shelf 50" is determined as the motion of the customer 20. In this case, the determination unit 2060 may decide whether the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same based on the method of determining the product 40 described above. For example, the determination unit 2060 determines the product 40 before the contact between the customer 20 and the product shelf 50 by the same method as the method of determining the product 40 to be placed on the product shelf 50. Furthermore, the determination unit 2060 determines the product 40 after the contact between the customer 20 and the product shelf 50 by the same method as the method of determining the product 40 to be taken out from the product shelf 50. In a case where the two determined products 40 are the same, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are the same. In this case, it can be said that the motion of the customer 20 is a "motion of reaching for the product shelf 50 to place the product 40, but not placing the product 40". On the other hand, in a case where the two determined products 40 are different from each other, the determination unit 2060 decides that the products 40 held by the customer 20 before and after the contact between the customer 20 and the product shelf 50 are different from each other. In this case, it can be said that the motion of the customer 20 is a "motion of placing the held product 40 and taking out another product 40".
  • However, the above determination may be performed without specifically determining the product 40. For example, the determination unit 2060 computes the magnitude of a difference (a difference in area or color) between the foreground region of the analysis target region 30 before the contact between the customer 20 and the product shelf 50 and the foreground region of the analysis target region 30 after the contact between the customer 20 and the product shelf 50, and decides that the products 40 before and after the contact are different from each other in a case where the magnitude of the computed difference is equal to or larger than a predetermined value. On the other hand, the determination unit 2060 decides that the products 40 before and after the contact are the same in a case where the magnitude of the difference is less than the predetermined value.
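  • A minimal sketch of this comparison that does not identify the product 40 itself, assuming the foreground regions before and after the contact are available as BGR crops; comparing only area and mean color, and the tolerance values, are assumptions.

```python
import numpy as np


def same_product(foreground_before: np.ndarray,
                 foreground_after: np.ndarray,
                 area_tolerance: float = 0.3,
                 color_tolerance: float = 40.0) -> bool:
    """Decide whether the products 40 held before and after the contact are
    the same, from the difference in foreground area and mean color."""
    area_before = float(foreground_before.shape[0] * foreground_before.shape[1])
    area_after = float(foreground_after.shape[0] * foreground_after.shape[1])
    area_diff = abs(area_before - area_after) / max(area_before, area_after)
    mean_before = foreground_before.reshape(-1, 3).mean(axis=0)
    mean_after = foreground_after.reshape(-1, 3).mean(axis=0)
    color_diff = float(np.linalg.norm(mean_before - mean_after))
    return area_diff < area_tolerance and color_diff < color_tolerance
```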
  • In addition, for example, the determination unit 2060 decides whether the products 40 before and after the contact are the same based on the difference in the reference positions 24 before and after the contact between the customer 20 and the product shelf 50. In this case, the determination unit 2060 respectively determines, using the display information described above, a stage of the product shelf 50 where the reference position 24 is present before the contact between the customer 20 and the product shelf 50 and a stage of the product shelf 50 where the reference position 24 is present after the contact between the customer 20 and the product shelf 50. In a case where the stages of the product shelf 50 determined respectively before and after the contact between the customer 20 and the product shelf 50 are different from each other, the determination unit 2060 decides that the products 40 before and after the contact are different from each other. On the other hand, in a case where the stages of the product shelf 50 determined respectively before and after the contact are the same, the determination unit 2060 decides that the products 40 before and after the contact are the same.
  • <Utilization Method of Motion of Customer 20 Determined by Determination Unit 2060>
  • The motion of the customer 20 determined by the determination unit 2060 can be used to analyze an action performed in front of the product shelf 50 (so-called front-shelf action) by the customer 20. For this reason, the determination unit 2060 outputs various pieces of information such as a motion performed in front of the product shelf 50 by each customer 20, a date and time when the motion is performed, and a product 40 subjected to the motion. This information is, for example, stored in a storage apparatus connected to the information processing apparatus 2000 or transmitted to a server apparatus connected to the information processing apparatus 2000 in a communicable manner. Here, various existing methods can be used as the method of analyzing the front-shelf action based on various motions of the customer 20 performed in front of the product shelf 50.
  • Note that a usage scene of the information processing apparatus 2000 is not limited to the determination of the motion of the customer in the store. For example, as described above, the information processing apparatus 2000 can be used to determine the motion of a factory worker or the like. In this case, for example, the motion of each worker determined by the information processing apparatus 2000 is compared with a motion of each worker defined in advance, and thus it is possible to confirm whether the worker correctly performs a predetermined job.
  • The example embodiments of the present invention are described with reference to the drawings. However, the example embodiments are only examples of the present invention, and various configurations other than the above can be employed.

Claims (20)

1. An information processing apparatus comprising:
at least one memory configured to store instructions; and
at least one processor configured to execute the instructions to:
detect a pose of the hand of the person from the captured image;
set an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determine a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
2. The information processing apparatus according to claim 1,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person holding the item by analyzing the analysis target region.
3. The information processing apparatus according to claim 1,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
4. The information processing apparatus according to claim 1,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person not holding the item by analyzing the analysis target region.
5. The information processing apparatus according to claim 4,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
6. The information processing apparatus according to claim 1,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
7. The information processing apparatus according to claim 2,
wherein the at least one processor is further configured to execute the instructions to determine a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
8. A control method executed by a computer, the method comprising:
detecting a pose of the hand of the person from the captured image;
setting an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determining a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
9. The control method according to claim 8,
wherein the method comprises determining a motion of the person holding the item by analyzing the analysis target region.
10. The control method according to claim 8,
wherein the method comprises determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
11. The control method according to claim 8,
wherein the method comprises determining a motion of the person not holding the item by analyzing the analysis target region.
12. The control method according to claim 11,
wherein the method comprises determining a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
13. The control method according to claim 8,
wherein the method comprises determining a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
14. The control method according to claim 9,
wherein the method comprises determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
15. A non-transitory computer readable medium storing a program for causing a computer to perform operations, the operations comprising:
detecting a pose of the hand of the person from the captured image;
setting an analysis region of a predetermined shape on the captured image, the analysis region having a direction determined based on a direction of the hand defined by the pose of the hand; and
determining a motion of the person by analyzing the analysis target region,
wherein the analysis region has two first parallel sides that are parallel to the direction of the hand defined by the pose of the hand and two second parallel sides that are perpendicular to the direction of the hand.
16. The non-transitory computer readable medium according to claim 15,
wherein the operations comprise determining a motion of the person holding the item by analyzing the analysis target region.
17. The non-transitory computer readable medium according to claim 15,
wherein the operations comprise determining a motion of the person taking out the item from the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
18. The non-transitory computer readable medium according to claim 15,
wherein the operations comprise determining a motion of the person not holding the item by analyzing the analysis target region.
19. The non-transitory computer readable medium according to claim 18,
wherein the operations comprise determining a motion of the person not holding the item both before and after contact with the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
20. The non-transitory computer readable medium according to claim 15,
wherein the operations comprise determining a motion of the person placing the item on the display place by analyzing the analysis target region set for each of a plurality of captured images generated at different timepoints.
US18/227,848 2017-06-21 2023-07-28 Information processing apparatus, control method, and program Pending US20240029273A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/227,848 US20240029273A1 (en) 2017-06-21 2023-07-28 Information processing apparatus, control method, and program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
PCT/JP2017/022875 WO2018235198A1 (en) 2017-06-21 2017-06-21 Information processing device, control method, and program
US201916623656A 2019-12-17 2019-12-17
US17/349,045 US11763463B2 (en) 2017-06-21 2021-06-16 Information processing apparatus, control method, and program
US18/227,848 US20240029273A1 (en) 2017-06-21 2023-07-28 Information processing apparatus, control method, and program

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US17/349,045 Continuation US11763463B2 (en) 2017-06-21 2021-06-16 Information processing apparatus, control method, and program

Publications (1)

Publication Number Publication Date
US20240029273A1 true US20240029273A1 (en) 2024-01-25

Family

ID=64737005

Family Applications (4)

Application Number Title Priority Date Filing Date
US16/623,656 Abandoned US20210142490A1 (en) 2017-06-21 2017-06-21 Information processing apparatus, control method, and program
US17/349,045 Active 2037-08-17 US11763463B2 (en) 2017-06-21 2021-06-16 Information processing apparatus, control method, and program
US18/227,848 Pending US20240029273A1 (en) 2017-06-21 2023-07-28 Information processing apparatus, control method, and program
US18/227,850 Pending US20230410321A1 (en) 2017-06-21 2023-07-28 Information processing apparatus, control method, and program

Family Applications Before (2)

Application Number Title Priority Date Filing Date
US16/623,656 Abandoned US20210142490A1 (en) 2017-06-21 2017-06-21 Information processing apparatus, control method, and program
US17/349,045 Active 2037-08-17 US11763463B2 (en) 2017-06-21 2021-06-16 Information processing apparatus, control method, and program

Family Applications After (1)

Application Number Title Priority Date Filing Date
US18/227,850 Pending US20230410321A1 (en) 2017-06-21 2023-07-28 Information processing apparatus, control method, and program

Country Status (3)

Country Link
US (4) US20210142490A1 (en)
JP (2) JP7197171B2 (en)
WO (1) WO2018235198A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507125A (en) * 2019-01-30 2020-08-07 佳能株式会社 Detection device and method, image processing device and system
JP6982259B2 (en) * 2019-09-19 2021-12-17 キヤノンマーケティングジャパン株式会社 Information processing equipment, information processing methods, programs
US20230080815A1 (en) * 2020-02-28 2023-03-16 Nec Corporation Customer analysis apparatus, customer analysis method, and non-transitory storage medium
KR20220144889A (en) * 2020-03-20 2022-10-27 Huawei Technologies Co., Ltd. Method and system for hand gesture-based control of a device
WO2021255894A1 (en) * 2020-06-18 2021-12-23 NEC Corporation Control device, control method, and program
US11881045B2 (en) * 2020-07-07 2024-01-23 Rakuten Group, Inc. Region extraction device, region extraction method, and region extraction program
US20240127303A1 (en) * 2021-09-29 2024-04-18 Nec Corporation Reporting system, method, and recording medium

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000331170A (en) * 1999-05-21 2000-11-30 Atr Media Integration & Communications Res Lab Hand motion recognizing device
US7227526B2 (en) * 2000-07-24 2007-06-05 Gesturetek, Inc. Video-based image control system
WO2010004719A1 (en) 2008-07-08 2010-01-14 Panasonic Corporation Article estimation device, article position estimation device, article estimation method, and article estimation program
US9904845B2 (en) * 2009-02-25 2018-02-27 Honda Motor Co., Ltd. Body feature detection and human pose estimation using inner distance shape contexts
JP5361530B2 (en) * 2009-05-20 2013-12-04 Canon Inc. Image recognition apparatus, imaging apparatus, and image recognition method
WO2011007390A1 (en) * 2009-07-15 2011-01-20 Toshiba Corporation Image-processing device and interface device
JP5272213B2 (en) * 2010-04-30 2013-08-28 Nippon Telegraph and Telephone Corporation Advertisement effect measurement device, advertisement effect measurement method, and program
JP5573379B2 (en) * 2010-06-07 2014-08-20 Sony Corporation Information display device and display image control method
JP5561124B2 (en) 2010-11-26 2014-07-30 Fujitsu Limited Information processing apparatus and pricing method
JP2013164834A (en) * 2012-01-13 2013-08-22 Sony Corp Image processing device, method thereof, and program
US10049281B2 (en) 2012-11-12 2018-08-14 Shopperception, Inc. Methods and systems for measuring human interaction
US10268983B2 (en) * 2013-06-26 2019-04-23 Amazon Technologies, Inc. Detecting item interaction and movement
JP6134607B2 (en) * 2013-08-21 2017-05-24 NTT Docomo Inc. User observation system
JP6194777B2 (en) 2013-11-29 2017-09-13 Fujitsu Limited Operation determination method, operation determination apparatus, and operation determination program
JP2016162072A (en) * 2015-02-27 2016-09-05 Toshiba Corporation Feature quantity extraction apparatus
JP6618301B2 (en) * 2015-08-31 2019-12-11 Canon Inc. Information processing apparatus, control method therefor, program, and storage medium

Also Published As

Publication number Publication date
WO2018235198A1 (en) 2018-12-27
JP2021177399A (en) 2021-11-11
US20210142490A1 (en) 2021-05-13
JPWO2018235198A1 (en) 2020-04-09
JP7197171B2 (en) 2022-12-27
US20230410321A1 (en) 2023-12-21
JP7332183B2 (en) 2023-08-23
US11763463B2 (en) 2023-09-19
US20210343026A1 (en) 2021-11-04

Similar Documents

Publication Publication Date Title
US11763463B2 (en) Information processing apparatus, control method, and program
CN107358149B (en) Human body posture detection method and device
US9866820B1 (en) Online calibration of cameras
JP6176388B2 (en) Image identification device, image sensor, and image identification method
US20160117824A1 (en) Posture estimation method and robot
US10063843B2 (en) Image processing apparatus and image processing method for estimating three-dimensional position of object in image
CN108573471B (en) Image processing apparatus, image processing method, and recording medium
US10515459B2 (en) Image processing apparatus for processing images captured by a plurality of imaging units, image processing method, and storage medium storing program therefor
US10496874B2 (en) Facial detection device, facial detection system provided with same, and facial detection method
US20050139782A1 (en) Face image detecting method, face image detecting system and face image detecting program
US20220012514A1 (en) Identification information assignment apparatus, identification information assignment method, and program
US20240104769A1 (en) Information processing apparatus, control method, and non-transitory storage medium
JP2007241477A (en) Image processor
US9639763B2 (en) Image target detecting apparatus and method
US20210042576A1 (en) Image processing system
US11812131B2 (en) Determination of appropriate image suitable for feature extraction of object from among captured images in which object is detected
CN106406507B (en) Image processing method and electronic device
JP5217917B2 (en) Object detection and tracking device, object detection and tracking method, and object detection and tracking program
JP6772059B2 (en) Electronic control devices, electronic control systems and electronic control methods
US11521330B2 (en) Image processing apparatus, image processing method, and storage medium
JP2013250604A (en) Object detecting device and object detecting method
KR20220083347A (en) Method, apparatus, and computer program for measuring volume of objects by using image
US20170200383A1 (en) Automated review of forms through augmented reality
CN113344904A (en) Surface defect detection method and related product
CN111047644A (en) Method for identifying object position and orientation relation between objects in picture

Legal Events

Date Code Title Description
STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED