WO2020143179A1 - Item identification method and system, and electronic device - Google Patents

Item identification method and system, and electronic device

Info

Publication number: WO2020143179A1
Authority: WO (WIPO PCT)
Prior art keywords: item, image, information, frame, result
Application number: PCT/CN2019/092405
Other languages: English (en), French (fr)
Inventors: 邹文财, 欧阳高, 岳泊暄, 王进
Original Assignee: 虹软科技股份有限公司 (ArcSoft Corporation Limited)
Application filed by 虹软科技股份有限公司
Priority to JP2019566841A (JP6986576B2)
Priority to EP19908887.3A (EP3910608B1)
Priority to US16/479,222 (US11335092B2)
Priority to KR1020197036280A (KR102329369B1)
Publication of WO2020143179A1


Classifications

    • H04N 7/183: Closed-circuit television [CCTV] systems (video signal not broadcast) for receiving images from a single remote source
    • G06V 10/10: Image acquisition
    • G07F 11/72: Coin-freed dispensing apparatus; auxiliary equipment
    • G01B 11/22: Measuring arrangements using optical techniques for measuring depth
    • G01S 17/894: 3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
    • G06F 16/2379: Updates performed during online database operations; commit processing
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06F 18/24: Classification techniques
    • G06F 18/25: Fusion techniques
    • G06F 18/251: Fusion techniques of input or preprocessed data
    • G06Q 20/208: Point-of-sale input by product or record sensing, e.g. weighing or scanner processing
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/248: Analysis of motion using feature-based methods involving reference images or patches
    • G06T 7/50: Depth or shape recovery
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/74: Determining position or orientation using feature-based methods involving reference images or patches
    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature-extraction or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 20/10: Terrestrial scenes
    • G06V 20/40: Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/60: Type of objects
    • G06V 20/64: Three-dimensional objects
    • G07G 1/0063: Checkout procedures with a code reader, with means for detecting the geometric dimensions of the article being registered
    • H04N 23/60: Control of cameras or camera modules comprising electronic image sensors
    • H04N 7/18: Closed-circuit television [CCTV] systems
    • G06Q 20/201: Price look-up processing, e.g. updating
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/30241: Trajectory

Definitions

  • the present invention relates to the field of information processing technology, and in particular, to an article identification method and system, and electronic equipment.
  • In the related art, item identification mainly takes one of two forms. The first is based on RFID (Radio Frequency Identification): on one hand, the cost of RFID electronic tags is high, and on the other hand, the labor cost of attaching tags to thousands of items after they are put on the market is too high; in addition, the recognition accuracy for metal and liquid items is insufficient, and the tags are easily torn off, resulting in a high cargo loss rate.
  • The second is static recognition based on visual recognition: a camera needs to be installed at the top of each layer of the container, an image is taken before the door is opened and after it is closed, the type and number of items are then automatically recognized by visual recognition technology, and the final result is obtained through comparison.
  • With this second approach, the space utilization rate is low, because the camera needs to be at a sufficient height above the partition below it, otherwise it is difficult to capture the whole picture; moreover, the recognition accuracy is easily affected by occlusion of the items, and the items cannot be stacked.
  • Embodiments of the present disclosure provide an item identification method and system, and an electronic device, to at least solve the technical problem of low identification accuracy when identifying items in the related art.
  • According to one aspect, an item identification method is provided, including: acquiring multi-frame images of an item through an image capture device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame of image; acquiring auxiliary information of the item through an information capture device; performing multi-modal fusion of the position information and the auxiliary information to obtain a fusion result; and determining the identification result of the item based on the category information and the fusion result.
  • Optionally, processing the multi-frame images of the item to obtain the position information and category information of the item in each frame of image includes: performing image preprocessing on each frame of image of the item; determining the item detection frame and the category information in each preprocessed frame of image, where at least one item is included in the item detection frame; and determining the position information of the item according to the item detection frame.
  • the method further includes: performing non-maximum suppression on the item detection frame.
  • Optionally, the method further includes: acquiring multi-frame images of the target part through an image capture device; and processing the multi-frame images of the target part to obtain position information and discrimination results of the target part in each frame of image.
  • the identification result of the item is determined according to the position information and discrimination result of the target part, the category information of the item, and the fusion result in the image of each frame.
  • Optionally, processing the multi-frame images of the target part to obtain the position information and discrimination results of the target part in each frame of image includes: performing image preprocessing on each frame of image of the target part to enhance the image contour of the user's target part; selecting the part candidate regions where the user's target part appears in each preprocessed frame of image; extracting the feature information in the part candidate regions to obtain multiple part features; and recognizing the multiple part features through a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame of image.
  • Optionally, selecting the part candidate regions where the user's target part appears in each preprocessed frame of image includes: scanning each frame of image through a sub-window to determine the part candidate regions in which the user's target part may appear.
  • Optionally, the method further includes: performing fine-grained classification on the items.
  • the information capturing device includes at least one of the following: a depth camera, a card reader, a gravity device, and an odor sensor.
  • a depth image of the item is acquired through the depth camera, and the auxiliary information of the item includes depth information.
  • Optionally, performing multi-modal fusion of the position information and the auxiliary information to obtain the fusion result includes: obtaining lens parameters and position parameters of the image capture device and the depth camera; obtaining the position of the item in the coordinate system of the depth camera according to the lens parameters of the depth camera, the depth information, and the position of the item in the depth image; calibrating, according to the position parameters of the image capture device and the depth camera and using the coordinate system of the depth camera as a reference, the relative positional relationship of the image capture device with respect to the depth camera; determining, based on the lens parameters, the position of the item in the depth image, the depth information, and the relative positional relationship, the mapped position information of the item in the image acquired by the image capture device that corresponds to the position of the item in the depth image; and comparing the position information with the mapped position information to obtain the fusion result.
  • Optionally, acquiring the multi-frame images of the item through the image capture device includes: turning on the image capture device to acquire a video of the item; and extracting the multi-frame images of the item from the video.
  • Optionally, the method further includes: determining a tracking trajectory of the item according to the fusion result; classifying the tracking trajectory to obtain a trajectory classification result, where the trajectory classification result corresponds to the movement result of the item; determining the item-taking result and the item-returning result according to the trajectory classification result; and updating the item management list according to the item-taking result and the item-returning result.
  • Optionally, determining the tracking trajectory of the item according to the fusion result includes: obtaining the position information of the item and the movement trend of the item according to the fusion result; and determining the matching degree between the current detection result and the detection result of the previous frame based on the coincidence similarity and feature similarity between the current detection frame of the item and the predicted candidate frame, to obtain the tracking trajectory of the item, where the predicted candidate frame is obtained from the position information of the item in the previous frame according to the movement trend of the item.
  • the tracking trajectory includes: the position of the item, the item type, and the timestamp of the item movement at each time node.
  • Optionally, the step of classifying the tracking trajectory to obtain a trajectory classification result includes: extracting the movement length of the item from the tracking trajectory; and classifying the tracking trajectory by combining a pre-trained classification decision tree model with the movement length of the item to obtain the trajectory classification result.
  • Optionally, the step of determining the item-taking result or the item-returning result according to the trajectory classification result includes: acquiring the trajectory classification results of the image capture device alone, or of the image capture device and the information capture device combined at the same time; establishing a classification discrimination scheme based on a classification rule base from those trajectory classification results; and determining the item-taking result or the item-returning result according to the classification discrimination scheme and the trajectory classification results.
  • Optionally, the method further includes: obtaining an item price list, where the item price list includes the price of each item; determining the items taken and their quantities based on the item-taking result and the item-returning result; and determining the total settlement price of the items based on the items taken, their quantities, and the price of each item.
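  • As a minimal sketch of this settlement step (the function and item names below are hypothetical, not from the application), the items actually charged are those taken minus those put back:

```python
from collections import Counter

def settle(price_list, taken, returned):
    """Total settlement price from the item-taking and item-returning results.

    price_list: {item_id: unit price}; taken / returned: lists of item_ids
    recognized as taken out of, or put back into, the container."""
    counts = Counter(taken)
    counts.subtract(Counter(returned))   # items put back are not charged
    total = sum(price_list[item] * n for item, n in counts.items() if n > 0)
    return counts, total

# Usage (illustrative prices and recognition results):
prices = {"cola": 3.0, "chips": 5.5}
counts, total = settle(prices, taken=["cola", "cola", "chips"], returned=["cola"])
print(counts, total)                     # one cola and one bag of chips: 8.5
```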
  • Optionally, the method can be applied to a new retail scenario, where the new retail scenario includes at least: an unmanned sales store and a smart container.
  • According to another aspect, an item identification system is provided, including: an image capture device configured to acquire multi-frame images of an item; an information capture device configured to acquire auxiliary information of the item; and a server configured to process the multi-frame images of the item to obtain position information and category information of the item in each frame of image, perform multi-modal fusion of the position information and the auxiliary information to obtain a fusion result, and then determine the identification result of the item based on the category information and the fusion result.
  • the image capturing device is further configured to acquire multi-frame images of the target part.
  • the server is further configured to process the multi-frame images of the target part to obtain position information and discrimination results of the target part in each frame of the image, and according to the The position information and discrimination result of the target part, the category information and the fusion result determine the recognition result of the article.
  • Optionally, the system further includes an item storage device; the image capture device and the information capture device are turned on when the item storage device is opened.
  • According to another aspect, an electronic device is provided, including: a processor; and a memory configured to store instructions executable by the processor, where the processor is configured to perform any one of the item identification methods described above by executing the executable instructions.
  • According to another aspect, a storage medium is provided, including a stored program, where, when the program runs, the device where the storage medium is located is controlled to execute any one of the item identification methods described above.
  • In the embodiments, the multi-frame images of the item are acquired by the image capture device and processed to obtain the position information and category information of the item in each frame of image; the auxiliary information of the item is acquired by the information capture device; multi-modal fusion of the position information and the auxiliary information is performed to obtain a fusion result; and the item recognition result is determined according to the category information and the fusion result.
  • In this way, multi-frame image acquisition can be achieved, and the position information and category information of the item can be analyzed and combined with the auxiliary information of the item to accurately identify the item; the type and number of items taken by the user can also be accurately identified, so as to solve the technical problem of low recognition accuracy when recognizing items in the related art.
  • FIG. 1 is a schematic diagram of an optional item identification system according to an embodiment of the present invention.
  • FIG. 2 is a flowchart of an optional item identification method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of implementing item identification according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of identifying a target part in an image according to an embodiment of the present invention.
  • New retail refers to relying on the Internet, through the use of big data, artificial intelligence and other technical means to upgrade the production, circulation and sales of goods, and in-depth integration of online services, offline experience and modern logistics.
  • RFID (Radio Frequency Identification): an RFID electronic tag can identify specific targets and read and write related data through radio signals, without establishing mechanical or optical contact between the identification system and the specific target.
  • Smart container: a container equipped with visual recognition technology.
  • Cargo loss rate: the ratio of the number of items lost during the operation of a container to the total number of items.
  • TOF depth camera: a Time-of-Flight depth camera, also known as a 3D camera; unlike a traditional camera, it can simultaneously capture the grayscale information of a scene and 3D information including depth.
  • NMS: Non-Maximum Suppression.
  • Multi-frame images: images, taken from still images or video, containing at least one frame.
  • The embodiments of the present invention can be applied to various implementation scenarios of new retail, for example, the use of smart containers. In the related art, the item identification process cannot accurately identify the type and number of items taken by the user from the images captured by the image capture device: for example, when an image is taken only before the door is opened and after it is closed, and the type and number of items are then automatically recognized by visual recognition technology with the final result obtained by comparison, an item that is taken away may not be recognizable from a single photo.
  • In the embodiments of the present invention, multiple cameras can be installed on the smart container, the video after the door is opened is analyzed, and the multi-frame images in the video are analyzed to perform multi-modal fusion. Therefore, the type and number of the items taken by the user can be accurately identified, the degree of intelligence of item identification of the smart container can be improved, and the cargo loss rate can be reduced.
  • the embodiments of the present invention can be applied to the field of new retail and the like, and the specific application range can be in areas such as smart containers, smart cabinets, shopping malls, and supermarkets.
  • Below, the present invention is schematically illustrated with smart containers, but is not limited thereto.
  • FIG. 1 is a schematic diagram of an optional item identification system according to an embodiment of the present invention. As shown in FIG. 1, the system may include: an image capturing device 11, an information capturing device 12, and a server 13, wherein,
  • the image capturing device 11 is configured to acquire multi-frame images of the article.
  • the image capturing device may be installed in a container, a shopping mall, or the like, and the number of image capturing devices arranged is at least one.
  • the image capturing device may be a common camera, for example, an RGB camera, an infrared camera, or the like.
  • Those skilled in the art can adjust the type and number of image capture devices according to actual needs, without being limited to the examples given here; when there are two or more image capture devices, they may all be of the same type, or a combination of different types may be used.
  • the information capture device 12 is configured to acquire auxiliary information of the article.
  • The information capture device can be arranged around the image capture device and used in cooperation with it; the number of information capture devices provided is at least one.
  • Optionally, the information capture device may include: a depth camera configured to obtain depth information, a card reader configured to scan the item identification code, a gravity device (such as a gravity plate) configured to obtain gravity information, an odor sensor configured to obtain odor information, and the like.
  • depth cameras include TOF depth cameras, binocular cameras, structured light cameras, and so on.
  • When the above-mentioned information capture device is a gravity device, whether goods are taken, and roughly which goods, can be determined by comparing the gravity information obtained by the gravity device at different times.
  • the gravity device may be provided in the article storage device.
  • the gravity information detected by the gravity device is combined with the item information analyzed by the image capture device to determine the item recognition result.
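  • A minimal sketch of this comparison, assuming a catalog of per-item unit weights and a simple nearest-weight match (the names and tolerance below are illustrative assumptions, not from the application):

```python
def gravity_delta(weight_before, weight_after, catalog, tol=5.0):
    """Infer roughly which item was taken from the change in measured weight.

    catalog: {item_id: unit weight in grams}; tol: measurement tolerance.
    Returns the catalog item whose unit weight best matches the missing mass."""
    missing = weight_before - weight_after
    if missing <= tol:
        return None                          # nothing (detectably) taken
    best = min(catalog, key=lambda k: abs(catalog[k] - missing))
    return best if abs(catalog[best] - missing) <= tol else "unknown"
```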
  • When the above-mentioned information capture device is an odor sensor, the odor information of the item can be obtained through the odor sensor and combined with the item information analyzed from the images of the image capture device to determine the item recognition result.
  • the odor sensor may be provided in the article storage device.
  • The server 13 is configured to process the multi-frame images of the items to obtain the position information and category information of the items in each frame of image, perform multi-modal fusion of the position information and the auxiliary information to obtain a fusion result, and then determine the item identification result according to the category information and the fusion result.
  • The above item identification system uses the image capture device 11 to acquire multi-frame images of the item and the information capture device 12 to acquire auxiliary information of the item; finally, the server 13 processes the multi-frame images to obtain the position information and category information of the item in each frame of image, performs multi-modal fusion of the position information and the auxiliary information to obtain a fusion result, and then determines the item recognition result according to the category information and the fusion result.
  • The number of image capture devices and the number of information capture devices can be arranged rationally according to the area of use and the equipment used; for example, for a smart container, two image capture devices and one information capture device can be deployed.
  • the information capturing device is a TOF depth camera, which is set to acquire a depth image of the item, and the auxiliary information of the item includes depth information. That is, the depth image of the item can be collected by the depth camera to obtain the depth information of the item, so that the overlapped or covered items can be effectively identified.
  • Optionally, in the above item identification system, the image capture device is further used to acquire multi-frame images of the target part.
  • The target part can be a hand, a robotic hand, a prosthesis, or another human part or mechanical device that can take items; that is, this application can detect images of the user taking an item by hand and, by detecting the user's target part in the images, analyze the location of the target part.
  • Optionally, the server is further configured to process the multi-frame images of the target part to obtain the position information and discrimination results of the target part in each frame of image, and to determine the identification result of the item according to the position information and discrimination result of the target part in each frame of image, the category information of the item, and the fusion result. That is, the category information and fusion result of the item can be analyzed by combining the position information and discrimination result of the target part with the images obtained by the image capture device and the information capture device, thereby improving the accuracy of item recognition.
  • the type and number of items taken by the user can also be obtained.
  • The above discrimination result indicates whether the target part has been detected.
  • the detection of the target part may be the detection of the hand.
  • In the following embodiments of the present invention, the user's hand is used as the user's target part, and the position of the hand in each frame of image is detected.
  • Optionally, the above item identification system further includes an item storage device; the image capture device and the information capture device are turned on when the item storage device is opened.
  • The item storage device refers to equipment for storing items and may include, but is not limited to, a smart container.
  • In an embodiment of the present invention, the opening of the item storage device can be used as trigger information: the image capture device and the information capture device are turned on simultaneously to collect the multi-frame images of the item and the auxiliary information of the item; the multi-frame images and the auxiliary information are then analyzed to obtain information such as the position and type of the item, and multi-modal fusion is performed with the auxiliary information to obtain the item recognition result.
  • Multi-frame images of the target part can also be acquired by the image capture device and the target part detected; the images obtained by the image capture device and the information capture device can then be analyzed in combination with the position information and discrimination results of the target part in each frame of image to obtain the category information and fusion results of the items, so as to obtain the identification results of the items more accurately and improve the identification accuracy.
  • According to an embodiment of the present invention, an item identification method is provided. It should be noted that the steps shown in the flowchart of the accompanying drawings can be executed in a computer system, such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
  • FIG. 2 is a flowchart of an optional item identification method according to an embodiment of the present invention. As shown in FIG. 2, the method includes the following steps:
  • Step S202: acquiring multi-frame images of the item through the image capture device;
  • Step S204: processing the multi-frame images of the item to obtain the position information and category information of the item in each frame of image;
  • Step S206: acquiring auxiliary information of the item through the information capture device;
  • Step S208: performing multi-modal fusion of the position information and the auxiliary information to obtain a fusion result;
  • Step S210: determining the item recognition result based on the category information and the fusion result.
  • Through the above steps, the multi-frame images of the item can be acquired by the image capture device and processed to obtain the position information and category information of the item in each frame of image; the auxiliary information of the item can be obtained by the information capture device; the position information and the auxiliary information are fused multi-modally to obtain a fusion result; and the item recognition result is determined according to the category information and the fusion result.
  • In this way, multi-frame image acquisition can be achieved, and the position information and category information of the item can be analyzed and combined with the auxiliary information of the item to accurately identify the item; the type and number of items taken by the user can also be accurately identified, so as to solve the technical problem of low recognition accuracy when recognizing items in the related art.
  • The item identification method can be applied to a new retail scenario, which at least includes: smart container sales in an unmanned retail store and smart container sales in supermarket shopping.
  • In step S202, multi-frame images of the item are acquired through the image capture device.
  • The image capture device may be an ordinary camera, for example, an RGB camera, an infrared camera, or the like.
  • the number of image capturing devices is at least one.
  • When there are two or more image capture devices, they may all be of the same type, or a combination of different types of image capture devices may be used.
  • Each image capture device can capture at least two images.
  • the number of items is at least one, and the items may be placed in the item storage device, for example, the items are stored in a smart container.
  • Item storage devices include but are not limited to: smart containers.
  • After the item storage device is opened, the image capture device and the information capture device can be turned on.
  • Optionally, acquiring the multi-frame images of the item through the image capture device includes: turning on the image capture device to acquire a video of the item, and extracting the multi-frame images of the item from the video. That is, after the item storage device is opened, the video inside the item storage device can be obtained in real time through the image capture device, and after the item storage device is closed or the user's picking action is detected to have stopped, multiple frames of images can be extracted from the video.
  • Step S204 processing the multi-frame image of the item to obtain position information and category information of the item in each frame of image.
  • In an embodiment of the present invention, the position and category of the item in the image are emphasized; when analyzing the position information, either the current position of the item in the image, or the relationship between the item's current position and its positions in the previous frames of image, can be analyzed.
  • Two cases are described below: the first is identifying the location and category of the item in the image; the second is identifying the location of the target part in the image.
  • Optionally, processing the multi-frame images of the item to obtain the position information and category information of the item in each frame of image includes: performing image preprocessing on each frame of image of the item, where the image preprocessing includes at least one of the following: image enhancement, image scaling, and mean subtraction; determining the item detection frame and category information in each preprocessed frame of image, where at least one item is included in the item detection frame; and determining the position information of the item according to the item detection frame.
  • In an implementation, multiple item candidate frames can be extracted first, and the item candidate frames are then analyzed by deep learning to determine the item detection frames and the item category information.
  • Optionally, the item candidate frames and the position of the target part can be combined to identify the item detection frames with high accuracy.
  • the above item identification method further includes: performing non-maximum suppression on the item detection frame to prevent false detection and improve item identification accuracy.
  • the image is preprocessed first, including image enhancement, scaling, and average reduction operations.
  • The next step is to extract the item detection frames and perform non-maximum suppression (NMS) on the extracted item detection frames to prevent false detections and improve the accuracy of item identification, as sketched below.
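  • A minimal sketch of the greedy NMS step described above; the IoU threshold is an assumed default, not a value specified in the application:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over item detection frames.

    boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidences.
    Returns the indices of the detection frames that are kept."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest-confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much (likely duplicates)
        order = order[1:][iou <= iou_threshold]
    return keep
```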
  • Optionally, the above item identification method further includes: performing fine-grained classification on the items to improve the identification accuracy. That is, item identification information can be obtained by performing fine-grained analysis on the items. Optionally, fine-grained classification is performed for similar items, and the accuracy of item identification is improved by analyzing the small differences between similar items.
  • the types of items in the embodiment of the present invention include but are not limited to: vegetables, fruits, snacks, fresh meat, seafood, and the like.
  • FIG. 3 is a schematic diagram of implementing item recognition according to an embodiment of the present invention.
  • As shown in FIG. 3, a video captured by the image capture device is input first; after frames are cropped from the video, each image is preprocessed and item candidate frames are extracted; the extracted item candidate frames are analyzed in combination with the detection of the target part to obtain the item detection frames; non-maximum suppression is then performed on the item detection frames; and finally fine-grained classification and multi-modal fusion technology can be used to determine the item recognition result.
  • the location of the target part in the image is identified.
  • the hand can be used as the target part for description.
  • the above item recognition method further includes: acquiring multiple frames of the target part through an image capture device; processing the multiple frames of the target part to obtain position information of the target part in each frame of image And discriminate results.
  • Optionally, processing the multi-frame images of the target part to obtain the position information and discrimination result of the target part in each frame of image includes: performing image preprocessing on each frame of image of the target part to enhance the image contour of the user's target part, where the preprocessing may include one or more of image noise reduction, image enhancement, contrast enhancement, image smoothing, and image sharpening; selecting, in each preprocessed frame of image, the part candidate regions where the user's target part appears; extracting the feature information in the part candidate regions to obtain multiple part features; and recognizing the multiple part features through a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame of image.
  • The image preprocessing in this embodiment is performed mainly on each frame of image of the target part and may include image noise reduction and image enhancement; for the hand, this includes contrast enhancement, image smoothing, noise filtering, and image sharpening to enhance the target contour.
  • Then multiple part candidate regions can be determined, for example, multiple gesture candidate regions (Regions of Interest, ROI); some possible gesture candidate regions are selected within the global perception range of the camera.
  • Optionally, selecting the part candidate regions where the user's target part appears in each preprocessed frame of image includes: scanning each frame of image through a sub-window to determine the part candidate regions in which the user's target part may appear. That is, the full image can be scanned using a sub-window, with 1/n of the image height selected as the minimum scale of the hand and the size of the sub-window gradually increased by a certain multiple on this basis, as sketched below.
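  • A minimal sketch of this multi-scale sub-window scan; the growth factor and step fraction are illustrative assumptions, while the 1/n-of-image-height minimum scale follows the text:

```python
def sliding_windows(img_h, img_w, n=8, scale=1.25, step_frac=0.5):
    """Yield (x, y, size) sub-windows over the full image, starting from a
    minimum hand scale of 1/n of the image height and growing geometrically."""
    size = img_h // n                        # minimum hand scale: 1/n of image height
    while size <= min(img_h, img_w):
        step = max(1, int(size * step_frac))
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size             # candidate ROI to score with the classifier
        size = max(size + 1, int(size * scale))  # enlarge the sub-window by a fixed multiple
```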
  • The above-mentioned gesture candidate regions indicate where the hand may appear and make actions to be recognized; factors such as the position of the arm and the position of the container are generally taken into account. When the feature information in the part candidate regions is extracted to obtain multiple part features, the hand may, for example, be in a gesture of picking up an item or a gesture of preparing to put an item back.
  • The above-mentioned classifier may be a pre-trained part classification model; when the part classification model is a gesture classification model, the hand can be recognized, determining the overall size of the hand in the image, the position of the hand, and the outline of the hand.
  • the features of the head, shoulders and other parts can also be identified, so as to more accurately analyze the relative position between the article, the article storage device, and the user.
  • FIG. 4 is a schematic diagram of recognizing a target part in an image according to an embodiment of the present invention.
  • As shown in FIG. 4, a video of the item can be obtained through the image capture device and analyzed to obtain multiple frames of images; the captured images are preprocessed and multiple candidate regions extracted; feature extraction and description are then performed for each candidate region, and gestures are detected and recognized using a classifier; finally, the recognition results can be output and used for decision-making.
  • The above embodiment indicates that after the ROI candidate regions are extracted, all targets should be scaled to a uniform discrimination size, their various features calculated, and a set of features selected for each target as the basis of classification; the features are then input into a trained classifier to identify the target candidate regions, from which the recognition result of the item is determined.
  • Step S206 Acquire auxiliary information of the item through the information capturing device.
  • Optionally, the information capture device includes: a depth camera configured to obtain depth information, a card reader configured to scan the item identification code, a gravity device (such as a gravity plate) configured to obtain gravity information, and an odor sensor configured to obtain odor information.
  • the depth camera includes a TOF depth camera, a binocular camera, a structured light camera, and the like.
  • When the above-mentioned information capture device is a gravity device, whether goods are taken, and roughly which goods, can be determined by comparing the gravity information obtained by the gravity device at different times.
  • the gravity device may be provided in the article storage device.
  • the gravity information detected by the gravity device is combined with the item information analyzed by the image capture device to determine the item recognition result.
  • When the above-mentioned information capture device is an odor sensor, the odor information of the item can be obtained through the odor sensor and combined with the item information analyzed from the images of the image capture device to determine the item recognition result.
  • the odor sensor may be provided in the article storage device.
  • In an embodiment, the information capture device is a depth camera configured to acquire a depth image of the item, and the auxiliary information of the item includes depth information. That is, the depth information of the item can be obtained through the selected depth camera; for example, after the user takes multiple items, the items may overlap or be blocked, and the blocked item cannot be accurately analyzed from the image captured by the image capture device alone. At this time, auxiliary information such as the depth information can be analyzed to obtain the analysis result of the item.
  • Step S208 multi-modal fusion of the location information and the auxiliary information to obtain a fusion result.
  • Optionally, performing multi-modal fusion of the position information and the auxiliary information to obtain the fusion result includes: obtaining the lens parameters and position parameters of the image capture device and the depth camera, where the lens parameters include at least the camera focal length and the camera center point, and the position parameters include at least the installation coordinates of each image capture device or depth camera; obtaining the position of the item in the depth camera coordinate system according to the lens parameters of the depth camera, the depth information, and the position of the item in the depth image; calibrating, using the depth camera coordinate system as a reference, the relative positional relationship of the image capture device with respect to the depth camera according to the position parameters of the image capture device and the depth camera; determining, based on the lens parameters, the position of the item in the depth image, the depth information, and the relative positional relationship, the mapped position information of the item in the image acquired by the image capture device that corresponds to the position of the item in the depth image; and comparing the position information with the mapped position information to obtain the fusion result.
  • Multi-modal fusion here fuses the recognition results based on the depth information.
  • The multi-modal fusion in this embodiment is directed at images taken by two types of cameras: ordinary cameras and a depth camera.
  • First, the lens parameters and position parameters of the three cameras are acquired, where the lens parameters include the camera focal length, camera center point, etc.; the coordinates of the item in the coordinate system of depth camera 2 are obtained according to the lens parameters and depth information of depth camera 2; using the coordinate system of depth camera 2 as a reference, the relative positional relationships of the image capture devices with respect to depth camera 2 are calibrated; based on the lens parameters, the position of the item in the depth image, the depth information, and the relative positional relationships, the mapped position information of the item in the image capture devices (that is, camera 1 and camera 3) is determined from the item's coordinates in depth camera 2; finally, the position information and the mapped position information can be compared to obtain the fusion result.
  • The position of a three-dimensional point of the item in the image and its position in the camera coordinate system satisfy the following relationship:

        s [u v 1]^T = K X,  with  K = [F_x 0 m_x; 0 F_y m_y; 0 0 1]

    where s is a scaling factor; F_x and F_y are the camera focal lengths along the x and y axes; m_x and m_y are the coordinates of the camera center point on the x and y axes; K is the camera internal reference (intrinsic) matrix; X = [X Y Z]^T is the position of the three-dimensional point of the item in the camera coordinate system; and x = [u v]^T is the position of the three-dimensional point of the item in the image.
  • For depth camera 2, d2 [u2 v2 1]^T = K2 [X2 Y2 Z2]^T, where d2 is the depth information of depth camera 2, [u2 v2 1]^T is the position of the item in the depth image, K2 is the internal parameter matrix of depth camera 2, and [X2 Y2 Z2]^T is the position of the item in the depth camera 2 coordinate system.
  • In this formula, the depth d2, the internal reference matrix K2, and the position of the item in the depth image [u2 v2 1]^T are known quantities; therefore, according to the depth camera lens parameters, the depth information, and the position of the item in the depth image, the position of the item in the depth camera 2 coordinate system can be calculated as [X2 Y2 Z2]^T = d2 K2^{-1} [u2 v2 1]^T.
  • Then, using the coordinate system of depth camera 2 as a reference, the relative positional relationships T12 and T32 of cameras 1 and 3 with respect to depth camera 2 can be calibrated, where T12 is the transformation from the depth camera 2 coordinate system to the camera 1 coordinate system, and T32 is the transformation from the depth camera 2 coordinate system to the camera 3 coordinate system.
  • The position of the item in the camera 1 coordinate system is obtained from its position in the depth camera 2 coordinate system and the relative positional relationship T12, i.e. [X1 Y1 Z1]^T = T12([X2 Y2 Z2]^T); likewise, the position of the item in the camera 3 coordinate system is [X3 Y3 Z3]^T = T32([X2 Y2 Z2]^T).
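  • A minimal numpy sketch of the mapping chain above, assuming 3x3 intrinsic matrices and expressing the calibrated relative positional relationship T12 as a rotation R12 and translation t12 (all names are illustrative):

```python
import numpy as np

def depth_pixel_to_rgb_pixel(u2, v2, d2, K2, R12, t12, K1):
    """Map an item's pixel (u2, v2) with depth d2 in depth camera 2
    to the corresponding pixel in RGB camera 1.

    K2, K1: 3x3 intrinsic matrices; (R12, t12): rotation and translation
    from the depth-camera frame to the RGB-camera frame (the calibrated
    relative positional relationship T12)."""
    # Back-project: [X2 Y2 Z2]^T = d2 * K2^{-1} [u2 v2 1]^T
    p2 = d2 * (np.linalg.inv(K2) @ np.array([u2, v2, 1.0]))
    # Transform into camera 1 coordinates: X1 = R12 X2 + t12
    p1 = R12 @ p2 + t12
    # Project with camera 1 intrinsics: s [u1 v1 1]^T = K1 X1
    uvw = K1 @ p1
    return uvw[0] / uvw[2], uvw[1] / uvw[2]  # mapped position (u1, v1)
```

    The fusion result then comes from comparing this mapped position with the position information detected in the RGB image.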
  • The above-mentioned multi-modal fusion realizes accurate recognition of the items in the image and yields the fusion result of the items in the image.
  • Step S210 Determine the item recognition result based on the category information and the fusion result.
  • the item recognition result can be obtained according to the item category obtained in advance and the fusion result of item identification.
  • This application can focus on the item category, the number of items in each item category, and specific items.
  • The first method is to determine the merchandise taken and put back according to the recognition results of the items in the multi-frame images.
  • When analyzing the taking and returning of items, the method further includes: determining the tracking trajectory of the item according to the fusion result; classifying the tracking trajectory to obtain the trajectory classification result, where the trajectory classification result corresponds to the movement result of the item; determining the item-taking result and the item-returning result according to the trajectory classification result; and updating the item management list according to the item-taking result and the item-returning result.
  • Optionally, determining the tracking trajectory of the item according to the fusion result includes: obtaining the position information of the item and the movement trend of the item according to the fusion result; and determining the matching degree between the current detection result and the previous-frame detection result according to the coincidence similarity and feature similarity between the current detection frame of the item and the predicted candidate frame, to obtain the tracking trajectory of the item, where the predicted candidate frame is obtained from the position information of the item in the previous frame according to the movement trend of the item, and the tracking trajectory includes: the position of the item, the item type, and the timestamp of the item movement at each time node.
  • In an embodiment, an RGB camera that captures RGB images is used as the image capture device, and a depth camera is used as the information capture device.
  • The multi-modal fusion of the depth camera and RGB camera information enables the system to obtain the position information of the item and its movement trend. The matching degree between the current detection result and the detection result of the previous frame is determined from the coincidence similarity and the feature similarity between the current detection frame of the item and the predicted candidate frame, as shown in the following formula:

        r = α · IOU(BBox_current, BBox_predict) + β · f

    where r is the matching degree between the detection result of the previous frame and the detection result of the current frame; IOU(BBox_current, BBox_predict) is the coincidence similarity between the current item detection frame and the predicted candidate frame; f is the feature similarity between the current item detection frame and the predicted candidate frame; and α and β are the weight coefficients of the coincidence similarity and the feature similarity, respectively. The predicted candidate frame is obtained from the position information of the item in the previous frame according to the movement trend of the item.
  • The tracking trajectory includes: the position of the item, the type of the item, and the timestamp of the item movement at each time node; that is, each time node can include the position, category, timestamp, etc. of the product.
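  • A minimal sketch of the matching degree r described above; the weight coefficients alpha and beta are illustrative values not specified in the application, and cosine similarity is assumed for the feature similarity f:

```python
import numpy as np

def iou(a, b):
    """Coincidence similarity of two boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def matching_degree(bbox_current, bbox_predict, feat_current, feat_predict,
                    alpha=0.6, beta=0.4):
    """r = alpha * IOU(BBox_current, BBox_predict) + beta * f, where f is the
    cosine similarity of appearance features (alpha, beta are assumed weights)."""
    f = float(np.dot(feat_current, feat_predict) /
              (np.linalg.norm(feat_current) * np.linalg.norm(feat_predict) + 1e-9))
    return alpha * iou(bbox_current, bbox_predict) + beta * f
```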
  • the second step based on machine learning trajectory classification, optionally, the tracking trajectory is classified to obtain a trajectory classification result, including: extracting the moving length of the item from the tracking trajectory; combining pre-trained classification decision The tree model and the moving length of the item classify the tracking trajectory to obtain the trajectory classification result.
  • the trajectory is classified by combining the manually extracted parameters of the tracking trajectory with the decision tree pattern recognition algorithm. Combined with the experience of experts, the length of the trajectory, the starting position, the maximum position in the image, and the position at the end are extracted from the trajectory. Combined with the decision tree model, the trajectory can be classified as “accurately taken”, “accurately put back”, “suspected Six categories: “take”, “suspicious replacement”, “misidentification", "other”.
  • the trajectory classification results are discriminated.
  • the step of determining the item taking result or the item putting back result according to the track classification result includes: acquiring the image capturing device, or combining the image capturing device and the information capturing device at the same time Trajectory classification results; based on the trajectory classification results of the image capture device or the image capture device and the information capture device at the same time, establish a classification discrimination scheme based on a classification rule base; based on the classification discrimination scheme and trajectory classification results To determine the result of taking the item or the result of returning the item.
  • the above classification results may be discriminated, and the trajectory classification results of the image capture device may be discriminated based on the classification rule base.
  • multiple cameras and at least one depth camera are used as examples to explain, the following is a description of a judgment rule, and the following rules are established:
  • the second method is to determine the items to be taken and the commodities to be returned through the sales reference line.
  • the method in the present invention further includes: determining a sales reference line in the image captured by the image acquisition device, wherein the sales reference line is used to determine the operation of taking the item and the operation of placing the item back;
  • the cargo reference line determines the number of items and items that are taken in the item storage device (such as a container), and the number of items and items that are returned to the item storage device after being taken.
  • a sales reference line l can be defined in the camera's field of view, and the item is determined to be taken from the container through the reference line, and vice versa, the item moves from the reference line toward the container, and after the reference line, it is determined to be returned.
  • the items picked up and put back by the user are detected in real time.
  • the method of the present invention further includes: determining the coordinate system of each image acquisition device; dividing an item sensing area in the coordinate system; determining the item to be taken from the item storage device through the item sensing area and video And the number of items, and the number of items and items that are returned to the item storage device after being picked up.
  • an effective area in the camera coordinate system, detect the number of items in this area in real time, and combine the front and back frame information to determine the direction of item movement (you can judge based on the initial point position and end point position), Make a decision to take it back.
  • the items taken by the user and the items put back can be determined, and then automatic settlement can be performed.
  • the above item identification method further includes: obtaining an item price list, where the item price list contains the price of each item; and according to the item retrieval result and the item replacement result, it is determined to be taken The number of items and the number of items; based on the number of items and items taken, and the price of each item, determine the total price of the item settlement.
  • the above item price list may be a price list used by a store (or other shopping mall, etc.) using an item storage device, which records the items placed in each item storage device and the items taken and taken back Articles can be automatically managed through the article price list of the present invention.
  • the accuracy of item identification and counting can be effectively improved, container costs and operating costs can be greatly reduced, and the cargo damage rate can be effectively reduced.
  • an electronic device including: a processor; and a memory configured to store executable instructions of the processor; wherein, the processor is configured to execute the above by executing the executable instructions Any item identification method.
  • the storage medium includes a stored program, wherein, when the program is running, the device where the storage medium is located is controlled to perform any one of the above item identification methods.
  • An embodiment of the present invention provides an apparatus.
  • the apparatus includes a processor, a memory, and a program stored on the memory and executable on the processor.
  • the processor executes the program, the following steps are realized: acquiring multiple frames of an item through an image capturing device Processing of multi-frame images of items to obtain the position information and category information of items in each frame image; obtaining auxiliary information of items through an information capture device; multi-modal fusion of position information and auxiliary information to obtain fusion results; According to the category information and the fusion result, the identification result of the item is determined.
  • the following steps may also be implemented: performing image preprocessing on each frame of the image of the item, where the image preprocessing includes at least one of the following: image enhancement, image scaling, and image subtraction average value; Determine the item detection frame and category information in each frame of image after image preprocessing, where at least one item is included in the item detection frame; the position information of the item is determined according to the item detection frame.
  • non-maximum value suppression is performed on the item detection frame.
  • the processor executes the program, the following steps can also be achieved: acquiring multiple frames of the target part through the image capturing device; processing the multiple frames of the target part to obtain position information of the target part in each frame of image And discriminate results.
  • the recognition result of the item is determined.
  • the following steps may also be implemented: performing image preprocessing on each frame of the image of the target part to enhance the image contour of the user's target part, where the image preprocessing includes the following: image noise reduction , One or more processing methods such as image enhancement, contrast enhancement, image smoothing, image sharpening, etc.; select the part candidate region where the user's target part appears in each image after image preprocessing; extract the features in the part candidate region Information to obtain multiple part features; recognize multiple part features through a pre-trained classifier to obtain the position information and discrimination results of the target part in each frame of image.
  • image preprocessing includes the following: image noise reduction , One or more processing methods such as image enhancement, contrast enhancement, image smoothing, image sharpening, etc.
  • the following steps may also be implemented: scanning each frame of image through the sub-window to determine the part candidate region where the target part of the user may appear in each frame of image.
  • the information capturing device is a depth camera, which is set to acquire a depth image of the item, and the auxiliary information of the item includes depth information.
  • the following steps can also be achieved: acquiring the lens parameters and coordinate parameters of the image capture device and the depth camera; obtaining the items according to the lens parameters, depth information and position of the items in the depth image of the depth camera Position in the depth camera coordinate system; based on the image capture device and depth position parameters, using the depth camera coordinate system as a reference, calibrate the relative positional relationship of the image capture device relative to the depth camera; based on the lens parameters, items The position, depth information and relative position relationship in the depth image to determine the position of the item in the depth image corresponds to the mapped position information of the item in the image acquired by the image capture device; compare the position information and the mapped position information to obtain fusion result.
  • the processor executes the program, the following steps may also be implemented: turning on the image capturing device to obtain a video of the item; and intercepting multiple frames of the item from the video.
  • the following steps may also be implemented: determine the tracking trajectory of the item according to the fusion result; classify the tracking trajectory to obtain the trajectory classification result, where the trajectory classification result corresponds to the movement result of the item; According to the trajectory classification results, determine the item retrieval results and item return results; update the item management list based on the item retrieval results and item return results.
  • the following steps may also be achieved: according to the fusion result, the position information of the item and the movement trend of the item are obtained; according to the coincidence similarity between the current detection frame of the item and the predicted candidate frame and Feature similarity, determine the matching degree between the current detection result and the previous frame detection result, and obtain the tracking trajectory of the item.
  • the predicted candidate frame is obtained according to the movement trend of the item based on the position information of the previous frame.
  • the tracking trajectory includes : The position of the item, the type of item, and the timestamp of the item movement at each time node.
  • the following steps may also be implemented: extracting the movement length of the item from the tracking trajectory; combining the pre-trained classification decision tree model and the movement length of the item, classifying the tracking trajectory to obtain a trajectory classification result.
  • the following steps may also be achieved: obtaining the trajectory classification result of the image capturing device, or the image capturing device and the information capturing device at the same time; according to the image capturing device, or the image capturing device and all The information capture device combines the trajectory classification results at the same time to establish a classification discrimination scheme based on the classification rule base; according to the classification discrimination scheme and the trajectory classification results, it determines the item taking result or the item returning result.
  • the following steps can also be achieved: obtaining the item price list, where the item price list contains the price of each item; according to the item retrieval result and the item replacement result, it is determined to be taken The number of items and the number of items; based on the number of items and items taken, and the price of each item, determine the total price of the item settlement.
  • the present application also provides a computer program product which, when executed on a data processing device, is suitable for executing a program initialized with the following method steps: acquiring a multi-frame image of an item through an image capture device; processing the multi-frame image of the item To obtain the position information and category information of the items in each frame of the image; obtain the auxiliary information of the item through the information capture device; multi-modal fusion of the position information and the auxiliary information to obtain the fusion result; determine the item based on the category information and the fusion result Recognition results.
  • the disclosed technical content may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit may be a logical function division.
  • there may be another division manner for example, multiple units or components may be combined or may Integration into another system, or some features can be ignored, or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present invention essentially or part of the contribution to the existing technology or all or part of the technical solution can be embodied in the form of a software product, the computer software product is stored in a storage medium , Including several instructions to enable a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the aforementioned storage media include: U disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), mobile hard disk, magnetic disk or optical disk and other media that can store program code .
  • the solution provided in this embodiment of the present application can realize item identification.
  • it can be applied to new retail smart containers and other equipment for selling goods.
  • Multiple cameras are installed on the smart container and multiple The camera shoots the video after opening the door, and then analyzes the multi-frame images in the video, by identifying the position and category of the items in the image, and multi-modal fusion with the auxiliary information obtained by the information capture device, so as to accurately obtain the identification result of the item
  • It can also accurately identify the type and quantity of items taken by users in the container, improve the item recognition rate, reduce the cargo damage rate, and then solve the technical problem of low recognition accuracy when recognizing items in related technologies.
  • the embodiment of the present application can automatically analyze the images taken by each device in the new retail scenario, analyze the types of items and item data taken by the user, and realize accurate identification of the items, improve the intelligent identification of the goods, and thus improve the new retail Intelligent commodity sales capability.

Abstract

The present invention discloses an item identification method and system, and an electronic device. The method includes: acquiring multi-frame images of an item through an image capture device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame image; acquiring auxiliary information of the item through an information capture device; performing multi-modal fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result of the item according to the category information and the fusion result.

Description

Item identification method and system, and electronic device
This application claims priority to Chinese Patent Application No. 201910016934.7, entitled "Item Identification Method and System, Electronic Device" and filed with the China Patent Office on January 8, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of information processing technology, and in particular to an item identification method and system, and an electronic device.
Background
In the related art, smart containers are an important direction in the development of the new retail industry. For item identification there are currently two main solutions: the traditional RFID (Radio Frequency Identification) approach, and static recognition based on visual recognition. The first, the RFID tag solution, requires preparing a different RFID tag for each category of item; the data in the tag is read via radio signals for identification and counting. Its cost is high: on the one hand the RFID tags themselves are expensive, and on the other hand, once on the market, the labor cost of attaching tags to thousands of items is prohibitive. Moreover, recognition accuracy for metal and liquid items is insufficient, and tags can easily be torn off, leading to a higher cargo-loss rate. The second, static recognition based on visual recognition, requires a camera on top of each container layer, taking one image before the door is opened and one after it is closed; the item types and quantities are then recognized automatically by visual recognition technology, and the final result is obtained by comparison. Space utilization is low, because the camera must be mounted well above the shelf below it, otherwise the full view cannot be captured; recognition accuracy is easily affected by item occlusion, and items cannot be stacked.
No effective solution to the above problems has yet been proposed.
Summary
Embodiments of the present disclosure provide an item identification method and system, and an electronic device, so as to solve at least the technical problem in the related art of low recognition accuracy when identifying items.
According to one aspect of the embodiments of the present invention, an item identification method is provided, including: acquiring multi-frame images of an item through an image capture device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame image; acquiring auxiliary information of the item through an information capture device; performing multi-modal fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result of the item according to the category information and the fusion result.
Optionally, processing the multi-frame images of the item to obtain the position information and category information of the item in each frame image includes: performing image preprocessing on each frame image of the item; determining an item detection frame and the category information in each preprocessed frame image, where the item detection frame contains at least one kind of item; and determining the position information of the item according to the item detection frame.
Optionally, the method further includes: performing non-maximum suppression on the item detection frame.
Optionally, the method further includes: acquiring multi-frame images of a target part through the image capture device; and processing the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image.
Optionally, the identification result of the item is determined according to the position information and discrimination result of the target part in each frame image, the category information of the item, and the fusion result.
Optionally, processing the multi-frame images of the target part to obtain the position information and discrimination result of the target part in each frame image includes: performing image preprocessing on each frame image of the target part to enhance the image contour of the user's target part; selecting, in each preprocessed frame image, part candidate regions in which the user's target part appears; extracting feature information from the part candidate regions to obtain multiple part features; and recognizing the multiple part features with a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame image.
Optionally, selecting the part candidate regions in which the user's target part appears in each preprocessed frame image includes: scanning each frame image with a sub-window to determine part candidate regions in which the user's target part may appear.
Optionally, the method further includes: performing fine-grained classification on the item.
Optionally, the information capture device includes at least one of the following: a depth camera, a card reader, a gravity device, and an odor sensor.
Optionally, when the information capture device is the depth camera, a depth image of the item is acquired through the depth camera, and the auxiliary information of the item includes depth information.
Optionally, performing multi-modal fusion on the position information and the auxiliary information to obtain the fusion result includes: acquiring lens parameters and position parameters of the image capture device and the depth camera; obtaining the position of the item in the depth camera coordinate system according to the lens parameters of the depth camera, the depth information and the position of the item in the depth image; calibrating, with the coordinate system of the depth camera as reference, the relative positional relationship of the image capture device with respect to the depth camera according to the position parameters of the image capture device and the depth camera; determining, based on the lens parameters, the position of the item in the depth image, the depth information and the relative positional relationship, the mapped position information in the image acquired by the image capture device that corresponds to the position of the item in the depth image; and comparing the position information with the mapped position information to obtain the fusion result.
Optionally, acquiring multi-frame images of the item through the image capture device includes: turning on the image capture device to acquire a video of the item; and extracting multi-frame images of the item from the video.
Optionally, the method further includes: determining a tracking trajectory of the item according to the fusion result; classifying the tracking trajectory to obtain a trajectory classification result, where the trajectory classification result corresponds to a movement result of the item; determining an item-taking result and an item-returning result according to the trajectory classification result; and updating an item management list according to the item-taking result and the item-returning result.
Optionally, determining the tracking trajectory of the item according to the fusion result includes: obtaining the position information of the item and the movement trend of the item according to the fusion result; and judging the matching degree between the current detection result and the previous frame's detection result according to the coincidence similarity and feature similarity between the item's current detection frame and a predicted candidate frame, to obtain the tracking trajectory of the item, where the predicted candidate frame is obtained from the movement trend of the item on the basis of the position information of the item in the previous frame, and the tracking trajectory includes: the position of the item, the item type, and the timestamp of the item's movement at each time node.
Optionally, the step of classifying the tracking trajectory to obtain the trajectory classification result includes: extracting an item movement length from the tracking trajectory; and classifying the tracking trajectory by combining a pre-trained classification decision tree model with the item movement length to obtain the trajectory classification result.
Optionally, the step of determining the item-taking result or the item-returning result according to the trajectory classification result includes: acquiring the trajectory classification results of the image capture device, or of the image capture device combined with the information capture device, at the same moment; establishing a classification discrimination scheme based on a classification rule base according to those trajectory classification results; and determining the item-taking result or the item-returning result according to the classification discrimination scheme and the trajectory classification results.
Optionally, the method further includes: obtaining an item price list, where the item price list contains the price of each kind of item; determining the taken items and item quantities according to the item-taking result and the item-returning result; and determining the total settlement price of the items according to the taken items and item quantities and the price of each kind of item.
Optionally, the method is applied to new retail scenarios, which at least include: unmanned stores and smart containers.
According to another aspect of the embodiments of the present invention, an item identification system is further provided, including: an image capture device configured to acquire multi-frame images of an item; an information capture device configured to acquire auxiliary information of the item; and a server configured to process the multi-frame images of the item to obtain position information and category information of the item in each frame image, perform multi-modal fusion on the position information and the auxiliary information to obtain a fusion result, and then determine an identification result of the item according to the category information and the fusion result.
Optionally, the image capture device is further configured to acquire multi-frame images of a target part.
Optionally, the server is further configured to process the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image, and to determine the identification result of the item according to the position information and discrimination result of the target part in each frame image, the category information and the fusion result.
Optionally, the system further includes: an item storage device, where the image capture device and the information capture device are turned on when the item storage device is opened.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including: a processor; and a memory configured to store executable instructions of the processor; where the processor is configured to perform the item identification method of any one of the above by executing the executable instructions.
According to another aspect of the embodiments of the present invention, a storage medium is further provided; the storage medium includes a stored program, and when the program runs, a device where the storage medium is located is controlled to perform the item identification method of any one of the above.
In the embodiments of the present invention, multi-frame images of an item are acquired through an image capture device and processed to obtain the position information and category information of the item in each frame image; auxiliary information of the item is acquired through an information capture device; multi-modal fusion is performed on the position information and the auxiliary information to obtain a fusion result; and the identification result of the item is determined according to the category information and the fusion result. In this embodiment, multi-frame images can be acquired and analyzed to obtain the position and category information of the item; combined with the auxiliary information of the item, the item is identified accurately, and the type and quantity of items taken by a user can likewise be identified accurately, thereby solving the technical problem in the related art of low recognition accuracy when identifying items.
Brief Description of the Drawings
The drawings described here are provided for a further understanding of the present invention and form a part of this application; the exemplary embodiments of the present invention and their description serve to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of an optional item identification system according to an embodiment of the present invention;
Fig. 2 is a flowchart of an optional item identification method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of implementing item identification according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of recognizing a target part in an image according to an embodiment of the present invention.
Detailed Description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", etc. in the description, claims and drawings of the present invention are used to distinguish similar objects, and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such process, method, product or device.
To help readers understand the present invention, some of the terms involved in the embodiments are explained below:
New retail: the upgrading of the production, circulation and sale of goods by relying on the Internet and applying technical means such as big data and artificial intelligence, deeply integrating online services, offline experience and modern logistics.
RFID: Radio Frequency Identification, also known as an RFID tag; it can identify a specific target and read and write related data via radio signals, without mechanical or optical contact between the identification system and the target.
Smart container: a container equipped with visual recognition technology.
Cargo-loss rate: the proportion of items lost during container operation to the total number of items.
TOF depth camera: a Time-of-Flight depth camera, also called a 3D camera; unlike a conventional camera, it can simultaneously capture gray-scale information of a scene and 3D information including depth.
NMS: Non-Maximum Suppression.
Camera: in this document, refers specifically to a customized camera module.
Multi-frame images: images containing at least one frame, obtained from an image or a video.
The embodiments of the present invention can be applied in various new retail scenarios, for example the use of smart containers in new retail. In the related art, the item identification process cannot accurately identify the type and quantity of items taken by a user from the images captured by the image capture device: if only one image is taken before the door opens and one after it closes, with item types and quantities then recognized automatically by visual recognition and the final result obtained by comparison, some taken items cannot be recognized from a single picture. In the embodiments of the present invention, multiple cameras can be installed on a smart container, video is recorded after the door opens, the multi-frame images in the video are analyzed, and multi-modal fusion is performed on the images, so as to accurately identify the types and data of items taken by the user, improve the intelligence of item identification of the smart container, and reduce the cargo-loss rate.
The present invention is described below through detailed embodiments.
The embodiments of the present invention can be applied in new retail and other fields; the specific scope of use may be smart containers, smart cabinets, shopping malls, supermarkets and other areas. The present invention is schematically described with a smart container, but is not limited thereto.
Fig. 1 is a schematic diagram of an optional item identification system according to an embodiment of the present invention. As shown in Fig. 1, the system may include an image capture device 11, an information capture device 12 and a server 13, where:
The image capture device 11 is configured to acquire multi-frame images of an item. Optionally, image capture devices may be installed in areas such as containers or shopping malls, and at least one image capture device is arranged. Optionally, in the embodiment of the present invention, the image capture device may be an ordinary camera, for example an RGB camera or an infrared camera. Of course, those skilled in the art can adjust the type and number of image capture devices according to actual needs without being limited to the examples given here; and when there are two or more image capture devices, they may all be of the same type, or a combination of different types may be used.
The information capture device 12 is configured to acquire auxiliary information of the item. The information capture device may be arranged around the image capture device and used in cooperation with it; at least one information capture device is provided. Optionally, in the embodiment of the present invention, the information capture device may include: a depth camera configured to acquire depth information, a card reader configured to scan an item identification code, a gravity device (such as a gravity plate) configured to acquire gravity information, an odor sensor configured to acquire odor information, and so on. Specifically, the depth camera includes a TOF depth camera, a binocular camera, a structured-light camera, etc. Of course, those skilled in the art can adjust the type and number of information capture devices according to actual needs without being limited to the examples given here; and when there are two or more information capture devices, they may all be of the same type, or a combination of different types may be used.
For example, when the above information device is a gravity device, it can judge whether goods have been taken, and roughly which goods, by comparing the gravity information acquired by the gravity device at different moments. The gravity device may be arranged in the item storage device. The item identification result is determined from the gravity information detected by the gravity device combined with the item information analyzed from the image capture device.
For example, when the above information device is an odor sensor, the odor information of the item can be acquired through the odor sensor and combined with the item information analyzed from the image capture device to determine the item identification result. The odor sensor may be arranged in the item storage device.
The server 13 is configured to process the multi-frame images of the item to obtain the position information and category information of the item in each frame image, perform multi-modal fusion on the position information and the auxiliary information to obtain a fusion result, and then determine the identification result of the item according to the category information of the item and the fusion result.
The above item identification system uses the image capture device 11 to acquire multi-frame images of the item, acquires auxiliary information of the item through the information capture device 12, and finally processes the multi-frame images of the item through the server 13 to obtain the position information and category information of the item in each frame image, performs multi-modal fusion of the position information with the auxiliary information to obtain a fusion result, and then determines the identification result of the item according to the category information and the fusion result. By recognizing the position and category of items in the image and performing multi-modal fusion with the auxiliary information obtained by the information capture device, the identification result of the item is obtained accurately; the type and quantity of items taken from the container by the user can likewise be identified accurately, improving the item recognition rate and reducing the cargo-loss rate, thereby solving the technical problem in the related art of low recognition accuracy when identifying items.
The numbers of image capture devices and information capture devices can be arranged reasonably for each usage area and piece of equipment; for example, for one smart container, two image capture devices and one information capture device can be deployed.
Preferably, the information capture device is a TOF depth camera configured to acquire a depth image of the item, and the auxiliary information of the item includes depth information. That is, a depth image of the item can be collected by the depth camera to obtain depth information of the item placement, so that overlapping or covered items can be identified effectively.
As an optional embodiment of this application, the above item identification system further includes: using the image capture device to acquire multi-frame images of a target part. In this application, the target part may be a hand, a mechanical hand, a prosthesis, or another human body part or mechanical device that can take items; that is, this application can capture images of the user taking items by hand and, through detection on the images of the user's target part, analyze the position of the target part.
As another option, the above server is further configured to process the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image, and to determine the identification result of the item according to the position information and discrimination result of the target part in each frame image, the category information of the item and the fusion result. That is, the position information and discrimination result of the target part can be combined with the category information and fusion result of the item analyzed from the images obtained by the image capture device and the information capture device, thereby improving item identification accuracy. Through detection of the target part, the type and quantity of items taken by the user can also be obtained.
Optionally, the above discrimination result indicates whether the region is the target part.
Preferably, the detection of the target part may be hand detection. The following embodiments of the present invention are described with the user's hand as the target part, detecting the position of the hand in each frame image.
As an optional embodiment of this application, the above item identification system further includes: an item storage device, where the image capture device and the information capture device are turned on when the item storage device is opened.
Optionally, the item storage device refers to equipment or apparatus for storing items; in this application, the item storage device may include, but is not limited to, the above smart container.
Through the item identification system of the embodiments of the present invention, the opening of the item storage device can serve as trigger information to simultaneously turn on the image capture device and the information capture device, so as to respectively collect multi-frame images of the item and auxiliary information of the item; the multi-frame images and the auxiliary information are then analyzed to obtain information such as the position and category of the item, which is fused multi-modally with the auxiliary information to obtain the identification result of the item. Multi-frame images of the target part can also be captured by the image capture device, and the target part detected; the position information and discrimination result of the target part in each frame image can then be combined with the category information and fusion result of the item analyzed from the images obtained by the image capture device and the information capture device, so that the identification result of the item is obtained more accurately and the identification precision of the item is improved.
The following describes an embodiment of an item identification method applied to the above item identification system.
According to an embodiment of the present invention, an embodiment of an item identification method is provided. It should be noted that the steps shown in the flowchart of the drawings can be executed in a computer system such as a set of computer-executable instructions; and although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one here.
Fig. 2 is a flowchart of an optional item identification method according to an embodiment of the present invention. As shown in Fig. 2, the method includes the following steps:
Step S202: acquire multi-frame images of an item through an image capture device;
Step S204: process the multi-frame images of the item to obtain position information and category information of the item in each frame image;
Step S206: acquire auxiliary information of the item through an information capture device;
Step S208: perform multi-modal fusion on the position information and the auxiliary information to obtain a fusion result;
Step S210: determine an identification result of the item according to the category information and the fusion result.
Through the above steps, multi-frame images of the item can be acquired through the image capture device and processed to obtain the position information and category information of the item in each frame image; auxiliary information of the item is acquired through the information capture device; multi-modal fusion is performed on the position information and the auxiliary information to obtain a fusion result; and the identification result of the item is determined according to the category information and the fusion result. In this embodiment, multi-frame images can be acquired and analyzed to obtain the position and category information of the item; combined with the auxiliary information of the item, the item is identified accurately, and the type and quantity of items taken by a user can likewise be identified accurately, thereby solving the technical problem in the related art of low recognition accuracy when identifying items.
In the embodiment of the present invention, the item identification method can be applied to new retail scenarios, which at least include: smart-container selling in unmanned stores and smart-container selling in supermarket shopping.
Each of the above steps is described in detail below.
Step S202: acquire multi-frame images of the item through an image capture device.
In this application, optionally, in the embodiment of the present invention, the image capture device may be an ordinary camera, for example an RGB camera, an infrared camera or a webcam. Of course, those skilled in the art can adjust the type and number of image capture devices according to actual needs without being limited to the examples given here; there is at least one image capture device, and when there are two or more, they may all be of the same type or a combination of different types may be used. Each image capture device can capture at least two images; during identification, the image capture time points of the devices need to be unified, that is, the images at the same time point are analyzed separately so that the item is recognized from multiple angles.
Optionally, there is at least one item, and items can be placed in an item storage device, for example stored in a smart container. The item storage device includes but is not limited to: a smart container.
As another option, the image capture device and the information capture device can be turned on after it is detected that the item storage device has been opened.
In an optional embodiment, acquiring multi-frame images of the item through the image capture device includes: turning on the image capture device to acquire a video of the item; and extracting multi-frame images of the item from the video. That is, after the item storage device is opened, the video inside the item storage device can be acquired in real time by the image capture device, and after the item storage device is closed or the user's taking action is detected to have stopped, multi-frame images can be obtained from the video.
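To make this frame-extraction step concrete, the following is a minimal sketch in Python using OpenCV; the video source, sampling stride and frame cap are assumptions chosen for illustration, not values specified by the embodiment.

```python
import cv2

def capture_frames(source=0, max_frames=30, stride=5):
    """Sample every `stride`-th frame from a video source (device index or file path)."""
    cap = cv2.VideoCapture(source)
    frames, i = [], 0
    while cap.isOpened() and len(frames) < max_frames:
        ok, frame = cap.read()
        if not ok:  # stream ended, e.g. the container door was closed
            break
        if i % stride == 0:
            frames.append(frame)  # BGR image as a numpy array
        i += 1
    cap.release()
    return frames
```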
Step S204: process the multi-frame images of the item to obtain the position information and category information of the item in each frame image.
In the embodiment of the present invention, when processing the images, the focus is on recognizing the position and category of the item in the image. When analyzing position information, the focus can be on the current position of the item in the image, or on the relation between the item's current position and its position in the previous several frames.
In the embodiment of the present invention, image processing includes two cases: the first is recognizing the position and category of items in the image; the second is recognizing the position of the target part in the image.
First case: recognizing the position and category of items in the image.
Optionally, processing the multi-frame images of the item to obtain the position information and category information of the item in each frame image includes: performing image preprocessing on each frame image of the item, where the image preprocessing here includes at least one of the following: image enhancement, image scaling and image mean-subtraction; determining an item detection frame and the category information in each preprocessed frame image, where the item detection frame contains at least one kind of item; and determining the position information of the item according to the item detection frame.
Optionally, before determining the item detection frame in each preprocessed frame image, multiple item candidate frames (prior boxes) can first be extracted, and deep learning and analysis are then performed on the candidate frames to determine the item detection frame and the category information of the item.
When analyzing the item detection frame, the candidate frames can be combined with the position of the target part for high-precision recognition of the item detection frame.
As another option, the above item identification method further includes: performing non-maximum suppression on the item detection frame to prevent false detection and improve the identification precision of the item.
That is, when recognizing items in an image, the image is first preprocessed, including operations such as image enhancement, scaling and mean subtraction; next the item detection frames are extracted, and non-maximum suppression (NMS) is applied to the extracted detection frames to prevent false detection and improve item identification precision.
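As an illustration of the suppression step, here is a minimal sketch of IoU-based non-maximum suppression; the boxes are assumed to be (x1, y1, x2, y2, score) tuples and the 0.5 overlap threshold is an assumed value.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2, ...)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def nms(boxes, iou_thresh=0.5):
    """Keep the highest-scoring frame, drop frames that overlap it, repeat."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < iou_thresh for k in kept):
            kept.append(box)
    return kept
```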
In another optional embodiment, the above item identification method further includes: performing fine-grained classification on the item to improve identification precision. That is, item identification information can be obtained through fine-grained analysis of the item. Optionally, fine-grained classification is performed on similar items, improving identification precision by analyzing the subtle differences between similar items. Optionally, the item types in the embodiment of the present invention include but are not limited to: vegetables, fruits, snacks, fresh meat, seafood, etc.
Fig. 3 is a schematic diagram of implementing item identification according to an embodiment of the present invention. As shown in Fig. 3, when performing item identification, the video shot by the image capture device can first be input; after the video is clipped, the images can be preprocessed and item candidate frames extracted; the extracted candidate frames are analyzed in combination with the detection of the target part to obtain the item detection frames, to which non-maximum suppression is then applied; finally, fine-grained classification and multi-modal fusion techniques can be used to determine the item identification result.
Second case: recognizing the position of the target part in the image.
In the embodiment of the present invention, the hand can be taken as the target part for description.
As an optional embodiment of the present invention, the above item identification method further includes: acquiring multi-frame images of the target part through the image capture device; and processing the multi-frame images of the target part to obtain the position information and discrimination result of the target part in each frame image.
In another optional embodiment of the present invention, processing the multi-frame images of the target part to obtain the position information and discrimination result of the target part in each frame image includes: performing image preprocessing on each frame image of the target part to enhance the image contour of the user's target part, where the image preprocessing here may include one or more of the following: image noise reduction, image enhancement, contrast enhancement, image smoothing, image sharpening, etc.; selecting, in each preprocessed frame image, part candidate regions in which the user's target part appears; extracting feature information from the part candidate regions to obtain multiple part features; and recognizing the multiple part features with a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame image.
The image preprocessing here is mainly performed on each frame image of the target part; through operations such as image preprocessing (which may include image noise reduction and image enhancement), the hand region is enhanced, including contrast enhancement, image smoothing, noise filtering and image sharpening to strengthen the target contour.
After the above image preprocessing is completed, multiple part candidate regions can be determined; for example, multiple gesture candidate regions (Regions of Interest, ROI) are determined, selecting some possible gesture candidate regions within the camera's global perception range.
Optionally, selecting the part candidate regions in which the user's target part appears in each preprocessed frame image includes: scanning each frame image with a sub-window to determine part candidate regions in which the user's target part may appear. That is, a sub-window can be used to scan the whole image, taking 1/n of the image height as the minimum hand scale, with the sub-window size gradually increasing from this basis by a certain factor, as sketched after the next paragraph.
The above gesture candidate regions indicate regions where the hand may be making an action to be recognized; when determining these candidate regions, factors such as the position of the arm and the position of the container are generally taken into account.
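One possible reading of the sub-window scan is sketched below: the smallest window is 1/n of the image height and each scale grows by a fixed factor; the values of n, the growth factor and the scan stride are assumptions for the example.

```python
def candidate_windows(img_h, img_w, n=8, scale_step=1.25, stride_frac=0.5):
    """Yield square sub-windows (x, y, w, h), from h/n upward in size."""
    size = img_h // n                          # minimum hand scale: 1/n of height
    while size <= min(img_h, img_w):
        stride = max(1, int(size * stride_frac))
        for y in range(0, img_h - size + 1, stride):
            for x in range(0, img_w - size + 1, stride):
                yield (x, y, size, size)
        size = max(size + 1, int(size * scale_step))  # grow by a fixed factor
```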
As an optional example of the present invention, when extracting feature information from the part candidate regions to obtain multiple part features, for example, a gesture in which the hand may be taking an item or preparing to put an item back can be recognized.
Optionally, the above classifier may be a pre-trained part classification model; for example, the part classification model is a gesture classification model. After the extracted hand features are fed into the trained classifier model, the hand can be recognized, determining the complete size of the hand, the position of the hand and the contour of the hand in the image. Of course, in the embodiment of the present invention, features of parts such as the head and shoulders can also be recognized, so as to analyze more precisely the relative positions of the item, the item storage device and the user.
Fig. 4 is a schematic diagram of recognizing a target part in an image according to an embodiment of the present invention. As shown in Fig. 4, during image recognition, a video of the item can be obtained through the image capture device and analyzed into multi-frame images; the captured images are preprocessed and multiple part candidate regions extracted; feature extraction and description are then performed for each part candidate region, the classifier is used to detect and recognize the gesture, and finally the recognition result can be output and a decision made.
The above implementation indicates that after the ROI candidate regions are extracted, all targets are scaled to a uniform discrimination size and their various features computed; a set of features is selected for each target as the basis for classification, and the features are then fed into the trained classifier to recognize the target candidate regions.
Optionally, the identification result of the item is determined according to the position information and discrimination result of the target part in each frame image, combined with the category information and fusion result of the item analyzed from the images obtained by the image capture device and the information capture device.
Step S206: acquire auxiliary information of the item through an information capture device.
Optionally, in the embodiment of the present invention, the information capture device includes: a depth camera configured to acquire depth information, a card reader configured to scan an item identification code, a gravity device (such as a gravity plate) configured to acquire gravity information, an odor sensor configured to acquire odor information, and so on. Specifically, the depth camera includes a TOF depth camera, a binocular camera, a structured-light camera, etc. Of course, those skilled in the art can adjust the type and number of information capture devices according to actual needs without being limited to the examples given here; and when there are two or more information capture devices, they may all be of the same type, or a combination of different types may be used.
For example, when the above information device is a gravity device, it can judge whether goods have been taken, and roughly which goods, by comparing the gravity information acquired by the gravity device at different moments. The gravity device may be arranged in the item storage device. The item identification result is determined from the gravity information detected by the gravity device combined with the item information analyzed from the image capture device.
For example, when the above information device is an odor sensor, the odor information of the item can be acquired through the odor sensor and combined with the item information analyzed from the image capture device to determine the item identification result. The odor sensor may be arranged in the item storage device.
Optionally, the information capture device is a depth camera configured to acquire a depth image of the item, and the auxiliary information of the item includes depth information. That is, the depth information of the item can be obtained through the selected depth camera. For example, after a user takes several items, the items may overlap or be occluded; at this time the images captured by the image capture device cannot accurately analyze the occluded item, while the auxiliary information of the item (such as depth information) can be obtained through the information capture device, and the analysis result of the item can be obtained by analyzing the auxiliary information.
Step S208: perform multi-modal fusion on the position information and the auxiliary information to obtain a fusion result.
As another optional embodiment of the present invention, performing multi-modal fusion on the position information and the auxiliary information to obtain the fusion result includes: acquiring lens parameters and position parameters of the image capture device and the depth camera, where the lens parameters include at least the camera focal length and the camera principal point, and the position parameters include at least the installation coordinates of each image capture device or depth camera; obtaining the position of the item in the depth camera coordinate system according to the lens parameters of the depth camera, the depth information and the position of the item in the depth image; calibrating, with the coordinate system of the depth camera as reference, the relative positional relationship of the image capture device with respect to the depth camera according to the position parameters of the image capture device and the depth camera; determining, based on the lens parameters, the position of the item in the depth image, the depth information and the relative positional relationship, the mapped position information in the image acquired by the image capture device that corresponds to the position of the item in the depth image; and comparing the position information with the mapped position information to obtain the fusion result.
The multi-modal fusion is explained below. Multi-modal fusion fuses the recognition results based on depth information; the multi-modal fusion in the embodiment of the present invention is directed at images captured by two kinds of cameras: an ordinary camera and a depth camera.
Two image capture devices (defined as ordinary cameras, namely camera 1 and camera 3) and one depth camera (depth camera 2) are taken as an example. Before the camera equipment leaves the factory, the lens parameters and position parameters of the three cameras are acquired, where the lens parameters include the camera focal length, the camera principal point, etc. The coordinates of the item in depth camera 2 are obtained according to the lens parameters and position parameters of depth camera 2; with the coordinate system of depth camera 2 as reference, the relative positional relationship of the image capture devices with respect to depth camera 2 is calibrated; based on the lens parameters, the position of the item in the depth image, the depth information and the relative positional relationship, the mapped position information of the item in the image capture devices (namely camera 1 and camera 3) is determined from the item's coordinates in depth camera 2; finally the position information and the mapped position information can be compared to obtain the fusion result.
In the camera model, according to the pinhole imaging principle, the position of a 3D point in the image and its position in the camera coordinate system satisfy the following relation:

s·[u v 1]^T = K·[X Y Z]^T

where s is the scale factor, f_x and f_y are the camera focal lengths along the x and y axes, m_x and m_y are the coordinates of the camera principal point along the x and y axes, and K is the camera intrinsic matrix:

K = | f_x  0    m_x |
    | 0    f_y  m_y |
    | 0    0    1   |

X = [X Y Z]^T is the position of the item's 3D point in the camera coordinate system, and x = [u v]^T is the position of the item's 3D point in the image.

Based on this relation, the following formula holds for the depth camera:

d_2·[u_2 v_2 1]^T = K_2·[X_2 Y_2 Z_2]^T    (1)

where d_2 is the depth information of depth camera 2, [u_2 v_2 1]^T is the position of the item in the depth image, K_2 is the intrinsic matrix of depth camera 2, and [X_2 Y_2 Z_2]^T is the position of the item in the depth camera 2 coordinate system.

In formula (1), the depth d_2, the intrinsic matrix K_2 and the position [u_2 v_2 1]^T of the item in the depth image are known quantities; therefore the position [X_2 Y_2 Z_2]^T of the item in the depth camera 2 coordinate system can be computed from the lens parameters of the depth camera, the depth information and the position of the item in the depth image.

Likewise, the following formulas hold for camera 1 and camera 3, respectively:

s_1·[u_1 v_1 1]^T = K_1·[X_1 Y_1 Z_1]^T    (2)

s_3·[u_3 v_3 1]^T = K_3·[X_3 Y_3 Z_3]^T    (3)

In the embodiment of the present invention, the coordinate system of depth camera 2 can be taken as the reference to calibrate the relative positional relationships T_12 and T_32 of cameras 1 and 3 with respect to depth camera 2, where T_12 denotes the relative positional relationship from the depth camera 2 coordinate system to the camera 1 coordinate system, and T_32 denotes the relative positional relationship from the depth camera 2 coordinate system to the camera 3 coordinate system.

Therefore the position [X_1 Y_1 Z_1]^T of the item in the camera 1 coordinate system can be obtained from its position [X_2 Y_2 Z_2]^T in the depth camera 2 coordinate system and the relative positional relationship T_12 = (R_12, t_12), where R_12 and t_12 are the rotation and translation of the rigid transformation:

[X_1 Y_1 Z_1]^T = R_12·[X_2 Y_2 Z_2]^T + t_12    (4)

Likewise, the position [X_3 Y_3 Z_3]^T of the item in the camera 3 coordinate system can be obtained from [X_2 Y_2 Z_2]^T and the relative positional relationship T_32 = (R_32, t_32):

[X_3 Y_3 Z_3]^T = R_32·[X_2 Y_2 Z_2]^T + t_32    (5)

Substituting formulas (1), (4) and (5) into formulas (2) and (3) and transforming gives:

s_1·[u_1 v_1 1]^T = K_1·(R_12·d_2·K_2^(-1)·[u_2 v_2 1]^T + t_12)    (6)

s_3·[u_3 v_3 1]^T = K_3·(R_32·d_2·K_2^(-1)·[u_2 v_2 1]^T + t_32)    (7)

Then the position [u_2 v_2] of the item in the depth image corresponds to the positions [u_1 v_1] and [u_3 v_3] in the images captured by camera 1 and camera 3, respectively.

Through the above formulas, the imaging point, in the other cameras, of the item's imaging point in the depth camera can be determined; that is, the item photographed by the depth camera is mapped into the other ordinary cameras, and the item types and quantities photographed by the different cameras are compared for discrepancies. If there is a discrepancy, the server needs to recompute and compare to determine the recognized item result.
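The following numerical sketch walks formulas (1), (4) and (6) for one pixel; the intrinsic matrices and the rigid transform T_12 = (R_12, t_12) are made-up example values, not parameters from the embodiment.

```python
import numpy as np

def map_depth_pixel(u2, v2, d2, K2, K1, R12, t12):
    """Back-project (u2, v2) with depth d2 into depth-camera coordinates
    (formula (1)), move into camera-1 coordinates (formula (4)),
    and reproject into camera 1's image (formula (6))."""
    X2 = d2 * np.linalg.inv(K2) @ np.array([u2, v2, 1.0])  # [X2 Y2 Z2]^T
    X1 = R12 @ X2 + t12                                    # apply T12
    x1 = K1 @ X1                                           # s1 * [u1 v1 1]^T
    return x1[:2] / x1[2]                                  # [u1 v1]

# Example with assumed parameters: two cameras 5 cm apart, same orientation.
K2 = np.array([[580.0, 0.0, 320.0], [0.0, 580.0, 240.0], [0.0, 0.0, 1.0]])
K1 = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
R12, t12 = np.eye(3), np.array([0.05, 0.0, 0.0])
print(map_depth_pixel(350, 260, 0.8, K2, K1, R12, t12))  # mapped [u1 v1]
```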
The above multi-modal fusion can realize accurate recognition of the items in the image and obtain the item fusion result in the image.
Step S210: determine the identification result of the item according to the category information and the fusion result.
That is, the identification result of the item can be obtained according to the item category obtained by prior analysis and the fusion result of item identification. This application can focus on the item category, the number of items in each item category, and the specific items.
After the whole video has been analyzed, consecutive multi-frame images can be analyzed to determine the data on items being taken and put back.
In the embodiment of the present invention, there are three ways of determining the taken goods and the returned goods.
First way: determine the taken and returned goods according to the recognition results of items in the multi-frame images.
In the embodiment of the present invention, when analyzing item taking and item returning, the method further includes: determining the tracking trajectory of the item according to the fusion result; classifying the tracking trajectory to obtain a trajectory classification result, where the trajectory classification result corresponds to the movement result of the item; determining the item-taking result and item-returning result according to the trajectory classification result; and updating the item management list according to the item-taking result and item-returning result.
This can be divided into three steps: first, trajectory tracking based on the information capture device and the image capture device; second, trajectory classification based on machine learning; third, discrimination of the trajectory classification results. For trajectory tracking, optionally, determining the tracking trajectory of the item according to the fusion result includes: obtaining the position information of the item and the movement trend of the item according to the fusion result; and judging the matching degree between the current detection result and the previous frame's detection result according to the coincidence similarity and feature similarity between the item's current detection frame and the predicted candidate frame, to obtain the tracking trajectory of the item, where the predicted candidate frame is obtained from the movement trend of the item on the basis of the position information of the item in the previous frame, and the tracking trajectory includes: the position of the item, the item type, and the timestamp of the item's movement at each time node.
An RGB camera capturing RGB images is taken as the image capture device and a depth camera as the information capture device for the description. Multi-modal fusion of the depth camera and RGB camera information enables the system to obtain the position information of the item and its movement trend. The matching degree between the current detection result and the previous frame's detection result is judged from the coincidence similarity between the item's current detection frame and the predicted candidate frame, together with their feature similarity, as shown in the following formula:

r = α·IOU(BBox_current, BBox_predict) + β·f(BBox_current, BBox_predict)

where r is the matching degree between the previous frame's detection result and the current frame's detection result, IOU(BBox_current, BBox_predict) is the spatial coincidence similarity between the current item detection frame and the predicted candidate frame, f(BBox_current, BBox_predict) is the feature similarity between the current item detection frame and the predicted candidate frame, and α and β are the weight coefficients of the coincidence similarity and the feature similarity, respectively; the predicted candidate frame is obtained from the movement trend of the item on the basis of the position information of the item in the previous frame.
Connecting the consecutive detection results forms a complete tracking trajectory, which includes the position of the item, the item type and the timestamp of the item's movement at each time node; that is, each time node may include the goods position, category, timestamp, etc.
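A minimal sketch of the matching step follows; the weights α and β, the cosine form chosen for the feature term f, and the acceptance threshold are all assumptions of the example.

```python
import numpy as np

def box_iou(a, b):
    """Spatial coincidence similarity of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / (union + 1e-9)

def match_degree(det_box, pred_box, det_feat, pred_feat, alpha=0.6, beta=0.4):
    """r = alpha*IOU(BBox_current, BBox_predict) + beta*f(BBox_current, BBox_predict);
    cosine similarity of appearance features stands in for the term f."""
    f = float(det_feat @ pred_feat /
              (np.linalg.norm(det_feat) * np.linalg.norm(pred_feat) + 1e-9))
    return alpha * box_iou(det_box, pred_box) + beta * f

def extend_track(track, detections, r_min=0.3):
    """Append the best-matching detection to the track; each node keeps the
    item position, item type and movement timestamp, as described above."""
    if not detections:
        return track
    scored = [(match_degree(d["box"], track["pred_box"], d["feat"], track["feat"]), d)
              for d in detections]
    r, best = max(scored, key=lambda s: s[0])
    if r >= r_min:
        track["nodes"].append({"position": best["box"], "category": best["category"],
                               "timestamp": best["timestamp"]})
    return track
```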
For trajectory classification, i.e. the second step, trajectory classification based on machine learning: optionally, the step of classifying the tracking trajectory to obtain the trajectory classification result includes: extracting the item movement length from the tracking trajectory; and classifying the tracking trajectory by combining a pre-trained classification decision tree model with the item movement length to obtain the trajectory classification result.
In the embodiment of the present invention, the trajectory is classified by combining manually extracted parameters of the tracking trajectory with a decision-tree pattern recognition algorithm. Drawing on expert experience, features such as the trajectory length and the starting position, maximum position and ending position in the image are extracted from the trajectory; combined with the decision tree model, the trajectory can be classified into six classes: "accurate take", "accurate put-back", "suspected take", "suspected put-back", "misrecognition" and "other".
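As a sketch of this second step, the following fits a decision tree on hand-extracted trajectory parameters with scikit-learn; the feature set mirrors the ones named above (trajectory length and the start, maximum and end positions in the image), while the tree depth and the labelled training data passed in are assumptions of the example.

```python
from sklearn.tree import DecisionTreeClassifier

# The six trajectory classes described above.
CLASSES = ["accurate take", "accurate put-back", "suspected take",
           "suspected put-back", "misrecognition", "other"]

def trajectory_features(track):
    """Manually extracted parameters: trajectory length plus the starting,
    maximum and ending vertical position of the item in the image."""
    ys = [node["position"][1] for node in track["nodes"]]
    return [len(track["nodes"]), ys[0], max(ys), ys[-1]]

def train_trajectory_classifier(labelled_tracks, labels):
    """Fit the classification decision tree on labelled historical trajectories."""
    clf = DecisionTreeClassifier(max_depth=5)
    clf.fit([trajectory_features(t) for t in labelled_tracks], labels)
    return clf  # later: clf.predict([trajectory_features(new_track)])
```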
In addition, for trajectory discrimination, i.e. the third step, the trajectory classification results are discriminated. Optionally, the step of determining the item-taking result or item-returning result according to the trajectory classification result includes: acquiring the trajectory classification results of the image capture device, or of the image capture device combined with the information capture device, at the same moment; establishing a classification discrimination scheme based on a classification rule base according to those trajectory classification results; and determining the item-taking result or item-returning result according to the classification discrimination scheme and the trajectory classification results.
Optionally, during classification discrimination, the above classification results can be discriminated, and the trajectory classification results of the image capture devices can be discriminated based on the classification rule base. Optionally, taking multiple cameras and at least one depth camera as an example, one discrimination rule is described below, establishing the following rules:
1. If a majority of cameras report "accurate take" or "accurate put-back", the result is confirmed.
2. If a majority of cameras report "suspected" and a minority report "accurate", the result is considered "accurate".
3. If a majority of cameras report "suspected take" or "suspected put-back", the result is considered "take" or "put-back".
4. If the camera results disagree on the direction, this result is ignored.
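One possible encoding of this rule base is sketched below; the order in which the rules are tested and how ties are broken are assumptions not fixed by the text.

```python
from collections import Counter

def discriminate(per_camera_classes):
    """Apply rules 1-4 to the per-camera trajectory classes at one moment."""
    n = len(per_camera_classes)
    votes = Counter(per_camera_classes)
    for accurate, action in (("accurate take", "take"),
                             ("accurate put-back", "put-back")):
        suspected = "suspected " + action
        if votes[accurate] * 2 > n:                       # rule 1: majority accurate
            return accurate
        if votes[suspected] * 2 > n and votes[accurate]:  # rule 2: suspected majority
            return accurate                               #   backed by some accurate
        if votes[suspected] * 2 > n:                      # rule 3: suspected majority
            return action
    return None                                           # rule 4: direction disputed
```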
In the above way, items in the multi-frame images can be recognized, and the goods taken and put back by the user (or a machine) can be determined through trajectory tracking, trajectory classification and classification discrimination, preparing for subsequent settlement.
Second way: determine the taken items and returned goods through a sales reference line.
Optionally, the method in the present invention further includes: determining a sales reference line in the frame captured by the image acquisition device, where the sales reference line is used to determine item-taking operations and item-returning operations; and determining, according to the sales reference line, the items and item quantities taken from the item storage device (such as a container), and the items and item quantities returned to the item storage device after being taken.
That is, a sales reference line l can be defined in the camera's field of view: an item moving from inside the container outward past the reference line is judged as taken; conversely, an item moving from beyond the reference line toward the container and past the reference line is judged as put back.
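A minimal sketch of the reference-line test follows, assuming image coordinates in which the container side lies below a horizontal line l at row line_y; this orientation is an assumption of the example.

```python
def reference_line_event(track, line_y):
    """Judge take / put-back from which side of the sales reference line
    the trajectory starts and ends on."""
    start_y = track["nodes"][0]["position"][1]
    end_y = track["nodes"][-1]["position"][1]
    if start_y > line_y >= end_y:
        return "take"      # moved from the container outward across the line
    if start_y <= line_y < end_y:
        return "put back"  # moved from outside toward the container across it
    return None            # never crossed the line
```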
Third way: detect the items taken and put back by the user in real time based on an item sensing region.
Optionally, the method in the present invention further includes: determining the coordinate system of each image acquisition device; dividing out an item sensing region in the coordinate system; and determining, through the item sensing region and the video, the items and item quantities taken from the item storage device, and the items and item quantities returned to the item storage device after being taken.
An effective region (the item sensing region) is delimited in the camera coordinate system; the number of items appearing in this region is detected in real time, and the direction of item movement is judged by combining the information of preceding and following frames (it can be judged from the initial point position and the end point position), so as to make the take/put-back decision.
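The sensing-region decision might look like the following sketch, which judges the movement direction from the initial and final points as described; the rectangular region format and the center-point membership test are assumptions.

```python
def sensing_region_event(track, region):
    """Judge take / put-back from whether the trajectory's start and end
    points lie inside the item sensing region (x1, y1, x2, y2)."""
    def inside(box):
        cx, cy = (box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0
        return region[0] <= cx <= region[2] and region[1] <= cy <= region[3]
    first = track["nodes"][0]["position"]
    last = track["nodes"][-1]["position"]
    if inside(first) and not inside(last):
        return "take"      # item left the region, moving away from the container
    if not inside(first) and inside(last):
        return "put back"  # item entered the region, moving back to the container
    return None
```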
Through the above steps, the items taken and put back by the user can be determined, and automatic settlement can then be performed.
As another optional example of this application, the above item identification method further includes: obtaining an item price list, where the item price list contains the price of each kind of item; determining the taken items and item quantities according to the item-taking result and the item-returning result; and determining the total settlement price of the items according to the taken items and item quantities and the price of each kind of item.
Optionally, the above item price list may be the price list used by a store (or another shopping mall, etc.) that uses the item storage device, recording the items placed in each item storage device as well as those taken and put back; automatic item management can be achieved through the item price list of the present invention.
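As an arithmetic sketch of the settlement step: the net count of each item is the taken count minus the returned count, and the total is the sum of net counts times unit prices; the dictionary layout of the price list is an assumption.

```python
def settle(taken, returned, price_list):
    """Total settlement price from per-item counts and the item price list."""
    total = 0.0
    for item, count in taken.items():
        net = max(count - returned.get(item, 0), 0)  # items kept by the user
        total += net * price_list[item]
    return total

# e.g. settle({"cola": 2, "chips": 1}, {"cola": 1}, {"cola": 3.5, "chips": 6.0})
# -> 3.5 + 6.0 = 9.5
```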
Through the embodiments of the present invention, the accuracy of item identification and counting can be effectively improved, container costs and operating costs can be greatly reduced, and the cargo-loss rate can be effectively reduced.
According to another aspect of the embodiments of the present invention, an electronic device is further provided, including: a processor; and a memory configured to store executable instructions of the processor; where the processor is configured to perform the item identification method of any one of the above by executing the executable instructions.
According to another aspect of the embodiments of the present invention, a storage medium is further provided; the storage medium includes a stored program, and when the program runs, a device where the storage medium is located is controlled to perform the item identification method of any one of the above.
An embodiment of the present invention provides an apparatus. The apparatus includes a processor, a memory, and a program stored on the memory and runnable on the processor. When the processor executes the program, the following steps are implemented: acquiring multi-frame images of an item through an image capture device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame image; acquiring auxiliary information of the item through an information capture device; performing multi-modal fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result of the item according to the category information and the fusion result.
Optionally, when the processor executes the program, the following steps can also be implemented: performing image preprocessing on each frame image of the item, where the image preprocessing includes at least one of the following: image enhancement, image scaling and image mean-subtraction; determining an item detection frame and category information in each preprocessed frame image, where the item detection frame contains at least one kind of item; and determining the position information of the item according to the item detection frame.
Optionally, when the processor executes the program, the following step can also be implemented: performing non-maximum suppression on the item detection frame.
Optionally, when the processor executes the program, the following steps can also be implemented: acquiring multi-frame images of a target part through the image capture device; and processing the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image.
Optionally, the identification result of the item is determined according to the position information and discrimination result of the target part in each frame image, the category information of the item and the fusion result.
Optionally, when the processor executes the program, the following steps can also be implemented: performing image preprocessing on each frame image of the target part to enhance the image contour of the user's target part, where the image preprocessing includes one or more of the following: image noise reduction, image enhancement, contrast enhancement, image smoothing, image sharpening, etc.; selecting, in each preprocessed frame image, part candidate regions in which the user's target part appears; extracting feature information from the part candidate regions to obtain multiple part features; and recognizing the multiple part features with a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame image.
Optionally, when the processor executes the program, the following step can also be implemented: scanning each frame image with a sub-window to determine part candidate regions in which the user's target part may appear.
Optionally, when the processor executes the program, the following step can also be implemented: performing fine-grained classification on the item.
Optionally, the information capture device is a depth camera configured to acquire a depth image of the item, and the auxiliary information of the item includes depth information.
Optionally, when the processor executes the program, the following steps can also be implemented: acquiring lens parameters and coordinate parameters of the image capture device and the depth camera; obtaining the position of the item in the depth camera coordinate system according to the lens parameters of the depth camera, the depth information and the position of the item in the depth image; calibrating, with the coordinate system of the depth camera as reference, the relative positional relationship of the image capture device with respect to the depth camera according to the position parameters of the image capture device and the depth camera; determining, based on the lens parameters, the position of the item in the depth image, the depth information and the relative positional relationship, the mapped position information in the image acquired by the image capture device that corresponds to the position of the item in the depth image; and comparing the position information with the mapped position information to obtain the fusion result.
Optionally, when the processor executes the program, the following steps can also be implemented: turning on the image capture device to acquire a video of the item; and extracting multi-frame images of the item from the video.
Optionally, when the processor executes the program, the following steps can also be implemented: determining the tracking trajectory of the item according to the fusion result; classifying the tracking trajectory to obtain a trajectory classification result, where the trajectory classification result corresponds to the movement result of the item; determining the item-taking result and item-returning result according to the trajectory classification result; and updating the item management list according to the item-taking result and item-returning result.
Optionally, when the processor executes the program, the following steps can also be implemented: obtaining the position information of the item and the movement trend of the item according to the fusion result; and judging the matching degree between the current detection result and the previous frame's detection result according to the coincidence similarity and feature similarity between the item's current detection frame and the predicted candidate frame, to obtain the tracking trajectory of the item, where the predicted candidate frame is obtained from the movement trend of the item on the basis of the position information of the item in the previous frame, and the tracking trajectory includes: the position of the item, the item type, and the timestamp of the item's movement at each time node.
Optionally, when the processor executes the program, the following steps can also be implemented: extracting the item movement length from the tracking trajectory; and classifying the tracking trajectory by combining the pre-trained classification decision tree model with the item movement length to obtain the trajectory classification result.
Optionally, when the processor executes the program, the following steps can also be implemented: acquiring the trajectory classification results of the image capture device, or of the image capture device combined with the information capture device, at the same moment; establishing a classification discrimination scheme based on a classification rule base according to those trajectory classification results; and determining the item-taking result or item-returning result according to the classification discrimination scheme and the trajectory classification results.
Optionally, when the processor executes the program, the following steps can also be implemented: obtaining the item price list, where the item price list contains the price of each kind of item; determining the taken items and item quantities according to the item-taking result and item-returning result; and determining the total settlement price of the items according to the taken items and item quantities and the price of each kind of item.
This application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring multi-frame images of an item through an image capture device; processing the multi-frame images of the item to obtain position information and category information of the item in each frame image; acquiring auxiliary information of the item through an information capture device; performing multi-modal fusion on the position information and the auxiliary information to obtain a fusion result; and determining an identification result of the item according to the category information and the fusion result.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units may be a logical functional division, and in actual implementation there may be other ways of division: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Further, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence or in the part contributing to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
The above are only preferred embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Industrial Applicability
The solution provided by the embodiments of this application can realize item identification. The technical solution provided by the embodiments of this application can be applied to equipment for selling goods, such as new retail smart containers: multiple cameras are installed on the smart container and record video after the door is opened; the multi-frame images in the video are then analyzed, the position and category of the items in the images are recognized, and multi-modal fusion is performed with the auxiliary information obtained by the information capture device, so that the identification result of the items is obtained accurately. The type and quantity of items taken from the container by the user can likewise be identified accurately, improving the item recognition rate and reducing the cargo-loss rate, thereby solving the technical problem in the related art of low recognition accuracy when identifying items. The embodiments of this application can automatically analyze the images captured by each device in a new retail scenario, analyze the types of items and item data taken by the user, realize accurate identification of items, improve the degree of intelligent recognition of goods, and thus enhance the intelligent goods-selling capability of new retail.

Claims (24)

  1. An item identification method, comprising:
    acquiring multi-frame images of an item through an image capture device;
    processing the multi-frame images of the item to obtain position information and category information of the item in each frame image;
    acquiring auxiliary information of the item through an information capture device;
    performing multi-modal fusion on the position information and the auxiliary information to obtain a fusion result;
    determining an identification result of the item according to the category information and the fusion result.
  2. The method according to claim 1, wherein processing the multi-frame images of the item to obtain the position information and category information of the item in each frame image comprises:
    performing image preprocessing on each frame image of the item;
    determining an item detection frame and the category information in each preprocessed frame image, wherein the item detection frame contains at least one kind of item;
    determining the position information of the item according to the item detection frame.
  3. The method according to claim 2, wherein the method further comprises: performing non-maximum suppression on the item detection frame.
  4. The method according to claim 1, wherein the method further comprises:
    acquiring multi-frame images of a target part through an image capture device;
    processing the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image.
  5. The method according to claim 4, wherein the identification result of the item is determined according to the position information and discrimination result of the target part in each frame image, the category information of the item and the fusion result.
  6. The method according to claim 4, wherein processing the multi-frame images of the target part to obtain the position information and discrimination result of the target part in each frame image comprises:
    performing image preprocessing on each frame image of the target part to enhance the image contour of the user's target part;
    selecting, in each preprocessed frame image, part candidate regions in which the user's target part appears;
    extracting feature information from the part candidate regions to obtain multiple part features;
    recognizing the multiple part features with a pre-trained classifier to obtain the position information and discrimination result of the target part in each frame image.
  7. The method according to claim 6, wherein selecting the part candidate regions in which the user's target part appears in each preprocessed frame image comprises:
    scanning each frame image with a sub-window to determine part candidate regions in which the user's target part may appear.
  8. The method according to claim 1, wherein the method further comprises: performing fine-grained classification on the item.
  9. The method according to claim 1, wherein the information capture device comprises at least one of the following: a depth camera, a card reader, a gravity device, an odor sensor.
  10. The method according to claim 9, wherein when the information capture device is the depth camera, a depth image of the item is acquired through the depth camera, and the auxiliary information of the item comprises depth information.
  11. The method according to claim 10, wherein performing multi-modal fusion on the position information and the auxiliary information to obtain the fusion result comprises:
    acquiring lens parameters and position parameters of the image capture device and the depth camera;
    obtaining the position of the item in the depth camera coordinate system according to the lens parameters of the depth camera, the depth information and the position of the item in the depth image;
    calibrating, with the coordinate system of the depth camera as reference, the relative positional relationship of the image capture device with respect to the depth camera according to the position parameters of the image capture device and the depth camera;
    determining, based on the lens parameters, the position of the item in the depth image, the depth information and the relative positional relationship, the mapped position information in the image acquired by the image capture device that corresponds to the position of the item in the depth image;
    comparing the position information with the mapped position information to obtain the fusion result.
  12. The method according to claim 1, wherein acquiring multi-frame images of the item through the image capture device comprises:
    turning on the image capture device to acquire a video of the item;
    extracting multi-frame images of the item from the video.
  13. The method according to claim 1, wherein the method further comprises:
    determining a tracking trajectory of the item according to the fusion result;
    classifying the tracking trajectory to obtain a trajectory classification result, wherein the trajectory classification result corresponds to a movement result of the item;
    determining an item-taking result and an item-returning result according to the trajectory classification result;
    updating an item management list according to the item-taking result and the item-returning result.
  14. The method according to claim 13, wherein determining the tracking trajectory of the item according to the fusion result comprises:
    obtaining the position information of the item and the movement trend of the item according to the fusion result;
    judging the matching degree between the current detection result and the previous frame's detection result according to the coincidence similarity and feature similarity between the item's current detection frame and a predicted candidate frame, to obtain the tracking trajectory of the item, wherein the predicted candidate frame is obtained from the movement trend of the item on the basis of the position information of the item in the previous frame, and the tracking trajectory comprises: the position of the item, the item type, and the timestamp of the item's movement at each time node.
  15. The method according to claim 13, wherein the step of classifying the tracking trajectory to obtain the trajectory classification result comprises:
    extracting an item movement length from the tracking trajectory;
    classifying the tracking trajectory by combining a pre-trained classification decision tree model with the item movement length to obtain the trajectory classification result.
  16. The method according to claim 15, wherein the step of determining the item-taking result or the item-returning result according to the trajectory classification result comprises:
    acquiring the trajectory classification results of the image capture device, or of the image capture device combined with the information capture device, at the same moment;
    establishing a classification discrimination scheme based on a classification rule base according to the trajectory classification results of the image capture device, or of the image capture device combined with the information capture device, at the same moment;
    determining the item-taking result or the item-returning result according to the classification discrimination scheme and the trajectory classification result.
  17. The method according to claim 1, wherein the method further comprises:
    obtaining an item price list, wherein the item price list contains the price of each kind of item;
    determining the taken items and item quantities according to the item-taking result and the item-returning result;
    determining the total settlement price of the items according to the taken items and item quantities and the price of each kind of item.
  18. The method according to claim 1, wherein the method is applied to a new retail scenario, and the new retail scenario at least comprises:
    an unmanned store, a smart container.
  19. An item identification system, comprising:
    an image capture device configured to acquire multi-frame images of an item;
    an information capture device configured to acquire auxiliary information of the item;
    a server configured to process the multi-frame images of the item to obtain position information and category information of the item in each frame image, perform multi-modal fusion on the position information and the auxiliary information to obtain a fusion result, and then determine an identification result of the item according to the category information and the fusion result.
  20. The item identification system according to claim 19, wherein the image capture device is further configured to acquire multi-frame images of a target part.
  21. The item identification system according to claim 20, wherein the server is further configured to process the multi-frame images of the target part to obtain position information and a discrimination result of the target part in each frame image, and to determine the identification result of the item according to the position information and discrimination result of the target part in each frame image, the category information and the fusion result.
  22. The item identification system according to claim 19, further comprising:
    an item storage device, wherein the image capture device and the information capture device are turned on when the item storage device is opened.
  23. An electronic device, comprising:
    a processor; and
    a memory configured to store executable instructions of the processor;
    wherein the processor is configured to perform the item identification method of any one of claims 1 to 18 by executing the executable instructions.
  24. A storage medium, comprising a stored program, wherein when the program runs, a device where the storage medium is located is controlled to perform the item identification method of any one of claims 1 to 18.
PCT/CN2019/092405 2019-01-08 2019-06-21 Item identification method and system, and electronic device WO2020143179A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2019566841A JP6986576B2 (ja) 2019-01-08 2019-06-21 Item identification method and system, and electronic device
EP19908887.3A EP3910608B1 (en) 2019-01-08 2019-06-21 Article identification method and system, and electronic device
US16/479,222 US11335092B2 (en) 2019-01-08 2019-06-21 Item identification method, system and electronic device
KR1020197036280A KR102329369B1 (ko) 2019-01-08 2019-06-21 Item identification method and system, and electronic equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910016934.7A CN111415461B (zh) 2019-01-08 2021-09-28 Item identification method and system, and electronic device
CN201910016934.7 2019-01-08

Publications (1)

Publication Number Publication Date
WO2020143179A1 (zh)

Family

ID=71490812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/092405 WO2020143179A1 (zh) 2019-01-08 2019-06-21 Item identification method and system, and electronic device

Country Status (6)

Country Link
US (1) US11335092B2 (zh)
EP (1) EP3910608B1 (zh)
JP (1) JP6986576B2 (zh)
KR (1) KR102329369B1 (zh)
CN (1) CN111415461B (zh)
WO (1) WO2020143179A1 (zh)


Also Published As

Publication number Publication date
CN111415461B (zh) 2021-09-28
US20210397844A1 (en) 2021-12-23
EP3910608A1 (en) 2021-11-17
EP3910608B1 (en) 2024-04-03
US11335092B2 (en) 2022-05-17
KR102329369B1 (ko) 2021-11-19
CN111415461A (zh) 2020-07-14
KR20200088219A (ko) 2020-07-22
JP2021513690A (ja) 2021-05-27
JP6986576B2 (ja) 2021-12-22
EP3910608A4 (en) 2022-02-16

Similar Documents

Publication Publication Date Title
WO2020143179A1 (zh) Item identification method and system, and electronic device
CN108491799B (zh) Intelligent vending cabinet commodity management method and system based on image recognition
CN109271847B (zh) Anomaly detection method, apparatus and device in unmanned settlement scenarios
Nam et al. Pest detection on traps using deep convolutional neural networks
US10176384B2 (en) Method and system for automated sequencing of vehicles in side-by-side drive-thru configurations via appearance-based classification
EP3678047B1 (en) Method and device for recognizing identity of human target
JP5959093B2 (ja) Person search system
CN111222870A (zh) Settlement method, apparatus and system
Liu et al. Grab: Fast and accurate sensor processing for cashier-free shopping
Chang et al. Localized detection of abandoned luggage
CN111476609A (zh) Retail data acquisition method, system, device and storage medium
CN113468914A (zh) Method, apparatus and device for determining commodity purity
WO2021091481A1 (en) System for object identification and content quantity estimation through use of thermal and visible spectrum images
WO2023138445A1 (zh) Detection method and device for detecting a person falling, or picking up or putting back items
KR102476496B1 (ko) Method for identifying products through artificial-intelligence-based barcode restoration, and computer program recorded on a recording medium to execute the method
US10679086B2 (en) Imaging discernment of intersecting individuals
Achakir et al. An automated AI-based solution for out-of-stock detection in retail environments
CN112257617B (zh) Multi-modal target recognition method and system
CN111738184B (zh) Commodity pick-and-place recognition method, apparatus, system and device
JP6616093B2 (ja) Method and system for automated sequencing of vehicles in side-by-side drive-thru configurations via appearance-based classification
US10991119B2 (en) Mapping multiple views to an identity
Nikouei et al. Smart surveillance video stream processing at the edge for real-time human objects tracking
JP2021107989A (ja) Information processing apparatus, information processing method, and program
Boufama et al. Tracking multiple people in the context of video surveillance
KR102476498B1 (ko) Method for identifying products through artificial-intelligence-based composite recognition, and computer program recorded on a recording medium to execute the method

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019566841

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019908887

Country of ref document: EP

Effective date: 20210809