WO2023180178A1 - System and method for object recognition utilizing color identification and/or machine learning - Google Patents

System and method for object recognition utilizing color identification and/or machine learning

Info

Publication number
WO2023180178A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
data
luminescence
neural network
object recognition
Prior art date
Application number
PCT/EP2023/056782
Other languages
French (fr)
Inventor
Yunus Emre Kurtoglu
Matthew Ian Childers
Original Assignee
Basf Coatings Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Basf Coatings Gmbh filed Critical Basf Coatings Gmbh
Publication of WO2023180178A1 publication Critical patent/WO2023180178A1/en

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/60 - Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Aspects described herein generally relate to methods and systems for object recognition utilizing luminescence identification and/or machine learning. More specifically, aspects described herein relate to methods and systems for recognition of at least one luminescent object being present in a scene using data generated by a luminescent object recognition system and further data on the scene. This makes it possible to boost the accuracy of the object recognition to near 100% in cases with ambiguous luminescence identification. Moreover, aspects described herein relate to a method for training an object recognition neural network using data generated by a luminescent object recognition system as well as methods and systems for object recognition using the trained neural network.
  • Because the luminescent objects are recognized by the luminescent object recognition system at pixel level, highly accurate bounding boxes or segmentations can be created automatically from the output data of the luminescent object recognition system, and the labelled images can be used directly for training the object recognition neural network to improve its performance.
  • This renders time-consuming manual labeling of images superfluous and makes it possible to generate neural networks which are specifically trained on the respective scene, thus reducing the number of objects each neural network must recognize to the number of objects occurring at the scene. This reduces the necessary training time as well as the computing power required for object recognition using the trained neural network.
  • Computer vision is a field in rapid development due to the abundant use of electronic devices capable of collecting information about their surroundings via sensors such as cameras, distance sensors such as LIDAR or radar, and depth camera systems based on structured light or stereo vision, to name a few. These electronic devices provide raw image data to be processed by a computer processing unit and consequently develop an understanding of a scene using artificial intelligence and/or computer assistance algorithms. There are multiple ways in which this understanding of the scene can be developed. In general, 2D or 3D images and/or maps are formed, and these images and/or maps are analysed to develop an understanding of the scene and the objects in that scene. The object identification process has been termed remote sensing, object identification, classification, authentication, or recognition over the years. While the shape and appearance of objects in the scene acquired as 2D or 3D images can be used to develop an understanding of the scene, these techniques have some shortcomings. One prospect for improving computer vision is to identify objects based on the chemical components present on the objects in the scene.
  • A number of techniques have been developed for recognition of an object in computer vision systems, including, for example, the use of image-based physical tags (e.g. barcodes, QR codes, serial numbers, text, patterns, holograms, etc.) or scan-/close contact-based physical tags (e.g. viewing angle dependent pigments, upconversion pigments, metachromics, colors (red/green), luminescent materials).
  • The use of image-based physical tags is associated with some drawbacks, including (i) reduced readability in case the object comprising the image-based physical tag is occluded, only a small portion of the object is in view or the image-based physical tag is distorted, and (ii) the necessity to furnish the image-based physical tag on all sides of the object in large sizes to allow recognition from all sides and from a distance.
  • Scanning and close contact-based tags also have drawbacks. Upconversion pigments are usually opaque and have large particle sizes, thus limiting their use in coating compositions. Moreover, they require strong light probes because they only emit low levels of light due to their small quantum yields.
  • Upconversion pigments have unique response times that can be used for object recognition and classification; however, the measurement of the response time requires knowing the distance between the probe and the sample in order to calculate the time of flight for the light probe. Additionally, the upconversion response is much slower than fluorescence and light reflection, thus requiring that the time-of-flight distance for that sensor/object system is known in advance in order to use the upconversion response for object recognition. This distance is, however, rarely known in computer vision applications. Similarly, viewing angle dependent pigment systems only work at close range and require viewing at multiple angles. Also, the color is intentionally not uniform in order to produce visually pleasant effects. The spectrum of incident light must be managed to obtain correct measurements. Within a single image/scene, an object that has an angle dependent color coating will have multiple colors visible to the camera along the sample dimensions.
  • Luminescence based recognition under ambient lighting is a challenging task, as the reflective and luminescent components of the object are added together.
  • Luminescence-based recognition will therefore instead utilize a dark measurement condition and a priori knowledge of the excitation region of the luminescent material so that the correct light probe/source can be used.
  • Another technique utilized for recognition of an object in computer vision is the use of passive or active electronic tags.
  • Passive electronic tags are devices which are attached to objects to be recognized without needing to be visible or to be supplied with power, and include, for example, RFID tags.
  • Active electronic tags are powered devices attached to the object(s) to be recognized which emit information in various forms, such as wireless communications, light, radio, etc.
  • The use of passive electronic tags, such as RFID tags, requires the attachment of a circuit, power collector, and antenna to the item/object to be recognized or to the object recognition system in order to retrieve the information stored on the tag, adding cost and complication to the design. To determine a precise location when using passive electronic tags, multiple sensors have to be used in the scene, further increasing the costs.
  • The use of active electronic tags requires the object to be recognized to be connected to a power source, which is cost-prohibitive for simple items like a soccer ball, a shirt, or a box of pasta and is therefore not practical.
  • Yet another technique utilized for recognition of an object in computer vision is image-based feature detection relying on known geometries and shapes stored in a database, or image-based deep learning methods using algorithms which have been trained on numerous labelled images comprising the objects to be recognized.
  • A frequent problem associated with image-based feature detection and deep learning methods is that the accuracy depends largely on the quality of the image and the position of the camera within the scene, as occlusions, different viewing angles, and the like can easily change the results.
  • Detection of flexible objects that can change their shape is also problematic, as each possible shape must be included in the database to allow recognition.
  • Furthermore, the visual parameters of the object must be converted to mathematical parameters at great effort to allow usage of a database of known geometries and shapes.
  • Logo-type images present a challenge since they can be present in multiple places within the scene (e.g., a logo can be on a ball, a T-shirt, a hat, or a coffee mug) and the object recognition is only by inference.
  • As a result, similarly shaped objects may be misidentified as the object of interest.
  • For image-based deep learning methods such as CNNs, the accuracy of the object recognition is dependent on the quality of the training data set, and large amounts of training material are needed for each object to be recognized/classified.
  • Object tracking methods are also used for object recognition.
  • In such methods, items in a scene are organized in a particular order and labelled. Afterwards, the objects are followed through the scene with known color/geometry/3D coordinates.
  • However, the “recognition” is lost if the object leaves the scene and re-enters it.
  • These methods all lack the ability to identify as many objects as possible within each scene with high accuracy and low latency while using a minimum amount of resources in terms of sensors, computing capacity, light probes, etc.
  • Luminescent object recognition systems relying on the illumination-invariant luminescence of materials and the use of special light sources and/or sensor arrays to separate the reflected light from the luminesced light are known in the state of the art.
  • In such systems, the number of recognizable objects is limited by the number of distinct luminescent colors, and the same or a similar luminescent color cannot be used for different objects.
  • The information on luminescence obtained by the luminescent recognition system is combined with traditional visual AI systems. This combination makes it possible to use similar luminescent materials as identification tags for objects having different shapes, because the traditional AI system is able to distinguish between objects having similar luminescence based on their different shapes.
  • However, the use of traditional AI systems requires training of such systems with huge amounts of training data, which needs to be labelled manually.
  • The computer-implemented methods and systems for recognition of luminescent object(s) in a scene should result in near 100% accuracy in cases with ambiguous luminescence identification, i.e. in cases where the same or a similar luminescent material is used as a tag for different objects, preferably without the use of visual object recognition systems.
  • The method for training an object recognition neural network should allow automatic labelling of objects with high accuracy in acquired images of the scene as well as scene-specific further training of the implemented trained neural network, to avoid exhaustive training prior to implementation and to ensure flexible adaptation of the implemented neural network to the occurrence of new objects in the scene.
  • Finally, the trained neural network should be usable for recognizing objects in a scene with high accuracy.
  • Object recognition refers to the capability of a system to identify an object in a scene, for example by using any of the aforementioned methods, such as analysing a picture with a computer and identifying/labelling a ball in that picture, sometimes with even further information such as the type of ball (basketball, soccer ball, baseball), brand, the context, etc.
  • Luminescent object recognition system refers to a system which is capable of identifying an object being present in the scene by detecting the luminescence and optionally reflectance of the object upon illumination of the scene with a suitable light source.
  • a “scene” refers to the field of view of the object recognition system.
  • An object recognition system may have multiple cameras to expand its field of view and therefore the scene it covers.
  • An object recognition system may be comprised of multiple subsystems, each with multiple cameras, to cover increasingly large areas of space in sufficient detail.
  • Systems may be located in fixed locations, or be mobile and transported via human or robotic means.
  • Each subsystem may be further comprised of subsystems. For example, a household may be covered by a kitchen subsystem, comprised of a kitchen pantry sub-subsystem and a kitchen waste canister sub-subsystem, a garage subsystem, a living room subsystem, and a basement subsystem.
  • Scenes from each of these subsystems may overlap and information from one subsystem may be informative to the other related subsystems, as each scene is located in an overall related environment.
  • the location of each scene within a system may be indicative of the status of an item in that scene, for example, its placement in a waste or recycling bin scene may signal to the system that the item is being disposed of and a replacement should be ordered.
  • Ambient lighting refers to sources of light that are already available naturally (e.g. the sun, the moon) or artificial light being used to provide overall illumination in an area utilized by humans (e.g. to light a room).
  • An ambient light source refers to an artificial light source that affects all objects in the scene and provides a visually pleasant lighting of the scene to the eyes of an observer without having any negative influence on the health of the observer.
  • The artificial light source may or may not be part of the artificial ambient lighting in a room. If it is part of the artificial ambient lighting in the room, it may act as the primary or secondary artificial ambient light source in the room.
  • Digital representation may refer to a representation of an object class, e.g. a known object, and to a representation of the scene in a computer readable form.
  • The digital representation of object classes may, e.g., be data on object specific reflectance and/or luminescence properties of known objects. Such data may comprise RGB values, rg chromaticity values, spectral luminescence patterns, reflectance patterns or a combination thereof.
  • the data on object specific reflectance and/or luminescence properties may be interrelated with data on the respective object, such as object name, object type, bar code, QR code, article number, object dimensions, such as length, width, height, object volume, object weight or a combination thereof, to allow identification of the object upon determining the object specific reflectance and/or luminescence properties.
  • the digital representation of the scene may, e.g. be data being indicative of the geographic location of the scene, data being indicative of the household, data on stock on hand, data on preferences, historical data of the scene, data being indicative of legal regulations and/or commercial availability valid for the scene or geographic location, dimensions of the scene or a combination thereof, said data being optionally interrelated with a scene identifier.
  • Data being indicative of the geographic location of the scene may include GPS coordinates, IP address data, address data or a combination thereof.
  • Historical data of the scene may include the order history.
  • the scene identifier may be, for example, a user identity.
  • Communication interface may refer to a software and/or hardware interface for establishing communication such as the transfer or exchange of signals or data.
  • Software interfaces may be, e.g., function calls or APIs.
  • Communication interfaces may comprise transceivers and/or receivers.
  • the communication may either be wired, or it may be wireless.
  • A communication interface may be based on, or may support, one or more communication protocols.
  • The communication protocol may be a wireless protocol, for example a short distance communication protocol such as Bluetooth® or WiFi, or a long distance communication protocol such as a cellular or mobile network, for example a second-generation cellular network ("2G"), 3G, 4G, Long-Term Evolution ("LTE"), or 5G.
  • the communication interface may even be based on a proprietary short distance or long distance protocol.
  • the communication interface may support any one or more standards and/or proprietary protocols.
  • Computer processor refers to an arbitrary logic circuitry configured to perform basic operations of a computer or system, and/or, generally, to a device which is configured for performing calculations or logic operations.
  • the processing means, or computer processor may be configured for processing basic instructions that drive the computer or system.
  • The processing means or computer processor may comprise at least one arithmetic logic unit ("ALU"), at least one floating-point unit ("FPU"), such as a math coprocessor or a numeric coprocessor, a plurality of registers, specifically registers configured for supplying operands to the ALU and storing results of operations, and a memory, such as an L1 and L2 cache memory.
  • the processing means, or computer processor may be a multicore processor.
  • the processing means, or computer processor may be or may comprise a Central Processing Unit (“CPU”).
  • The processing means or computer processor may be a graphics processing unit ("GPU"), a tensor processing unit ("TPU"), a Complex Instruction Set Computing ("CISC") microprocessor, a Reduced Instruction Set Computing ("RISC") microprocessor, a Very Long Instruction Word ("VLIW") microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets.
  • the processing means may also be one or more special-purpose processing devices such as an Application-Specific Integrated Circuit (“ASIC”), a Field Programmable Gate Array (“FPGA”), a Complex Programmable Logic Device (“CPLD”), a Digital Signal Processor (“DSP”), a network processor, or the like.
  • processing means or processor may also refer to one or more processing devices, such as a distributed system of processing devices located across multiple computer systems (e.g., cloud computing), and is not limited to a single device unless otherwise specified.
  • A neural network refers to a collection of connected units or nodes called neurons. Each connection (also called an edge) can transmit a signal to other neurons. An artificial neuron that receives a signal processes it and can signal neurons connected to it. The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer) to the last layer (the output layer), possibly after traversing the layers multiple times.
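As a minimal illustration of the neuron model described above (weighted sum of the inputs, a non-linear activation, and an optional firing threshold), the following Python sketch may help; it is purely illustrative and not part of the disclosed subject matter, and the function name and the choice of a sigmoid activation are assumptions.

```python
import math

def neuron_output(inputs, weights, bias=0.0, threshold=None):
    """Compute the output of a single artificial neuron.

    The neuron forms the weighted sum of its inputs, optionally only
    fires if the aggregate signal crosses a threshold, and otherwise
    applies a non-linear activation (a sigmoid, chosen as an example).
    """
    aggregate = sum(w * x for w, x in zip(weights, inputs)) + bias
    if threshold is not None and aggregate < threshold:
        return 0.0  # the signal is not propagated to connected neurons
    return 1.0 / (1.0 + math.exp(-aggregate))  # sigmoid activation

# example: a neuron with two inputs and learned weights
print(neuron_output([0.8, 0.2], [1.5, -0.7], bias=0.1))
```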
  • Data storage medium may refer to physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media may include physical storage media that store computer-executable instructions and/or data structures.
  • Physical storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives (“SSDs”), flash memory, phase-change memory (“PCM”), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
  • Database may refer to a collection of related information that can be searched and retrieved.
  • the database can be a searchable electronic numerical, alphanumerical, or textual document; a searchable PDF document; a Microsoft Excel® spreadsheet; or a database commonly known in the state of the art.
  • the database can be a set of electronic documents, photographs, images, diagrams, data, or drawings, residing in a computer readable storage media that can be searched and retrieved.
  • a database can be a single database or a set of related databases or a group of unrelated databases. “Related database” means that there is at least one common information element in the related databases that can be used to relate such databases.
  • A computer-implemented method for recognizing at least one object having object specific luminescence properties in a scene, comprising:
  • A system for recognizing at least one object having object specific luminescence properties in a scene comprises: a light source comprising at least one illuminant for illuminating the scene; a sensor unit for acquiring data of the scene including object specific reflectance and/or luminescence properties of at least one object being present in the scene upon illumination of the scene with the light source; a data storage medium comprising digital representations of pre-defined objects and digital representations of the scene; at least one communication interface for providing the acquired data of the scene, the digital representations of pre-defined objects and the digital representations of the scene; and a processing unit in communication with the sensor unit and the communication interface, the processing unit being programmed to determine the at least one object in the scene based on the provided data of the scene, the provided digital representations of pre-defined objects and the provided digital representation of the scene, and optionally to provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
  • The inventive systems combine a luminescence object recognition system based on the detection of object specific luminescence and optionally reflectance properties with further data, such as data on the scene, to boost the accuracy of the object identification to near 100% in use cases with ambiguous luminescence-only identification.
  • A method for training an object recognition neural network, comprising: a) providing via a communication interface to a computer processor data of a scene, said data of the scene including image(s) of the scene and data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene; b) calculating with the computer processor, for each provided image of the scene, a labelled image of the scene by b1) annotating each classified pixel of each image with an object specific label based on the data on object specific luminescence and optionally reflectance properties, and b2) optionally creating bounding boxes around the objects determined in the images in step b1) based on the annotated pixels, or segmenting the images obtained after step b1) based on the annotated pixels; c) providing via a communication interface the calculated labelled images of the scene and optionally a digital representation of the scene to the object recognition neural network; and d) training the object recognition neural network with the provided calculated labelled images of the scene and optionally with the provided digital representation of the scene.
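The following Python sketch illustrates step b) under the assumption that the luminescent object recognition system delivers a per-pixel label map (one integer object label per pixel, 0 meaning no luminescent object); it derives axis-aligned bounding boxes from the annotated pixels as in optional step b2). All names are illustrative, not taken from the disclosure.

```python
import numpy as np

def bounding_boxes_from_label_map(label_map: np.ndarray) -> dict:
    """Create bounding boxes around objects annotated at pixel level.

    label_map: 2D array in which each classified pixel carries an object
    specific label (0 = background / no luminescent object detected).
    Returns {label: (x_min, y_min, x_max, y_max)} for each labelled object.
    """
    boxes = {}
    for label in np.unique(label_map):
        if label == 0:
            continue  # skip non-luminescent background pixels
        ys, xs = np.nonzero(label_map == label)
        boxes[int(label)] = (int(xs.min()), int(ys.min()),
                             int(xs.max()), int(ys.max()))
    return boxes

# example: a 3x4 label map containing one object tagged with label 7
labels = np.array([[0, 0, 0, 0],
                   [0, 0, 7, 7],
                   [0, 0, 7, 7]])
print(bounding_boxes_from_label_map(labels))  # {7: (2, 1, 3, 2)}
```

The resulting boxes (or, alternatively, the label map itself used as a segmentation mask) could then be passed to any standard detection or segmentation trainer in step d).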
  • The inventive training method therefore makes it possible to provide a trained object recognition neural network, wherein the training is performed automatically using the data provided by the luminescence object recognition system, thus reducing the amount of user interaction normally required during training of visual AI recognition systems, for example for manually labelling the training images.
  • incentives may be used.
  • households may have their purchased luminescence object recognition system subsidized or receive other reimbursements in exchange for the labelled data their system generates.
  • items can be coated with luminescent tags.
  • purveyors of object recognition neural networks may coat or tag items that are not normally coated with the luminescent tags in order to collect training data.
  • A computer-implemented method for recognizing at least one object in a scene, comprising:
  • (E) optionally providing via a communication interface the at least one identified object and/or triggering at least one action associated with the identified object(s).
  • the trained neural network can either be used in combination with a luminescent object recognition system or can be used independently from the luminescent object recognition system in a similar scene.
  • Use of the trained neural network in combination with the luminescent object recognition system results in an iterative improvement of the performance of each system, because the output of each system can be used to reduce the ambiguity in the object recognition of the other system, for example by performing sanity checks or to distinguish between objects with similar luminescence but different shapes or vice versa.
  • Use of the trained object recognition neural network in combination with a luminescent object recognition system makes it possible to improve identification accuracy because information devoid of shape, i.e. the information on the object derived from object specific reflectance and/or luminescence properties, is combined with information on the shape, i.e. the information derived from the trained object recognition neural network.
  • This makes it possible to identify an object in cases where the information on reflectance and/or luminescence in combination with the information on the scene is not sufficient to clearly identify the object.
  • Independent use of the trained neural network may be preferred if the use of a luminescent object recognition system is not possible or not desirable due to the higher costs associated with the use of the luminescence object recognition system.
  • A retailer may, for example, install the more expensive fluorescent color identification system in one of its stores and use the data it collects to train the traditional AI visual system used in its other stores with similar layouts and product offerings.
  • A system for recognizing at least one object in a scene comprises: a sensor unit for acquiring data of the scene; a data storage medium comprising an object recognition neural network, in particular an object recognition neural network which has been trained according to the inventive method for training an object recognition neural network, and optionally digital representations of pre-defined objects and/or a digital representation of the scene; at least one communication interface for providing the acquired data, the object recognition neural network, and optionally the digital representations of pre-defined objects and/or the digital representation of the scene; and a processing unit in communication with the sensor unit and the data storage medium, the processing unit being programmed to determine the at least one object in the scene based on the provided acquired data, the provided object recognition neural network and optionally the provided digital representations of pre-defined objects and/or the digital representation of the scene, and optionally to provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
  • a non-transitory computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform the steps according to the computer-implemented methods described herein.
  • A system comprising a scene and at least one identified object, wherein the object was recognized using the system or the methods disclosed herein.
  • the inventive method is used to recognize at least one object having object specific reflectance and/or luminescence properties which is present in the scene.
  • Luminescence is the property of light being emitted from a material without heat.
  • luminescence mechanisms such as chemiluminescence, mechanoluminescence, and electroluminescence are known.
  • Photoluminescence is the emission of light/photons due to the absorption of other photons. Photoluminescence includes fluorescence, phosphorescence, upconversion, and Raman scattering. Photoluminescence, fluorescence and phosphorescence are able to change the color appearance of an object under ordinary light conditions. While there is a difference between the chemical mechanisms and time scales of fluorescence and phosphorescence, for most computer vision systems they will appear identical.
  • Some objects are naturally luminescent and can therefore be directly recognized with the proposed system and/or method without further modification of the object.
  • For other objects, the luminescence has to be imparted.
  • objects having object specific luminescence and reflectance properties comprise at least one luminescence material, each luminescence material having a predefined luminescence property.
  • the object can be imparted with the at least one luminescence material by a variety of methods.
  • luminescent material(s) are dispersed in a coating material which is applied by spray coating, dip coating, coil coating, roll-to-roll coating and other application methods. After optional drying, the applied coating material is cured to form a solid and durable luminescence coating layer on the object surface.
  • the luminescence material(s) are printed onto the surface of the object.
  • the luminescence material(s) are dispersed into a composition and the composition is afterwards extruded, molded, or cast to obtain the respective object.
  • Other examples include genetic engineering of biological materials (vegetables, fruits, bacteria, tissue, proteins, etc.) or the addition of luminescent proteins in any of the ways mentioned herein. Since the luminescence spectral patterns of the luminescence material(s) are known, these luminescent material(s) can be used as an identification tag by interrelating the object comprising said luminescence material(s) with the respective luminescence spectral pattern(s). By using the luminescent chemistry of the object as a tag, object recognition is possible irrespective of the shape of the object or partial occlusions.
  • Suitable luminescent materials are commercially available, and their selection is mainly limited by the durability of the fluorescent materials and compatibility with the material of the object to be recognized.
  • Preferred examples of luminescence materials include fluorescent materials, for example the BASF Lumogen® F series of dyes, such as, for example, yellow 170, orange 240, pink 285, red 305, a combination of yellow 170 and orange 240, or any other combination thereof.
  • Another example of suitable fluorescent materials are Clariant Hostasol® fluorescent dyes Red GG, Red 5B, and Yellow 3G.
  • Optical brighteners are a class of fluorescent materials that are often included in object formulations to reduce the yellow color of many organic polymers. They function by fluorescing invisible ultraviolet light into visible blue light, thus making the produced object appear whiter. Many optical brighteners are commercially available, including BASF Tinopal® SFP and Tinopal® NFW and Clariant Telalux® KSI and Telalux® OB1.
  • In step (i) of the inventive method, data of the scene is provided.
  • Said data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object being present in the scene.
  • the data may be acquired with an object recognition system able to acquire the luminescence and/or reflectance properties of objects being present in the scene.
  • object recognition systems are commonly known in the state of the art and normally include a light source and a sensor unit.
  • the data on the scene is acquired with a sensor unit upon illumination of the scene with at least one light source comprising at least one illuminant and the acquired data is provided via the communication interface to the computer processor.
  • the acquired data may be stored on a data storage medium prior to providing the acquired data to the processor.
  • the data storage medium may be an internal data storage medium of the object recognition system or may be a database connected to the object recognition system via a communication interface.
  • the data on the scene can be acquired continuously, at pre-defined time intervals or upon the detection of a triggering event. Pre-defined time intervals may be based on the scene, such as location, preferences of persons present in the scene etc. Triggering events may include detection of a motion in the scene, switching on/off a light in the scene, detection of a sound in the scene or the vicinity of the scene or a combination thereof.
  • the light source comprises at least two different illuminants, preferably 2 to 20 different illuminants, more preferably 3 to 12 different illuminants, in particular 4 to 10 different illuminants.
  • the at least one illuminant, in particular all illuminants may have a peak center wavelength from 385 to 700 nm.
  • Use of illuminants having the aforementioned peak center wavelength renders it possible to use the light source of the inventive system as a primary or secondary ambient light source in a room. This makes it possible to perform object recognition under ambient lighting conditions without the necessity of defined lighting conditions (such as dark rooms) and to easily integrate the object recognition system into the ambient lighting system already present in the room without resulting in unpleasant lighting conditions in the room.
  • the illuminant(s) of the light source can be commonly known illuminants, such as illuminants comprising at least one solid-state lighting system (LED illuminants), illuminants comprising at least one incandescent illuminant (incandescent illuminants), illuminants comprising at least one fluorescent illuminant (fluorescent illuminants) or a combination thereof.
  • the at least one illuminant is an illuminant comprising at least one solid-state lighting system, in particular at least one narrowband LED.
  • all illuminants of the light source are illuminants comprising at least one solid-state lighting system, in particular at least one narrowband LED.
  • “Narrowband LED” may refer to an individual color LED (i.e. an LED not having a white output across the entire spectrum) having a full-width-half-maximum (FWHM) - either after passing through a bandpass filter or without the use of a bandpass filter - of 5 to 60 nm, preferably of 3 to 40 nm, more preferably of 4 to 30 nm, even more preferably of 5 to 20 nm, very preferably of 8 to 20 nm.
  • The FWHM of each illuminant is obtained from the emission spectrum of that illuminant and is the difference between the two wavelengths at half of the maximum value of the emission spectrum.
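As an illustration of this definition, the following Python sketch estimates the FWHM of a measured emission spectrum by locating the two half-maximum crossings with linear interpolation. It assumes the spectrum is sampled on an increasing wavelength axis and falls below half of its peak on both sides of the measured range; it is not part of the disclosure.

```python
import numpy as np

def fwhm_nm(wavelengths_nm: np.ndarray, emission: np.ndarray) -> float:
    """Full-width-half-maximum of an emission spectrum in nm."""
    half = emission.max() / 2.0
    above = emission >= half
    first = int(np.argmax(above))                        # first sample above half-max
    last = len(above) - 1 - int(np.argmax(above[::-1]))  # last sample above half-max
    # interpolate the exact half-maximum crossing on each flank
    left = np.interp(half, [emission[first - 1], emission[first]],
                     [wavelengths_nm[first - 1], wavelengths_nm[first]])
    right = np.interp(half, [emission[last + 1], emission[last]],
                      [wavelengths_nm[last + 1], wavelengths_nm[last]])
    return float(right - left)

# example: a synthetic narrowband LED spectrum centered at 450 nm
wl = np.linspace(400, 500, 501)
spectrum = np.exp(-0.5 * ((wl - 450.0) / 6.0) ** 2)
print(round(fwhm_nm(wl, spectrum), 1))  # about 14.1 nm (2.355 x sigma)
```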
  • Use of LED illuminants reduces the adverse health effects which can be associated with the use of fluorescent lights as previously described.
  • Use of LED illuminants also has various advantages over the use of incandescent illuminants: firstly, they allow fast switching between the illuminants of the light source, thus allowing faster acquisition of the scene under various illumination conditions and therefore also faster object recognition.
  • Secondly, LED illuminants require less energy than incandescent illuminants for the same amount of in-band illumination, thus making a battery-driven object recognition system possible.
  • Thirdly, LED illuminants require less time to achieve a consistent light output and a steady-state operating temperature, so the object recognition system is ready sooner.
  • Finally, the lifetime of LED illuminants is much longer, thus allowing longer maintenance intervals.
  • In addition, the FWHM of the LED illuminants may be narrow enough that the use of a bandpass filter is not necessary, thus reducing the complexity of the system and therefore the overall costs.
  • In one example, the light source is configured to project at least one light pattern on the scene. Suitable light sources are disclosed, for example, in WO 2020/245441 A1. In another example, the light source illuminates the scene without the use of a light pattern.
  • the light source is a switchable light source.
  • Switchable light source refers herein to a light source comprising at least 2 illuminants, wherein the light source is configured to switch between the at least 2 illuminants.
  • the illuminant(s) and/or the solid state lighting system(s) of the illuminant(s) may be switched on sequentially and the switching of the sensor(s) present in the sensor unit may be synchronized to the switching of the illuminant(s) and/or the solid state lighting system(s) of the illuminant(s) such that each sensor acquires data when each illuminant and/or each solid state lighting system of each illuminant is switched on.
  • the switching of the illuminant(s) of the light source and the sensors of the sensor unit may be set to allow acquisition of object specific luminescence and/or reflectance properties under ambient lighting conditions as described later on with respect to the determination of further object specific luminescence and/or reflectance properties.
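A possible synchronization scheme along the lines of the two preceding paragraphs is sketched below in Python. The `light` and `camera` objects and their methods are hypothetical placeholders for the actual drivers; the sketch only illustrates the idea of capturing one frame per sequentially switched illuminant plus one ambient-only frame for the delta-calculation described further below.

```python
import time

def acquire_sequence(light, camera, illuminant_ids, exposure_s=0.02):
    """Capture one frame per illuminant plus an ambient-only frame."""
    frames = {}
    light.all_off()
    frames["ambient"] = camera.capture(exposure_s)    # ambient lighting only
    for ident in illuminant_ids:
        light.switch_on(ident)                        # single illuminant active
        time.sleep(0.001)                             # allow the output to settle
        frames[ident] = camera.capture(exposure_s)    # sensor synchronized to illuminant
        light.switch_off(ident)
    return frames
```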
  • the light source may be a non-switchable light source.
  • the at least one light source comprises at least one light source filter positioned optically intermediate the illuminant(s) of the light source and the scene.
  • Suitable light source filters include bandpass filters or dynamic light filters or notch filters or linear polarizers.
  • the light source may comprise a single filter for all illuminants of the light source or may comprise a filter for each illuminant of the light source.
  • Bandpass filters may be used to narrow the emitted spectral light to obtain the previously described FWHM.
  • Dynamic light filters are configured to continuously operate over the light spectral range of interest and to provide blocking of at least one band of interest on demand, particularly at wavelengths covered by the luminescence spectral pattern of the at least one object.
  • a plurality of dynamic light filters are used, they are preferably configured to be synchronized with each other to block the same spectral band or bands simultaneously. Notch filters are configured to block light entering the scene from a window at at least one distinct spectral band within the spectral range of light continuously.
  • the linear polarizer is coupled with a quarter waveplate and the quarter waveplate is oriented with its fast and slow axes at an angle in the range of 40 to 50 degrees, preferably of 42 to 48 degrees, more preferably of 44 to 46 degrees relative to the linear polarizer.
  • The light source may further include diffuser and/or focusing optics.
  • the light source comprises separate diffuser and/or focusing optics for each illuminant of the light source.
  • single focusing and diffuser optics may be used for all LEDs of the LED illuminant.
  • Suitable focusing optics comprise an individual frosted glass for each illuminant of the light source.
  • the light source comprises a single diffuser and/or focusing optic for all illuminants of the light source.
  • the at least one sensor of the sensor unit can be an optical sensor with photon counting capabilities, in particular a monochrome camera, an RGB camera, a multispectral camera, a hyperspectral camera or a combination thereof.
  • the sensor unit can comprise at least one sensor filter, in particular at least one multi-bandpass filter or at least one multi-dichroic beamsplitter or a linear polarizer.
  • Each sensor filter may be matched to the spectral light emitted by the illuminant(s) of the light source to separate the reflected light originating from illuminating the scene with the respective illuminant from the fluorescent light originating from illuminating the scene with the respective illuminant.
  • each sensor comprises a sensor filter, such as multi-bandpass filters having complementary transmission valleys and peaks or a linear polarizer.
  • the linear polarizer may be coupled with a quarter waveplate which is oriented with its fast and slow axes at an angle in the range of 40 to 50 degrees, preferably of 42 to 48 degrees, more preferably of 44 to 46 degrees relative to the linear polarizer.
  • the sensor unit comprises a single sensor filter for all sensors present in the sensor unit. Suitable single camera filters include multi-dichroic beam splitters.
  • the light source filter and the sensor filter are configured as separate filters.
  • Alternatively, the light source filter and the sensor filter are configured as a single filter, such as a single bandpass filter. Use of a single filter for the light source and the sensor unit makes it possible to physically separate the luminescence from the reflectance upon illumination of the scene with the light source.
  • the sensor unit may further contain collection optics positioned optically intermediate the sensor filter and each sensor of the sensor unit or positioned optically intermediate the sensor filter of each sensor of the sensor unit and the scene.
  • the collection optics enable efficient collection of the reflected and fluorescent light upon illumination of the scene with the light source.
  • data of the scene further includes an at least partial 3D map of the scene.
  • the at least partial 3D map of the scene may be obtained from a scene mapping tool, for example by time of flight measurements or the usage of structured light.
  • a 3D map of the scene can be formed, thus giving information about specific coordinates of the respective objects within the scene.
  • Suitable object recognition systems which are able to acquire data on object specific luminescence and/or reflectance properties are disclosed, for example, in WO 2020/178052 A1, WO 2020/245443 A2, WO 2020/245442 A1, WO 2020/245441 A1, WO 2020/245439 A1 and WO 2020/245444 A1.
  • In step (ii) of the inventive method, digital representations of pre-defined objects and a digital representation of the scene are provided via a communication interface to the computer processor.
  • The digital representations of the pre-defined objects each comprise object specific reflectance and/or luminescence properties, optionally interrelated with object data.
  • Object specific reflectance and/or luminescence properties are, for example, RGB values, rg chromaticity values, spectral luminescence patterns, reflectance patterns or a combination thereof.
  • Object data may include the object name, the object type, a bar code, a QR code, an article number, object dimensions, such as length, width, height, object volume, object weight or a combination thereof. Interrelating the object specific luminescence and/or reflectance properties with the object data makes it possible to identify the object upon determining the object specific reflectance and/or luminescence properties, as described later on.
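A minimal sketch of how such a digital representation of a pre-defined object could be organized is given below; the field names are assumptions chosen to mirror the properties and object data listed above, not a format prescribed by the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ObjectRepresentation:
    """Digital representation of a pre-defined object: object specific
    luminescence/reflectance properties interrelated with object data."""
    luminescence_pattern: List[float]                  # spectral luminescence pattern
    reflectance_pattern: Optional[List[float]] = None  # optional reflectance pattern
    name: str = ""
    object_type: str = ""
    article_number: str = ""
    dimensions_mm: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # length, width, height
    weight_g: float = 0.0

# example entry for a tagged item
ball = ObjectRepresentation(
    luminescence_pattern=[0.0, 0.1, 0.8, 0.3],
    name="soccer ball",
    object_type="sports equipment",
    article_number="SB-001",
)
```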
  • the digital representation of the scene comprises data being indicative of the geographic location of the scene, data being indicative of the household, data on stock on hand, data on preferences, historical data of the scene, data being indicative of legal regulations and/or commercial availability valid for the scene or geographic location, dimensions of the scene or a combination thereof, said data being optionally interrelated with a scene identifier.
  • Data being indicative of the geographic location of the scene may include GPS coordinates, IP address data, address data or a combination thereof.
  • Historical data of the scene may include the order history.
  • the scene identifier may be, for example, a user identity.
  • step (ii) includes providing at least one data storage medium having stored thereon the digital representations of pre-defined objects and/or the digital representation of the scene, obtaining the digital representations of pre-defined objects and/or the digital representation of the scene and providing the obtained digital representation(s).
  • the digital representation of the scene is obtained by searching the data stored on the data storage medium based on the scene identifier and retrieving the digital representation of the scene interrelated with the scene identifier from the data storage medium.
  • The order of step (i) and step (ii) of the inventive method may be reversed, i.e. step (ii) may be performed prior to step (i).
  • In step (iii) of the inventive method, at least one object in the scene is determined with the computer processor based on the data provided in step (i) and the digital representations provided in step (ii).
  • step (iii) further includes - prior to determining the at least one object in the scene - determining further object specific reflectance and/or luminescence properties from the provided data of the scene, in particular from the provided data on object specific reflectance and/or luminescence properties.
  • the further object specific reflectance and/or luminescence properties may be determined with the computer processor used in step (iii) or may be determined with a further processing unit.
  • The further processing unit may be located in a cloud environment or may be a stationary processing unit.
  • Cloud environment may refer to the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user, and may include at least one of the following service modules: infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), mobile backend as a service (MBaaS) and function as a service (FaaS).
  • the stationary processing unit may be located within the object recognition system comprising the previously described light source and sensor unit.
  • Determining further object specific reflectance and/or luminescence properties is generally optional but may result in a higher accuracy in determining the object, especially under ambient lighting conditions as described hereinafter.
  • Determining further reflectance and/or luminescence properties may include at least one of: generating differential data by subtracting data of the scene acquired by at least one sensor under ambient lighting from data of the scene acquired by at least one sensor under ambient lighting and illumination by the light source, and optionally converting the differential data (also called delta-calculation hereinafter); determining the regions of luminescence in the generated differential data or in the data of the scene; and determining the luminescence spectral pattern and/or the reflective spectral pattern.
  • Generating differential data may be necessary in case a physical separation of luminesced light and reflected light upon illumination of the scene with the light source is used to obtain data on object specific luminescence and/or reflectance properties because the filters used to achieve the physical separation are only able to block the reflective light from the illuminators of the light source and the corresponding portions of the ambient lighting but not all of the reflective light from a white light source used as artificial ambient light source in the scene.
  • the object specific reflectance and/or luminescence properties caused by the use of the light source cannot be detected directly.
  • This problem may be circumvented by performing the so-called delta-calculation, i.e. subtracting data collected under the ambient lighting from data collected under ambient lighting and illumination with the light source.
  • the data necessary for performing the delta-calculation can be obtained, for example, by synchronizing the illuminant(s) of the light source and the sensor(s) of the sensor unit such that the acquisition duration (i.e. the time each color sensitive sensor is switched on) of at least one sensor of the sensor unit and the illumination duration (i.e. the time each illuminant is switched on) of each illuminant of the light source only overlap partially, i.e. at least one sensor is switched on during a time where no illuminant of the light source is switched on, thus allowing to acquire data of the scene under illumination conditions being devoid of the illumination contributed by the light source.
  • The illuminant(s) and sensor(s) used to acquire the data provided in step (i) are preferably synchronized as described later on in relation to the inventive system. This makes it possible to acquire the data provided in step (i) in combination with white light sources (e.g. ambient light sources), i.e. under real-world conditions, because the accuracy of the determination of the object in step (iii) is no longer dependent on the use of highly defined lighting conditions (such as dark rooms).
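The delta-calculation itself reduces to an image subtraction, sketched below for two frames acquired by the same sensor; the clipping of negative values is an assumption made to suppress sensor noise. The differential frame can then optionally be converted (e.g. to rg chromaticity values) before the regions of luminescence are determined.

```python
import numpy as np

def delta_image(ambient_plus_light: np.ndarray, ambient_only: np.ndarray) -> np.ndarray:
    """Delta-calculation: remove the ambient contribution from a frame.

    Subtracts the frame acquired under ambient lighting alone from the
    frame acquired under ambient lighting plus illumination by the light
    source, leaving only the reflectance/luminescence caused by the
    light source. Negative values (noise) are clipped to zero.
    """
    diff = ambient_plus_light.astype(np.float32) - ambient_only.astype(np.float32)
    return np.clip(diff, 0.0, None)
```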
  • the differential data may be converted, for example if an RGB color camera is used as sensor of the sensor unit.
  • Regions of luminescence may be determined after generating the differential image. In another example, regions of luminescence may be determined directly from the acquired sensor data. Determination of regions of luminescence makes it possible to determine the regions to analyze and classify as containing luminescent object(s). In one example, this is performed by analyzing the brightness of the pixels acquired with the luminescence channel (in case physical separation of luminesced and reflected light is used), because non-luminescent regions are black while luminescent regions, when illuminated by a suitable illuminant of the light source, will have some degree of brightness. The analysis can be performed by using a mask to block out black (i.e. non-luminescent) regions, an edge detector to mark any region above a certain brightness under any illuminant as being part of the luminescent region, or a combination thereof.
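A simple masking approach along these lines is sketched below; the threshold value is an assumption and would have to be tuned to the actual sensor and illuminants.

```python
import numpy as np

def luminescent_region_mask(luminescence_frames, brightness_threshold=0.05):
    """Mark pixels belonging to luminescent regions.

    luminescence_frames: iterable of 2D arrays acquired via the
    luminescence channel, one per illuminant. Non-luminescent regions are
    (nearly) black, so a pixel is marked as luminescent if it exceeds the
    brightness threshold under any illuminant.
    """
    mask = None
    for frame in luminescence_frames:
        bright = np.asarray(frame) > brightness_threshold
        mask = bright if mask is None else (mask | bright)
    return mask
```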
  • the luminescence spectral pattern and/or the reflective spectral pattern for the determined regions of luminescence can be determined if a multispectral or hyperspectral camera is used as sensor.
  • the luminescence spectral pattern can be determined from the spectral pattern acquired by the luminescence channel (i.e. the sensor of the sensor unit only acquiring luminescence of the object upon illumination of the scene with the light source) and the reflective spectral pattern and the luminescence spectral pattern can be determined from the spectral pattern acquired by the reflectance and luminescence channel (i.e. the sensor of the sensor unit acquiring reflectance and luminescence of the object upon illumination of the scene with the light source).
  • the luminescence spectral pattern may be calculated based on the acquired radiance data of the scene at different wavelengths, such as the acquired radiance data of the scene within the spectral bands that are omitted/blocked/filtered (e.g. based on the spectral distribution of the light filter).
  • Determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects, each of said object identification hypotheses having an associated confidence score that indicates the certainty of said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene.
  • In one example, the object identification hypotheses correspond to the best matching luminescence and/or reflectance properties. In another example, the object identification hypotheses correspond to the best matching objects, i.e. the objects associated with the best matching luminescence and/or reflectance properties.
  • The associated confidence score preferably corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties, as described hereinafter.
  • Determining the set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of predefined objects may include calculating the best matching reflectance and/or luminescence properties and obtaining the object(s) assigned to the best matching reflectance and/or luminescence properties.
  • the best matching reflectance and/or luminescence properties are calculated by applying any number of matching algorithms on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
  • the best matching reflectance and/or luminescence properties are calculated by providing a data driven model of light spectral distribution and intensity on the at least one object to be recognized by analyzing an at least partial 3D map of the scene, merging the analyzed data with light source specific radiance values, calculating the radiance of light incident at points on the at least one object, and combining the calculated radiance of light incident at the points on the at least one object with the measured radiance of light returned to the at least one sensor of the sensor array from points on the at least one object, calculating the object specific reflectance and/or luminescence properties using the provided data driven model, and applying any number of matching algorithms on the calculated object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
  • The radiance of light incident at a specific point in the scene can be formulated via the light intensity function I(x, y, z), with (x, y, z) designating the space coordinates of the specific point within the scene.
  • In the simplest case, the light intensity I(x, y, z) may be obtained as the sum of the light intensities of all light sources at the specific point (x, y, z) according to formula (3):

    I(x, y, z) = \sum_{i=1}^{k} I_i(x, y, z)     (3)

    where I_i is the light intensity contributed by light source i and k is the number of light sources present in the scene.
  • the calculated radiance of light incident at the points in the scene is combined with a measured radiance of light returned to the sensor(s) of the sensor unit from points in the scene, particularly from points on the object to be recognized. Based on such combination of calculated radiance and measured radiance, a model of light spectral distribution and intensity at the object in the scene is formed.
  • Suitable matching algorithms include lowest root mean squared error, lowest mean absolute error, highest coefficient of determination, matching of maximum wavelength value, nearest neighbors, nearest neighbors with neighborhood component analysis, trained machine learning algorithms or a combination thereof.
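• As an illustration of such matching, the sketch below scores a measured spectrum against a small library of pre-defined spectra using lowest root mean squared error or lowest mean absolute error (the other listed metrics could be substituted); the library contents, the confidence formula and all names are hypothetical:

```python
import numpy as np

def rmse(a, b):
    """Root mean squared error between two spectra."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

def mae(a, b):
    """Mean absolute error between two spectra."""
    return float(np.mean(np.abs(a - b)))

def best_match(measured, library, metric=rmse):
    """Return the library entry whose stored spectrum best matches the measured
    spectrum (lowest error) together with a simple illustrative confidence."""
    scored = sorted(
        ((name, metric(measured, reference)) for name, reference in library.items()),
        key=lambda item: item[1],
    )
    name, error = scored[0]
    confidence = 1.0 / (1.0 + error)  # smaller error -> higher confidence
    return name, confidence

# hypothetical library of pre-defined luminescence spectra (4 spectral bands)
library = {
    "mug": np.array([0.10, 0.90, 0.30, 0.00]),
    "bottle": np.array([0.00, 0.20, 0.80, 0.40]),
}
measured = np.array([0.08, 0.85, 0.35, 0.02])
print(best_match(measured, library))             # -> ('mug', ...)
print(best_match(measured, library, metric=mae))
```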
  • the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by retrieving the object(s) associated with the best matching reflectance and/or luminescence properties from the provided digital representations of the pre-defined objects. This may be preferred if the digital representations of pre-defined objects contain reflectance and/or luminescence properties interrelated with the respectively assigned object.
  • the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by searching a database for said object(s) based on the determined best matching reflectance and/or luminescence properties. This may be preferred if the digital representation of pre-defined objects only contains reflectance and/or luminescence properties but no further information on the object assigned to these properties.
  • the further database may be connected to the computer processor via a communication interface.
• Refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene may include determining - based on the digital representation of the scene - confidence score(s) for the determined set of object identification hypotheses and using the determined confidence scores to refine the confidence scores associated with the determined set of object identification hypotheses to identify the at least one object.
  • Determining the confidence score(s) for the set of determined object identification hypotheses based on the provided digital representation of the scene may include determining the likelihood of the presence of objects associated with the determined object identification hypotheses in the scene based on the data contained in the provided digital representation of the scene and associating higher confidence score(s) to objects having a higher probability to be present in the scene based on the data contained in the digital representation of the scene. For example, geographic regions may only have distribution of one item per unique luminescent color and items identified in that region are assumed to be from the set of items distributed in that area and thus associated with the highest confidence score. Alternatively, a search or purchase history may be used to associate previously searched for or purchased item with the highest confidence score.
  • the object can be determined from the two determined confidence scores by a number of different algorithms.
  • the object(s) present in the scene may be determined by adding the two confidence scores together, and selecting the object associated with the highest sum value.
  • the object(s) present in the scene may be determined by multiplying the two confidence scores and selecting the object associated with the highest product value.
  • the object(s) present in the scene may be determined by raising one confidence score to an integer power, multiplying the obtained value by the other confidence score, and selecting the object associated with the highest product value.
  • the suitable algorithm may be determined empirically, using historical data to decide the proper weighting of the two determined confidence scores. For example, it may be beneficial to use a certain algorithm when both confidence scores are high, but use a different one when the confidence score associated with the object identification hypotheses is high and confidence score determined using the digital representation of the scene is low, and yet a different algorithm when the confidence score associated with the object identification hypothesis is low and confidence score determined using the digital representation of the scene is high.
• Various machine learning models may be used to determine the best weightings or account for different regimes of the two confidence scores using historical data. Refinement of the determined object identification hypotheses results in an increase in the accuracy of the object recognition in cases with ambiguous object identification based on the acquired object specific luminescence and/or reflectance properties. This allows to boost the object recognition accuracy of the inventive method to near 100%.
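• A minimal sketch of such confidence fusion, covering the sum, product and power-weighted variants described above; the scores, object names and the selection helper are purely illustrative:

```python
def fuse_confidences(spectral_score, scene_score, mode="product", power=2):
    """Combine the confidence score from luminescence/reflectance matching with
    the confidence score derived from the digital representation of the scene,
    using the sum, product or power-weighted variants described above."""
    if mode == "sum":
        return spectral_score + scene_score
    if mode == "product":
        return spectral_score * scene_score
    if mode == "weighted":
        return (spectral_score ** power) * scene_score  # emphasise the spectral score
    raise ValueError(f"unknown fusion mode: {mode}")

def select_object(spectral_hypotheses, scene_likelihoods, mode="product"):
    """spectral_hypotheses: {object name: confidence from spectral matching}
    scene_likelihoods: {object name: likelihood of presence in the scene}"""
    return max(
        spectral_hypotheses,
        key=lambda name: fuse_confidences(
            spectral_hypotheses[name], scene_likelihoods.get(name, 0.0), mode
        ),
    )

spectral = {"mug": 0.70, "bottle": 0.65}
scene = {"mug": 0.20, "bottle": 0.90}    # e.g. derived from a purchase history
print(select_object(spectral, scene))    # -> 'bottle'
```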
  • refining the set of determined object identification hypotheses about the object(s) to be recognized is performed by a computer processor being different from the computer processor determining the set of object identification hypotheses about the object(s) to be recognized.
  • the determined object identification hypotheses may be provided to the further computer processor via a communication interface.
  • the further computer processor may be present in a stationary processing unit or located in a cloud environment.
• determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
  • Determining the set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene may include determining the likelihood of the presence of object(s) in the scene based on the provided digital representation of the scene and generating a set of object identification hypotheses and associated confidence scores based on the determined likelihood.
• the set of object identification hypotheses may include a list of objects wherein each object present in the list is associated with a confidence score, i.e. the likelihood of the occurrence of the object in the respective scene based on the data contained in the digital representation of the scene.
  • the set of determined object identification hypotheses about the object(s) to be recognized is then refined by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representation of pre-defined objects.
  • This may include determining confidence scores associated with object(s) present in the scene by calculating the best matching reflectance and/or luminescence properties as previously described and using the determined confidence score(s) to refine the confidence scores associated with the determined object identification hypothesis to identify the at least one object.
• refinement of the confidence score obtained using the digital representation of the scene with the confidence score(s) obtained by calculating the best matching reflectance and/or luminescence properties may be performed as previously described.
  • the at least one identified object is provided via the communication interface and/or an action associated with the identified object(s) is triggered by the computer processor.
• this includes displaying at least part of the identified objects on the screen of a display device, optionally in combination with further data on the recognized object(s) and/or with at least one message.
• the at least one message on the screen of the display device is displayed by retrieving - with the computer processor - the at least one message from a data storage medium, in particular a database, based on the identified object(s) and optionally the provided digital representation of the scene.
  • Further data on recognized object(s) may include the provided data on object specific reflectance and/or luminescence properties, the determined luminescence spectral pattern and/or reflectance pattern, the calculated best matching reflectance and/or luminescence properties, the position of the object(s) in the scene or a combination thereof.
  • the action is at least one pre-defined action associated with the detected object(s) or associated with the detected object(s) and data contained in the provided digital representation of the scene.
  • the at least one pre-defined action may include ordering of a new product, storing data on recognized object(s) on at least one storage medium, removing recognized object(s) stored on at least one storage medium from said storage medium, updating stock keeping records, creating a list of recognized object(s), providing information on recognized object(s) or created list(s) via a communication interface to a further computer processor, or a combination thereof.
  • the action is triggered automatically, for example after detection of the respective object in the scene.
  • the triggering does therefore not require any user interaction.
  • the processor may provide a message to the user informing the user about the triggered actions and updating the user on the status of the action to be performed.
  • the action is triggered after user interaction, for example after the user has approved the determined pre-defined action.
  • Suitable sensor units include the ones previously described in relation to step (i) of the inventive method.
  • the system further comprises at least one light source configured to illuminate the scene upon acquisition of data of the scene with the at least one sensor of the sensor unit.
  • Suitable light sources are the ones previously described in connection with step (i) of the inventive method.
  • the light source may be synchronized with the sensor unit.
  • at least one processing unit of the inventive system is configured to determine the synchronization or to synchronize the light source and the sensor unit based on a synchronization which was determined using a further processing unit (i.e. a processing unit being present separate from the inventive system).
  • the synchronization of the light source and the sensor unit allows the inventive systems to operate under real world conditions using ambient lighting by subtracting data of the scene acquired under ambient light from data of the scene acquired under ambient light and illumination from the light source (i.e. by performing the previously described delta calculation).
  • Synchronization of the light source and the sensor unit mitigates the problems encountered with flickering of the light sources in the scene when the acquisition duration of each sensor is very short compared with the flicker period.
  • the ambient light contribution can vary by 100% depending on when in the flicker cycle the acquisition begins.
• small changes in the phase (i.e. the position of the acquisition duration within a flicker cycle) can therefore cause significant changes in the measured ambient light contribution.
• as the acquisition duration becomes shorter, the total number of flicker cycles recorded decreases while the difference in flicker cycle phase recorded remains the same, so the relative difference increases.
• the result of the delta-calculation is only accurate if the same ambient lighting contribution is present during the capture of the images which are to be subtracted, i.e. the accurate determination of the contribution of each illuminant to the measured luminescence and reflectance is highly dependent on the acquisition duration of each sensor as well as its timing with respect to the flicker cycle of the light sources being present in the scene.
• Using a highly defined synchronization allows the inventive systems to compensate for the changes occurring in the acquired images due to the ambient light changes, thus rendering object recognition possible under ambient lighting instead of requiring highly defined illumination conditions, such as dark rooms, unpleasant lighting conditions, such as IR lighting conditions, or lighting conditions with adverse health effects or that are detrimental to common items, such as significant levels of UV lighting.
• the synchronization of the at least one illuminant of the light source and the at least one sensor of the sensor unit may be determined according to the method described in unpublished patent application US 63/139,299. Briefly, this method includes the following steps: (a) providing a digital representation of the light source and the sensor unit via a communication interface to the computer processor, (b) determining - with a computer processor - the flicker cycle of all illuminants present in the scene or providing via a communication interface a digital representation of the flicker cycle to the computer processor, (c) determining - with the computer processor - the illumination durations for each illuminant of the light source based on the provided digital representations, (d) determining - with the computer processor - the acquisition durations for each sensor of the sensor unit based on the provided digital representations, the determined illumination durations and optionally the determined flicker cycle, (e) determining - with the computer processor - the illumination time points for each illuminant of the light source and the acquisition time points for each sensor of the sensor unit based on the data determined in step (d) and optionally in step (b), and (f) optionally providing the data determined in step (e) via a communication interface.
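• A minimal sketch of how such a synchronization schedule could be computed; pairing each illuminant with exactly one sensor, the two-flicker-cycle slot length and all identifiers are simplifying assumptions made only for illustration:

```python
from dataclasses import dataclass

@dataclass
class SyncEvent:
    device: str       # illuminant or sensor identifier
    start: float      # seconds from the start of the acquisition cycle
    duration: float   # seconds

def flicker_period(utility_frequency_hz):
    """Mains-driven illuminants typically flicker at twice the utility frequency."""
    return 1.0 / (2.0 * utility_frequency_hz)

def build_schedule(illuminants, sensors, utility_frequency_hz=60.0, cycles_per_slot=2):
    """Give each illuminant/sensor pair a time slot whose duration is an integer
    multiple of the flicker period, then append one slot with all illuminants
    off so the background (ambient-only) data can be acquired."""
    slot = cycles_per_slot * flicker_period(utility_frequency_hz)
    schedule, t = [], 0.0
    for illuminant, sensor in zip(illuminants, sensors):
        schedule.append(SyncEvent(illuminant, t, slot))   # illumination time point and duration
        schedule.append(SyncEvent(sensor, t, slot))       # acquisition time point and duration
        t += slot
    schedule.append(SyncEvent("background:" + sensors[0], t, slot))
    return schedule

for event in build_schedule(["blue_led", "green_led"], ["sensor_1", "sensor_2"]):
    print(event)
```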
  • the synchronization is determined by determining the illumination duration of each illuminant or each solid state lighting system of each illuminant required to obtain sufficient sensor exposure as previously described, adapting the acquisition duration to the illumination duration and defining the switching order of the illuminants. This is preferred if separation of luminesced and reflected light used for object detection is performed computationally.
  • the determined synchronization can be provided to the control unit described later on for controlling the light source and sensors according to the determined synchronization.
  • the processing unit may be configured to adjust the determined synchronization based on the acquired data as described below, for example by determining the flicker cycle and/or sensitivity of each sensor during regular intervals and adjusting the durations and/or start points if needed.
  • the system further comprises a control unit configured to control the light source and/or the sensor unit.
• Suitable control units include Digilent Digital Discovery controllers providing ~1 microsecond level control, or microcontrollers, such as PJRC Teensy® USB Development Boards.
  • Microcontrollers or microprocessors refer to semiconductor chips that contain a processor as well as peripheral functions. In many cases, the working and program memory is also located partially or completely on the same chip.
  • the control unit may either be present within the processing unit, i.e. it is part of the processing unit, or it may be present as a separate unit, i.e. it is not part of the processing unit.
  • the control unit is preferably configured to control the light source by switching on and off the at least one illuminant and/or at least one solid lighting system of the at least one illuminant at at least one defined illumination time point for a defined illumination duration.
  • the time points for switching on and off each illuminant and/or each solid state lighting system of each illuminant and each sensor is received by the control unit from the processing unit.
  • the control unit is therefore preferably connected via a communication interface with the processing unit.
  • the determined synchronization is provided to the control unit and is not adjusted after providing the synchronization to the control unit. In this case, a fixed synchronization is used during object recognition.
  • the determined synchronization can be dynamically adjusted based on real time evaluation of the sensor readings to ensure that different levels of ambient lighting or different distances from the system to the object are considered, thus increasing the accuracy of object recognition. This may be performed, for example, by determining the flicker cycle and/or the sufficient exposure of each sensor and adjusting the acquisition duration and/or the illumination duration and/or the defined time points for each sensor and/or each illuminant accordingly.
• the flicker cycle and the adjustments may be determined by the processing unit and the determined adjustments are provided via a communication interface to the control unit.
  • control unit is configured to switch on the illuminant(s) or the solid lighting system(s) of the illuminant(s) according to their respective wavelength (i.e. from the shortest to the longest or vice versa) and to switch on each sensor of the sensor device sequentially.
  • control unit is configured to switch on the illuminant(s) or the solid lighting system(s) of the illuminant(s) in an arbitrary order, i.e. not sorted according to their wavelength, and to switch on the corresponding sensor associated with the respective illuminant or the respective solid lighting system of the respective illuminant.
• control unit may be configured to cycle through each color twice, i.e. by switching on blue1, green1, red1, blue2, green2, red2, to achieve a more uniform white balance over time.
  • control unit is configured to switch on each sensor without switching on any illuminant and/or any solid lighting system of any illuminant after each illuminant and/or each solid lighting system of each illuminant has been switched on (i.e. after one cycle is complete) to acquire the background data (i.e. data without the light source of the inventive system being switched on) required for delta-calculation.
• Measurement of the background data is performed using the same defined time points and defined duration(s) for each sensor as used during the cycling through the illuminants/solid lighting systems of the illuminants (i.e. the same durations are used for acquisition of the background data).
  • the background measurements are made at different intervals, such as for every sensor capture or between multiple cycles, depending on the dynamism of the scene, desired level of accuracy, and desired acquisition time per cycle.
  • the acquired background data is subtracted from the illuminator/solid lighting system “on” acquired data using the corresponding acquisition duration to yield the differential image as previously described. This allows to account for common sources of indoor lighting flicker and thus allows to use the inventive systems under real-life conditions with a high accuracy of object recognition.
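• A minimal sketch of this delta-calculation (background subtraction); the clipping of negative noise values and the dummy frames are illustrative assumptions:

```python
import numpy as np

def differential_image(lit_frame, background_frame):
    """Subtract the ambient-only (illuminant off) frame from the frame captured
    with the system's illuminant on, so that only light contributed by the
    system's own light source remains; negative noise values are clipped."""
    delta = lit_frame.astype(np.float32) - background_frame.astype(np.float32)
    return np.clip(delta, 0.0, None)

# hypothetical 8-bit frames captured with identical acquisition durations
lit = np.array([[120, 130], [140, 150]], dtype=np.uint8)
ambient = np.array([[100, 100], [100, 100]], dtype=np.uint8)
print(differential_image(lit, ambient))
```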
  • the control unit may be configured to add extra illumination to the scene by switching on an illuminant/solid lighting system of an illuminant at a time when all sensors of the sensor unit are switched off to achieve better color balance between the illuminants and/or the solid lighting systems of the illuminant and to improve the white balance of the overall illumination.
• the system comprises a scene mapping tool configured to map the scene to obtain an at least partial 3D map of the scene as previously described in relation to the inventive method.
• the processing unit comprises a first processing unit in communication with the sensor unit, the communication interface and a second processing unit, the first processing unit programmed to o determine a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and the provided digital representations of pre-defined objects or based on the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score, and o provide the determined set of object identification hypotheses to the second processing unit.
• the second processing unit in communication with the communication interface and the first processing unit, the second processing unit programmed to o refine the received set of object identification hypotheses to identify at least one object by revising at least certain of said associated confidence scores based on the provided digital representation of the scene or based on the provided data of the scene and the provided digital representations of pre-defined objects, and o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
  • the system further comprises a display unit configured to display the determined object(s) and optionally further data.
  • the display unit may be a display device having a screen on which the determined objects and optionally further data may be displayed to the user.
  • Suitable display units include stationary display devices (e.g. personal computers, television screen, screens of smart home systems being installed within a wall/on a wall) or mobile display devices (e.g. smartphones, tablets, laptops).
  • the display device can be connected with the processing unit via a communication interface which may be wired or wireless.
  • the further data may include data acquired on the object specific reflectance and/or luminescence properties, determined further object specific reflectance and/or luminescence properties, data from the control unit, such as switching cycles of illuminant(s) and sensor(s), used matching algorithms, results obtained from the matching process and any combination thereof.
• In step a) of the inventive training method, data of a scene is provided.
  • Said data includes image(s) of the scene and data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene.
  • data may be obtained using a luminescence object recognition system, for example a luminescence object recognition system described previously, which comprises at least one light source and at least one sensor unit configured to acquire object specific luminescence and optionally reflectance properties upon illuminating the scene with the light source.
  • the data on the scene is provided via a communication interface to a computer processor.
  • the computer processor may be located within the luminescence object recognition system or may be located in a further processing unit which is present separate from the luminescence object recognition system.
  • steps b1) and optionally b2) are performed with the same processor. In another example, steps b1) and optionally b2) are performed with separate processors. For example, step b1) is performed with the processor of the luminescence object recognition system while optional step b2) is performed with a further processor being located separate from the luminescence object recognition system.
  • a labelled image is calculated with the computer processor. This includes annotating each classified pixel of each image with an object specific label based on the provided data on object specific luminescence and optionally reflectance properties in a first step b1).
  • the object specific label indicates the name of the object which was identified by the computer processor based on the provided object specific luminescence and optionally reflectance properties as described in the following.
  • step b1) includes providing via a communication interface to the computer processor digital representations of pre-defined objects and optionally a digital representation of the scene, detecting, using the computer processor, for each image of the scene regions of luminescence by classifying the pixels associated with the detected regions of luminescence, determining the object(s) associated with the detected regions of luminescence and being present in each image by determining the best matching reflectance and/or luminescence properties for each detected region of luminescence, obtaining the object(s) assigned to the best matching reflectance and/or luminescence properties and optionally refining the obtained object(s) based on the provided digital representation of the scene, and annotating each classified pixel of each image with an object specific label based on the determined object(s) and the associated detected regions of luminescence.
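• A minimal sketch of this per-pixel classification and annotation; the luminescence threshold, the nearest-spectrum rule and the data layout are illustrative assumptions rather than the prescribed procedure:

```python
import numpy as np

def label_luminescent_pixels(luminescence_stack, library, threshold=0.1):
    """luminescence_stack: H x W x B array of per-pixel luminescence readings
    (e.g. the differential data for B spectral bands).
    library: {object name: reference spectrum of length B}.
    Returns an H x W array of object labels ('' for unclassified pixels)."""
    height, width, _ = luminescence_stack.shape
    labels = np.full((height, width), "", dtype=object)
    for y in range(height):
        for x in range(width):
            spectrum = luminescence_stack[y, x]
            if spectrum.max() < threshold:
                continue  # no luminescence detected at this pixel
            # nearest reference spectrum (Euclidean distance) names the object
            labels[y, x] = min(library, key=lambda name: float(np.linalg.norm(spectrum - library[name])))
    return labels

# hypothetical 2 x 2 image with 3 spectral bands and a single luminescent pixel
stack = np.zeros((2, 2, 3))
stack[0, 0] = [0.0, 0.9, 0.2]
library = {"mug": np.array([0.0, 1.0, 0.2]), "bottle": np.array([0.8, 0.1, 0.0])}
print(label_luminescent_pixels(stack, library))
```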
  • the digital representations of pre-defined objects contain data on object specific luminescence and optionally reflectance properties which may be interrelated with the respective object name.
  • the digital representation of the scene contains the data previously described in relation to the inventive object recognition method.
  • the digital representations can be provided to the computer processor by retrieving them from a data storage medium, such as a database as previously described, for example by using a scene identifier to retrieve the appropriate digital representation of the scene.
• Detection of regions of luminescence, in particular regions having similar luminescence (to ensure that overlapping objects with different luminescence are identified as different objects), can be accomplished as previously described in relation to the determination of further object specific luminescence and/or reflectance properties.
  • differential data is generated prior to detecting regions of luminescence as previously described in relation to the determination of further object specific luminescence and/or reflectance properties.
  • luminescence spectral patterns and/or reflective spectral patterns are determined for the determined regions of luminescence prior to determining the best matching luminescence and optionally reflectance properties.
  • the luminescence spectral patterns and/or reflective spectral patterns can be determined as previously described in relation to the inventive method for object recognition.
  • the object(s) associated with the detected regions of luminescence are determined with the computer processor by determining the best matching luminescence and optionally reflectance properties for each detected region of luminescence as previously described in relation to the inventive method for object recognition.
  • object(s) assigned to these best matching properties are retrieved from the provided digital representations of pre-defined objects or from a database as previously described.
  • the list of obtained object(s) may be refined as previously described based on the provided digital representation of the scene, for example if the list of obtained object(s) contains more than one object. This ensures a higher recognition accuracy in cases with ambiguous identification based on the provided luminescence and optionally reflectance properties.
  • each classified pixel in each image is annotated with an object specific label based on the determined object(s) and the associated detected regions of luminescence.
• In step b2), bounding boxes are created around the objects determined in step b1) based on the annotated pixels, or the images obtained in step b1) are segmented based on the annotated pixels.
  • Bounding boxes and image segmentation based on the annotated pixels can be done using methods commonly known in the art for image segmentation and bounding box creation.
  • Bounding boxes describe the spatial location of the objects determined in step b1).
  • the bounding box is rectangular, and is defined by the x and y coordinates of the upper-left corner and the x and y coordinates of the lower-right corner of the rectangle.
• a bounding box representation using the x- and y-axis coordinates of the bounding box center and the width and height of the box is used.
  • Bounding boxes can, for example, be created based on coordinate information of the annotated pixels.
  • Image segmentation is a commonly used technique in digital image processing and analysis to partition an image into multiple parts or regions based on the characteristics of the pixels in the image. In this case, image segmentation may involve clustering regions of pixels according to contiguous matching chromaticities.
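• A minimal sketch of deriving bounding boxes from the annotated pixels and converting them between the corner and center representations described above; the label array and object names are hypothetical:

```python
import numpy as np

def bounding_boxes(labels):
    """labels: H x W array of object labels ('' for background), e.g. as produced
    in step b1). Returns {object name: (x_min, y_min, x_max, y_max)}."""
    boxes = {}
    for name in np.unique(labels):
        if name == "":
            continue
        ys, xs = np.nonzero(labels == name)
        boxes[name] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes

def to_center_format(box):
    """Convert corner coordinates to (x_center, y_center, width, height)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0,
            x_max - x_min + 1, y_max - y_min + 1)

labels = np.array([["", "mug", "mug"],
                   ["", "mug", ""],
                   ["", "", ""]], dtype=object)
print(bounding_boxes(labels))                          # {'mug': (1, 0, 2, 1)}
print(to_center_format(bounding_boxes(labels)["mug"]))
```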
  • the labelled images calculated in step b) are provided - optionally in combination with the digital representation of the scene - to an object recognition neural network.
• Suitable object recognition neural networks are convolutional neural networks (CNNs) known in the state of the art, such as Inception/GoogLeNet, ResNet-50, ResNet-34, MobileNet v2, VGG-16, MobileNet v2-SSD and YOLOv3, v4 or v5.
  • Each layer of the CNN is known as a feature map.
  • the feature map of the input layer is a 3D matrix of pixel intensities for different color channels (e.g. RGB).
  • the feature map of any internal layer is an induced multi-channel image, whose ‘pixel’ can be viewed as a specific feature.
  • Every neuron is connected with a small portion of adjacent neurons from the previous layer (receptive field).
  • Different types of transformations can be conducted on feature maps, such as filtering and pooling.
• A filtering (convolution) operation convolves a filter matrix (learned weights) with the values of a receptive field of neurons and applies a non-linear function (such as sigmoid or ReLU) to obtain the final responses.
• A pooling operation, such as max pooling, average pooling, L2-pooling or local contrast normalization, summarizes the responses of a receptive field into one value to produce more robust feature descriptions.
• an initial feature hierarchy is constructed, which can be fine-tuned in a supervised manner by adding several fully connected (FC) layers to adapt to different visual tasks. Depending on the task involved, a final layer with a suitable activation function is added to obtain a specific conditional probability for each output neuron. The whole network can then be optimized on an objective function (e.g. mean squared error or cross-entropy loss) via the stochastic gradient descent (SGD) method.
• a typical CNN performing object recognition has a total of 13 convolutional (conv) layers, 3 fully connected layers, 3 max-pooling layers and a softmax classification layer.
• the conv feature maps are produced by convolving 3×3 filter windows, and feature map resolutions are reduced with stride-2 max-pooling layers.
  • the object recognition neural network is a deep convolutional neural network comprising a plurality of convolutional neural network layers followed by one or more fully connected neural network layers.
  • the deep convolutional neural network may comprise a pooling layer after each convolutional layer or after a plurality of convolutional layers and/or may comprise a non-linear layer after each convolutional layer, in particular between the convolutional layer and the pooling layer.
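• A minimal sketch of such a deep convolutional neural network (3×3 convolutions, a non-linear layer after each convolution, pooling after pairs of convolutions, followed by fully connected layers); PyTorch, the layer sizes and the 64×64 input resolution are assumptions made only for illustration, not a prescribed architecture:

```python
import torch
import torch.nn as nn

class ObjectRecognitionCNN(nn.Module):
    """Illustrative deep CNN: 3x3 convolutions, a ReLU non-linearity after each
    convolution, max pooling after every pair of convolutions, followed by
    fully connected layers producing one score per object class."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),
            nn.Linear(128, num_classes),   # softmax is applied inside the loss function
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = ObjectRecognitionCNN(num_classes=5)
dummy_batch = torch.randn(1, 3, 64, 64)     # one 64 x 64 RGB image
print(model(dummy_batch).shape)             # torch.Size([1, 5])
```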
• In step d) of the inventive training method, the provided object recognition neural network is trained using the provided calculated labelled images of the scene and optionally the provided digital representation of the scene as input to recognize each labelled object in the inputted calculated labelled images.
  • training the object recognition neural network includes verifying the accuracy of the object recognition neural network by providing images of a scene comprising known objects, comparing the produced output values with expected output values, and modifying the object recognition neural network using a back-propagation algorithm in case the received output values do not correspond to the known objects.
• Modifying the object recognition neural network using a back-propagation algorithm can be performed, for example, as described in Ian Goodfellow et al., “Deep Learning”, Chapter 6.5 “Back-Propagation and Other Differentiation Algorithms”, MIT Press, 2016, pages 200 to 220 (ISBN: 9780262035613).
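• A minimal sketch of such a training and verification loop (cross-entropy loss, back-propagation of the error, a parameter update, and an accuracy check on a scene with known objects); it assumes the PyTorch model sketched above and a DataLoader yielding the automatically labelled images, both of which are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the automatically labelled images: cross-entropy loss,
    back-propagation of the error and a parameter update (e.g. SGD)."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:       # labels stem from the luminescence-based annotation
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                 # back-propagation
        optimizer.step()

def accuracy(model, loader, device="cpu"):
    """Verify the network on images of a scene comprising known objects."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images.to(device)).argmax(dim=1).cpu()
            correct += int((predictions == labels).sum())
            total += int(labels.numel())
    return correct / max(total, 1)

# hypothetical usage with the model sketched above and DataLoaders of labelled images:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# train_one_epoch(model, train_loader, optimizer)
# print(accuracy(model, validation_loader))
```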
  • a trained object recognition neural network is provided via a communication interface to the computer processor.
  • the provided object recognition neural network has been trained according to the training method described previously, i.e. by automatic labelling of data generated by a luminescence object recognition system.
  • the computer processor may be located on a remote computing device or may be located in a cloud environment.
  • the trained object recognition neural network may be stored on a data storage medium, such as an internal medium of the remote computing device or in a cloud environment and may be accessed by the computer processor via a communication interface. Storage of the trained object recognition neural network in a cloud environment allows to access the latest version of the trained object recognition neural network without performing any firmware updates.
  • data of the scene is provided via a communication interface to the computer processor.
  • data of the scene includes image(s) of the scene. Images of the scene may be acquired, for example, by use of a camera, such as a commercially available video camera.
  • data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object having these properties and being present in the scene. Such data can be acquired, for example, by use of a luminescent object recognition system comprising a light source and a sensor unit as described previously.
  • data of the scene includes image(s) of the scene as well as data on object specific reflectance and/or luminescence properties of at least one object being present in the scene.
• In step (C), digital representations of pre-defined objects and/or a digital representation of the scene are provided to the computer processor via a communication interface.
  • the digital representation of the scene contains the previously described data and may improve the recognition accuracy of the object recognition because its use allows to identify an object in case of ambiguous object identification based on the provided data of the scene.
• In step (D), at least one object being present in the scene is determined with the computer processor based on the provided trained object recognition neural network, the provided data of the scene and optionally the provided digital representation of the scene.
• step (D) includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene, the provided digital representations of pre-defined objects and optionally the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores using the provided object recognition neural network and optionally the provided digital representation of the scene.
  • the object identification hypothesis about the object(s) to be recognized in the scene may include best matching luminescence and/or reflectance properties. This best matching luminescence and/or reflectance properties can be determined as previously described from the provided data on object specific luminescence and/or reflectance properties or from further object specific luminescence properties determined from said data as described previously.
• the object identification hypothesis about the object(s) to be recognized in the scene may include best matching objects obtained using the best matching luminescence and/or reflectance properties. This aspect is preferred if the trained object recognition neural network is used in combination with a luminescence object recognition system and allows to perform object recognition using information devoid of shape (i.e. the data on object specific luminescence and/or reflectance properties) in combination with information on the shape (i.e. images of the scene) to improve the recognition accuracy in case of ambiguous object recognition when only using information devoid of shape or information on the shape.
  • the recognition accuracy can be further boosted by using information on the scene to remove any remaining object recognition ambiguity.
• step (D) includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene, the provided object recognition neural network and optionally the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores using the provided data of the scene, the provided digital representations of pre-defined objects and optionally the provided digital representation of the scene.
  • the object identification hypothesis about the object(s) to be recognized in the scene may be the object(s) identified using the trained object recognition neural network. Refining the set of object identification hypotheses about the object(s) to be recognized includes determining best matching luminescence and/or reflectance properties as previously described, obtaining the objects associated with the best matching luminescence and/or reflectance properties and comparing the obtained objects to the determined object identification hypothesis as previously described to identify the object(s) present in the scene.
  • step (D) includes inputting the provided data of the scene, in particular the provided image(s) of the scene, and optionally the provided digital representation of the scene into the provided object recognition neural network.
  • This aspect may be preferred if the trained object recognition neural network is used without the luminescence object recognition system, for example if the use of such a luminescence object recognition system is not preferred or not possible.
  • the at least one identified object is provided via a communication interface and/or at least one action associated with the identified object(s) is triggered.
  • the identified object may be provided by displaying the identified object(s) on the screen of a display device as previously described.
• the at least one action which may be triggered may be an action as previously described.
  • Suitable sensor units include commercially available video cameras as well as cameras described in relation to the inventive system for recognizing at least one object having object specific luminescence properties in a scene.
  • the system may further comprise at least one light source, for example at least one light source previously described.
  • the system may further comprise at least one control unit which is configured to synchronize the light source and the sensor unit as previously described.
  • Suitable data storage media and processing units include the ones previously described. Determination of the at least one object in the scene with the processing unit is performed as described previously in combination with the inventive method for recognizing at least one object in the scene.
  • the system comprising the neural network trained according to the inventive training method is used in a different scene than the scene used to generate the training data set with the proviso that the same objects to be detected are expected to occur in the different scene than in the scene used to generate the training data set.
  • the training data set is preferably generated as described in relation to the inventive training method.
  • Fig. 1 is a block diagram of a computer-implemented method for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
  • Fig. 2a is a first example of a system for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
  • Fig. 2b is a second example of a system for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
  • Fig. 3 is an example of a method for training an object recognition neural network using data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene according to the invention described herein;
  • Fig. 4 is an example of a computer-implemented method for recognizing at least one object in a scene using a trained object recognition neural network according to the invention described herein;
  • Fig. 5 is an example of a system for training an object recognition neural network according to the method of FIG. 3 or for recognizing at least one object in a scene using a trained object recognition neural network in combination with a luminescent object recognition system according to the invention described herein.
  • Fig. 6 is an example of a system for recognizing at least one object in a scene using a trained object recognition neural network according to the invention described herein.
  • FIG. 1 depicts a non-limiting embodiment of a method 100 for recognizing at least one object having object specific luminescence and/or reflectance properties in a scene.
  • the object to be recognized is imparted with luminescence by use of a fluorescent coating on the surface of the object and the scene is located indoors.
  • the scene may be located outdoors.
  • a display device is used to display the determined objects on the screen, in particular via a GUI.
• Suitable luminescence object recognition systems which can be used to perform method 100 are described, for example, in relation to Figures 2a and 2b, in unpublished patent application US 63/139,299 and in published patent applications US 2020/279383 A1, WO 2020/245443 A2, WO 2020/245442 A1, WO 2020/245441 A1, WO 2020/245439 A1 and WO 2020/245444 A1.
• routine 101 determines whether ambient light compensation (ALC) is to be performed, i.e. whether the amount of ambient light in the scene is above a defined threshold. This will normally be the case if method 100 is performed in a scene lit by ambient light sources, such as sunlight, other natural lighting or artificial ambient lighting. In contrast, no ALC will be required if the object recognition is performed in the absence of ambient light, for example in a dark environment, or if detection of fluorescence is performed in filtered regions, such as described in published patent applications WO 2020/245443 A2 and WO 2020/245442 A1. In case routine 101 determines that ambient light compensation (ALC) is to be performed, routine 101 proceeds to block 104, otherwise routine 101 proceeds to block 116.
  • routine 101 determines whether flicker of the ambient light requires the flickering to be compensated or not. Flicker compensation is normally necessary if object recognition is performed indoors and separation of luminesced and reflected light upon illumination of the scene with the light source is achieved physically, for example by use of a filter before each sensor of the sensor unit, such as described in US application number 63/139,299.
  • routine 101 proceeds to block 106, otherwise routine 101 proceeds to block 116 described later on, for example if separation of luminesced and reflected light is achieved computationally, such as described in published patent applications US 2020/279383 A1 and WO 2020/245444 A1 , or if the ambient light present in the scene is exclusively resulting from sunlight or other natural light sources.
• routine 101 determines whether the flicker compensation is to be performed using phase-locking (i.e. setting the switch-on of each sensor to a pre-defined time point) or is to be performed using a multiple of the flicker cycle. This determination may be made according to the programming of the processor implementing routine 101. In one example, a pre-defined programming is used, for example if the illumination setup of the scene is known prior to installation of the luminescence object recognition system. In another example, the processor determines the configuration and type of illuminants present in the scene, for example by connecting the illuminants via Bluetooth to the processor such that the processor is able to retrieve their configuration and type. In case routine 101 determines in block 106 that phase-locking is to be performed, it proceeds to block 108, otherwise it proceeds to block 112.
  • routine 101 determines and sets the phase-lock for each sensor of the sensor unit. This may be accomplished by determining the light variation or the line voltage fluctuation present in the scene using commonly known methods. Normally, the flicker cycle of commonly used illuminants depends on the utility frequency present at the scene. If a 60 Hz utility frequency is used, the frequency of the flicker cycle will be 120 Hz. If a 50 Hz utility frequency is used, the flicker cycle will be 100 Hz. In one example, phase lock is performed relative to the light variation or relative to the line voltage fluctuation.
  • routine 101 proceeds to block 110.
  • routine 101 determines and sets the acquisition duration for each sensor and the illumination duration for each illuminant.
  • the acquisition and illumination durations may be determined as previously described, for example by using the method described in unpublished US application 63/139,299.
• the setting may be performed according to pre-defined values which may be provided to routine 101 from an internal storage or a database. In case the method is repeated, the determination may be made based on previously acquired sensor data and object recognition accuracy. In case two sensors are used, each illuminant may be switched on when each sensor is switched on. If each sensor is switched on sequentially, then each illuminant may be switched on twice during each lighting cycle.
• the illumination duration is set to achieve a reasonable measurement within the range of the respective sensor, while leaving room for the effect of the additional ambient lighting.
  • a shorter illumination duration for the sensor measuring reflectance + luminescence is needed as compared to the sensor measuring luminescence only, as the measurement for the reflectance + luminescence contains the reflected light from the illuminator(s), and reflection is typically much stronger than luminescence.
  • the illumination duration for each switch-on may vary.
  • routine 101 determines and sets fixed acquisition durations for each sensor.
  • the acquisition durations may be determined as previously described, for example by using the method described in unpublished US application 63/139,299.
  • the fixed acquisition durations may be adapted to the flicker cycle present in the scene. For a 60 Hz utility frequency having a flicker of 120 Hz, acquisition durations of 1/60, 2/60, 3/60 and 4/60 of a second may be used. For a 50 Hz utility frequency having a flicker of 100 Hz, acquisition durations of 1/50, 2/50, 3/50 and 4/50 of a second may be used.
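• A minimal sketch of how these candidate acquisition durations could be derived from the utility frequency so that each duration spans a whole number of flicker cycles; the helper names and the maximum multiple are illustrative assumptions:

```python
def flicker_frequency(utility_frequency_hz):
    """Mains-driven illuminants typically flicker at twice the utility frequency
    (60 Hz mains -> 120 Hz flicker, 50 Hz mains -> 100 Hz flicker)."""
    return 2.0 * utility_frequency_hz

def candidate_acquisition_durations(utility_frequency_hz, max_multiple=4):
    """Durations expressed as multiples of 1/utility_frequency; each multiple
    therefore spans a whole number (two per multiple) of flicker cycles."""
    base = 1.0 / utility_frequency_hz
    return [multiple * base for multiple in range(1, max_multiple + 1)]

print(flicker_frequency(60.0))                  # 120.0 Hz
print(candidate_acquisition_durations(60.0))    # [1/60, 2/60, 3/60, 4/60] seconds
print(candidate_acquisition_durations(50.0))    # [1/50, 2/50, 3/50, 4/50] seconds
```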
  • the defined acquisition durations may either be preprogrammed or may be retrieved by routine 101.
  • Retrieving the defined acquisition durations may include determining the utility frequency used in the scene, the type of sensors present in the sensor unit and the type of illuminants of the light source and retrieving the defined acquisition durations associated with the determined utility frequency, the determined type of sensors and the determined type of illuminants from a storage medium, such as the internal storage or a database.
  • routine 101 determines and sets the defined acquisition time points to switch on each sensor and the illumination duration for each illuminant. This determination may be made as previously described in relation to block 110.
• routine 101 determines and sets the sequence of each illuminant and each sensor (i.e. in which order each illuminant and each sensor is switched on and off). Routine 101 may determine the sequence based on pre-defined criteria, such as a specific order based on the wavelength of the illuminants, or it may arbitrarily select the order. Based on the order of the illuminants, routine 101 may either determine the order of each sensor or may use a pre-defined order, for example sequential order of the sensors.
• routine 101 instructs the light source to illuminate the scene with the illuminants and to acquire data on object specific luminescence and/or reflectance properties according to the settings determined in blocks 108, 110 and 116 or 112, 114, 116 or in block 116 (in case ALC is not required). In one example, this is performed by providing the settings determined in blocks 108, 110 and 116 or 112, 114, 116 or in block 116 to a control unit connected with the sensor unit and the light source. In another example, the light source and the sensor unit are controlled directly using the processor implementing routine 101.
  • the acquired data may be stored on an internal memory of the sensor unit or may be stored in a database which is connected to the sensor unit via a communication interface.
  • routine 101 determines whether further processing of the acquired data, for example delta calculation and/or identification of luminescence regions and/or conversion of the data resulting from the delta calculation and/or determination of luminescence/reflectance patterns is to be performed. If this is the case, routine 101 proceeds to block 122, otherwise routine 101 proceeds to block 126 described later on. The determination may be made based on the programming and may depend, for example, on the data contained in the digital representations of pre-defined objects used to determine the object(s) present in the scene and/or on the conditions present upon acquisition of data of the scene (i.e. if ALC is required or not) and/or on the specific hardware configuration of the luminescence object recognition system (i.e. if separation of luminesced and reflected light upon illumination of the scene with the light source is achieved computationally or physically).
  • routine 101 determines whether the further processing is to be performed remotely, i.e. with a further processing device being present separately from the processor implementing routine 101. This may be preferred if the further processing requires a large computing power. If routine 101 determines in block 122 that the further processing is to be done remotely, it proceeds to block 140 described later on, otherwise it proceeds to block 124.
  • routine 101 determines further luminescence and/or reflectance properties as previously described by determining differential data (i.e. performing the delta-calculation previously described) and optionally converting the differential data and/or by identifying luminescence regions in the acquired or differential data and/or by determining the luminescence and/or reflectance spectral patterns from the acquired or differential data.
  • the processed data may be stored on a data storage medium, such as the internal storage or a database prior to performing the blocks described in the following.
  • routine 101 determines whether to perform a flicker analysis or flicker measurement. If this is the case, routine 101 proceeds to block 150, otherwise it proceeds to block 128.
  • routine 101 retrieves digital representations of pre-defined objects and the digital representation of the scene from at least one data storage medium, such as a database.
  • the digital representation of pre-defined objects and the digital representations of the scene are stored on different data storage media.
  • the digital representation of pre-defined objects and the digital representations of the scene are stored on the same data storage medium.
  • the data storage medium/media is/are connected to the processor implementing routine 101 via a communication interface.
  • the digital representation of the scene may be retrieved by routine 101 using a scene identifier which is indicative of the scene for which data is acquired in block 118.
  • the scene identifier may be provided to routine 101 upon installation of the luminescence object recognition system and is preferably a unique identifier of the scene.
  • the scene identifier can be a name, a number or a combination thereof.
• routine 101 determines a set of object identification hypotheses based on either (option A) the data acquired in block 118 or the further object specific luminescence and/or reflectance properties determined in block 124 or 142 and the digital representations of pre-defined objects retrieved in block 128, or (option B) the digital representation of the scene retrieved in block 128.
  • Routine 101 may choose the respective option according to its programming.
• the object identification hypotheses correspond to the best matching luminescence and/or reflectance properties and the associated confidence score corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties.
• the object identification hypotheses correspond to the objects interrelated with the best matching luminescence and/or reflectance properties and the associated confidence score corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties.
  • the best matching reflectance and/or luminescence properties may be calculated by applying any number of matching algorithms as previously described or by using a data driven model of light spectral distribution and intensity on the at least one object to be recognized to calculate the object specific reflectance and/or luminescence properties and applying any number of matching algorithms as previously described.
  • the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by retrieving the object(s) associated with the best matching reflectance and/or luminescence properties from the provided digital representations of the pre-defined objects. This may be preferred if the digital representations of pre-defined objects contain reflectance and/or luminescence properties interrelated with the respectively assigned object.
  • the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by searching a database for said object(s) based on the determined best matching reflectance and/or luminescence properties.
  • the further database may be connected to the computer processor via a communication interface.
• the object identification hypotheses and the associated confidence scores correspond to the likelihood of the presence of the object(s) in the scene.
  • the object identification hypotheses may be determined as previously described by determining the likelihood of the presence of object(s) in the scene based on the provided digital representation of the scene.
• After routine 101 has determined a set of object identification hypotheses in block 130, routine 101 proceeds to block 132 and refines the set of object identification hypotheses determined in block 130. Depending on the option used in block 130 to determine the set of object identification hypotheses, routine 101 is programmed to perform the refinement based on: the digital representation of the scene retrieved in block 128 in case the set of object identification hypotheses has been determined according to option A previously described, or the data acquired in block 118 or the further object specific luminescence and/or reflectance properties determined in block 124/142 and the digital representations of pre-defined objects retrieved in block 128 in case the set of object identification hypotheses has been determined according to option B previously described.
  • Refining the set of object identification hypothesis determined in block 130 for option A may include determining - based on the digital representation of the scene - confidence score(s) for the determined set of object identification hypothesis and using the determined confidence scores to refine the confidence scores associated with the set of object identification hypotheses determined in block 130 (option A) to identify the at least one object.
  • Determining the confidence score(s) for the set of determined object identification hypotheses based on the provided digital representation of the scene may include determining the likelihood of the presence of objects associated with the determined object identification hypotheses in the scene based on the data contained in the provided digital representation of the scene and associating higher confidence score(s) to objects having a higher probability to be present in the scene based on the data contained in the digital representation of the scene.
• the object can be determined from the two determined confidence scores by a number of different algorithms as previously described.
  • Refining the set of object identification hypothesis determined in block 130 for option B may include revising at least certain of the confidence scores associated with the object identification hypotheses determined in block 130 based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the retrieved digital representation of pre-defined objects to identify the at least one object. This may include determining the confidence scores associated with object(s) present in the scene by calculating the best matching reflectance and/or luminescence properties as previously described and using the determined confidence score(s) to refine the confidence score(s) associated with the object identification hypothesis determined in block 130 to identify the at least one object as previously described.
• Refinement of the confidence score obtained using the digital representation of the scene with the confidence score(s) obtained by calculating the best matching reflectance and/or luminescence properties may be performed as previously described.
  • the refinement step allows to increase the object recognition accuracy in case of ambiguous object identification based on the object specific luminescence and/or reflectance properties, thus boosting the overall object recognition accuracy of the inventive method.
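• The preceding items describe refining the confidence scores obtained from the luminescence and/or reflectance matching with confidence scores derived from the digital representation of the scene. The snippet below is a minimal sketch of one such refinement step; the weighted combination and the hypothesis/score data layout are illustrative assumptions and not mandated by the description.

    # Minimal sketch of confidence-score refinement; weighting scheme is assumed.
    def refine_hypotheses(luminescence_scores, scene_likelihoods, weight=0.5):
        """Combine per-object confidence scores from luminescence matching with
        likelihoods of presence derived from the digital representation of the scene.

        luminescence_scores: dict mapping object id -> score in [0, 1]
        scene_likelihoods:   dict mapping object id -> likelihood in [0, 1]
        weight:              relative weight of the luminescence-based score
        """
        refined = {}
        for obj, lum_score in luminescence_scores.items():
            scene_score = scene_likelihoods.get(obj, 0.0)
            refined[obj] = weight * lum_score + (1.0 - weight) * scene_score
        # The recognized object is the hypothesis with the highest refined score.
        best = max(refined, key=refined.get)
        return best, refined

    # Example: two ambiguous luminescence matches disambiguated by the scene data.
    best, scores = refine_hypotheses(
        {"milk_carton": 0.48, "juice_carton": 0.47},
        {"milk_carton": 0.9, "juice_carton": 0.2},
    )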
  • routine 101 provides the determined object(s) to a display device.
  • the display device is connected via a communication interface to the processor implementing routine 101.
  • the processor may provide further data associated with the determined object(s) for display on the screen, such as further data contained in the retrieved digital representation or further data retrieved from a database based on the determined object(s).
• Routine 101 may then proceed to block 102 or block 106 or block 118 and repeat the object recognition process according to its programming. Monitoring intervals of the scene may be pre-defined based on the situation used for object recognition or may be triggered by pre-defined events, such as entering or leaving the room.
  • the display device displays the data received from the processor in block 134 on the screen, in particular within a GUI.
  • routine 101 determines actions associated with the determined objects and may display these determined actions to the user on the screen of the display device, this step being generally optional.
  • the determined actions may be pre-defined actions as previously described. In one example, the determined actions may be performed automatically by routine 101 without user interaction. However, routine 101 may provide information about the status of the initiated action to the user on the screen of the display device. In another example, a user interaction is required after displaying the determined actions on the screen of the display device prior to initiating any action by routine 101 as previously described.
  • Routine 101 may be programmed to control the initiated actions and to inform the user on the status of the initiated actions. After the end of block 138, routine 101 may return to block 102, 106 or 118 as previously described or may end method 100.
  • routine 101 may return to block 102 or 106 in case of low determined confidence scores to allow reexamination of the system settings and adjustment of the system settings to improve the determined confidence scores. In case of higher confidence scores, routine 101 may return to block 118 since the settings of the system seem to be appropriate for the monitored scene. Routine 101 may be programmed to return to block 102/106 or block 118 based on a threshold value for the determined confidence scores.
  • Block 140 of method 100 is performed in case routine 101 determines in block 122 that further processing of the data acquired in block 118 is performed remotely, i.e. by a processor being different from the processor implementing routine 101.
  • routine 101 provides the data acquired in block 118 to the further processing device which is connected with the processor implementing routine 101 via a communication interface.
• the further processing device may be a stationary processing device or may be located in a cloud environment and may implement method 100 using routine 101’.
• routine 101’ determines further object specific luminescence and/or reflectance properties as described in relation to block 124.
• routine 101’ determines whether a flicker analysis is to be performed as described in relation to block 126. If yes, routine 101’ proceeds to block 150 described later on. Otherwise, routine 101’ proceeds to block 146.
• routine 101’ determines whether the object is to be determined with the further processor implementing routine 101’. If yes, routine 101’ proceeds to block 128 and performs blocks 128 to 138 as described previously. Otherwise, routine 101’ proceeds to block 148.
• routine 101’ provides the further object specific luminescence and/or reflectance properties determined in block 142 to the computer processor implementing routine 101, i.e. the computer processor performing steps 102 to 122 previously described. Afterwards, method 100 proceeds with block 128 as previously described.
• routine 101 or routine 101’ determines the effectiveness of flicker mitigation, for example by comparing background images acquired at different measurement times or determining whether background images are brighter than the raw images obtained upon illumination of the scene with the light source and measuring the resulting fluorescence (i.e. the raw data before the delta calculation is performed). If the background images are brighter than the raw images, it is likely that the flicker mitigation timing is not appropriate, and that the background images are acquired at a brighter period of the flicker cycle than the raw images.
• routine 101 or routine 101’ determines whether the flicker mitigation is satisfactory, for example by determining the ambient flicker contribution in the images and comparing the determined ambient flicker contribution to a pre-defined threshold value stored on a data storage medium. If the mitigation is satisfactory, routine 101 or 101’ proceeds to block 128, otherwise routine 101 or 101’ proceeds to block 154.
• routine 101 or routine 101’ determines new phase-locking or multiples of the flicker cycle based on the results of block 150. The new phase-locking or multiples of the flicker cycle are then used upon repeating blocks 108 or 112.
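• As a rough illustration of the flicker-mitigation check described in the preceding items, the following sketch compares the mean brightness of the background frames with that of the raw frames and flags the timing as inappropriate when the background is brighter; the frame representation and the residual-ambient threshold are assumptions chosen for illustration.

    import numpy as np

    def flicker_mitigation_ok(raw_frames, background_frames, max_ambient_ratio=0.1):
        """Check whether ambient flicker is sufficiently suppressed.

        raw_frames, background_frames: lists of 2-D numpy arrays (grayscale images).
        max_ambient_ratio: assumed threshold for the residual ambient contribution.
        """
        raw_mean = np.mean([f.mean() for f in raw_frames])
        bg_mean = np.mean([f.mean() for f in background_frames])
        if bg_mean > raw_mean:
            # Background brighter than illuminated frames: timing is likely off,
            # i.e. backgrounds were captured in a brighter part of the flicker cycle.
            return False
        ambient_ratio = bg_mean / max(raw_mean, 1e-9)
        return ambient_ratio <= max_ambient_ratio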
  • FIG. 2a illustrates a non-limiting embodiment of a system 200 for recognizing at least one object having object specific luminescence and/or reflectance properties in a scene in accordance with a first embodiment of the invention which may be used to implement method 100 described in relation to FIG. 1 .
  • system 200 monitors scene 202 comprising a waste bin as well as waste 204 (i.e. the object to be recognized) being thrown into the waste bin by a person.
  • the waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park.
  • the information on the recognized object i.e. the object being discarded
  • System 200 is a luminescent object recognition system comprising a light source 206, a sensor unit 208, a control unit 210, a processing unit 216 and databases 218, 220.
• the light source 206 and sensor unit 208 are each connected via communication interfaces 222, 224 to the control unit 210.
  • the light source 206 comprises 2 illuminants, such as LEDs, fluorescent illuminants, incandescent illuminants or a combination thereof.
  • the light source 206 comprises only 1 illuminant or more than 2 illuminants.
  • the sensor unit 208 comprises one sensor, such as a camera. In another example, the sensor unit 208 comprises at least 2 sensors.
• the light source 206 and/or the sensor unit 208 may comprise filters (not shown) in front of each illuminant and/or each sensor. Suitable combinations of light source and sensor unit for luminescent object recognition are, for example, disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1.
  • Control unit 210 is connected to processor 212 of processing unit 216 via communication interface 226 and is configured to control the illuminants of the light source and/or the sensors of the sensor unit by switching on at least one illuminant of the light source and/or at least one sensor of the sensor unit at pre-defined time point(s) for a pre-defined duration.
  • control unit 210 preferably synchronizes the switching of the illuminants of light source 206 and the sensors of sensor unit 208 as previously described, for example as described in relation to the ALC mentioned with respect to FIG. 1.
  • Control unit 210 may receive instructions concerning the synchronization from processor 212.
• Suitable control units include, for example, Digilent Digital Discovery controllers or microcontrollers.
  • control unit 210 is present separately from processing unit 216.
  • processing unit 216 comprises control unit 210.
  • processor 212 of processing unit 216 is used to control the illuminants of light source 206 and the sensors of sensor unit 208 and a control unit is not present.
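• The synchronization of illuminants and sensors performed by the control unit (or, in this example, by processor 212) can be pictured as the simple schedule below, which switches one illuminant on for a pre-defined duration and opens the sensor exposure inside that lit window; the hardware hook functions are hypothetical placeholders for the respective communication interfaces.

    import time

    # Hypothetical hardware hooks; on a real system these would drive the
    # illuminant and the sensor via the respective communication interfaces.
    def set_illuminant(index, on):
        print(f"illuminant {index} {'on' if on else 'off'}")

    def trigger_exposure(sensor_index, exposure_s):
        print(f"sensor {sensor_index} exposing for {exposure_s} s")
        time.sleep(exposure_s)

    def synchronized_capture(illuminant=0, sensor=0, exposure_s=0.01, settle_s=0.002):
        """Switch one illuminant on, expose one sensor within the lit window,
        then switch the illuminant off again."""
        set_illuminant(illuminant, True)
        time.sleep(settle_s)            # let the illuminant reach steady output
        trigger_exposure(sensor, exposure_s)
        set_illuminant(illuminant, False)

    synchronized_capture()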
  • the processing unit 216 houses computer processor 212 and internal memory 214 and is connected via communication interfaces 226, 228, 230 to the control unit 210 and databases 218 and 220.
  • the processing unit 216 is part of the luminescent object recognition system 200.
  • the processing unit 216 may be located on a cloud environment and system 200 may transfer the acquired data to said processing unit for further processing. This may reduce the costs of system 200 but requires cloud access to perform object recognition and may be preferable if a large amount of computing power is required to determine the objects in the scene using the processing unit.
  • the processor 212 is configured to execute instructions, for example retrieved from memory 214, and to carry out operations associated with the system 200, namely o determine the at least one object in the scene based on
• the processor 212 can be a single-chip processor or can be implemented with multiple components. In most cases, the processor 212 together with an operating system operates to execute computer code and produce and use data. In this example, the computer code and data reside within memory 214 that is operatively coupled to the processor 212. Memory 214 generally provides a place to hold data that is being used by the system 200. By way of example, memory 214 may include Read-Only Memory (ROM), Random-Access Memory (RAM), a hard disk drive and/or the like. In another example, computer code and data could also reside on a removable storage medium and be loaded or installed onto the computer system when needed. Removable storage media include, for example, CD-ROM, PC-CARD, floppy disk, magnetic tape, and a network component.
  • Database 218 comprises digital representations of pre-defined objects and is connected via communication interface 228 to processing unit 216.
  • the digital representations of pre-defined objects stored in database 218 are used by processor 212 of processing unit 216 to determine a set of object identification hypotheses by calculating best matching luminescence and/or reflectance properties based on the retrieved digital representations and the provided data of the scene or the processed data of the scene.
  • the digital representations of pre-defined objects stored in database 218 are used by processor 212 of processing unit 216 to refine the determined set of object identification hypotheses which have been determined using the digital representation of the scene.
  • Database 220 comprises digital representations of the scene and is connected via communication interface 230 to processing unit 216.
  • the digital representations of the scene stored in database 220 are used by processor 212 of processing unit 216 to determine a set of object identification hypotheses based on the retrieved digital representations.
  • the digital representations of the scene stored in database 220 are used by processor 212 of processing unit 216 to refine the set of object identification hypotheses which have been determined using the digital representations of predefined objects.
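• A minimal sketch of the look-up against the digital representations of pre-defined objects stored in database 218 might look as follows; the cosine-similarity matching and the spectrum layout are illustrative assumptions, and any of the matching algorithms mentioned above could be used instead.

    import numpy as np

    def best_matching_objects(measured_spectrum, object_spectra, top_k=3):
        """Rank pre-defined objects by similarity of their stored luminescence
        spectra to a measured spectrum.

        measured_spectrum: 1-D numpy array (intensity per wavelength bin).
        object_spectra:    dict mapping object id -> 1-D numpy array of same length.
        Returns a list of (object id, similarity) pairs, best match first.
        """
        m = measured_spectrum / (np.linalg.norm(measured_spectrum) + 1e-12)
        scores = {}
        for obj, spec in object_spectra.items():
            s = spec / (np.linalg.norm(spec) + 1e-12)
            scores[obj] = float(np.dot(m, s))      # cosine similarity as degree of matching
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return ranked[:top_k]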
  • system 200 further comprises a database (not shown) containing actions associated with pre-defined objects which is connected to the processing unit 216 via a communication interface.
  • This allows to trigger an action, for example re-ordering consumed items, upon detection of an object in the scene (for example upon detection of an object being thrown into the waste bin).
  • the action may be retrieved from the database by the processor 212 and may be triggered automatically and/or may be shown on the screen of a display device (for example, if user interaction is required prior to triggering the action).
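• The action handling described in the preceding items can be pictured as a small dispatch step; the action table and the user-confirmation flag are assumptions chosen for illustration.

    # Illustrative action dispatch after object recognition.
    ACTIONS = {  # hypothetical database of actions associated with pre-defined objects
        "milk_carton": {"action": "reorder milk", "needs_confirmation": True},
        "battery":     {"action": "show disposal hint", "needs_confirmation": False},
    }

    def handle_detection(obj, confirm=lambda msg: True):
        entry = ACTIONS.get(obj)
        if entry is None:
            return None
        if entry["needs_confirmation"] and not confirm(entry["action"]):
            return None
        return entry["action"]      # triggered automatically or after user confirmation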
  • system 200 further comprises a display device having a screen (not shown) and being connected to processing unit 216 via a communication interface.
  • the display device displays the at least one object determined and provided by processing device 216, in particular via a graphical user interface (GUI), to the user.
• the display device may be a stationary display device, such as a peripheral monitor, or a portable display device, such as a smartphone, tablet, laptop, etc.
• the screen of the display device may be a monochrome display, color graphics adapter (CGA) display, enhanced graphics adapter (EGA) display, variable-graphics-array (VGA) display, Super VGA display, liquid crystal display (e.g., active matrix, passive matrix and the like), cathode ray tube (CRT), plasma display and the like.
  • system 200 may not comprise a display device.
  • the recognized objects may be stored in a database (not shown) or used as input data for a further processing unit (not shown).
• FIG. 2b illustrates a non-limiting embodiment of a system 201 for recognizing at least one object having object specific luminescence and/or reflectance properties in scene 202’ in accordance with a second embodiment of the invention which may be used to implement method 100 described in relation to FIG. 1.
  • the luminescence object recognition system 201 monitors scene 202’ comprising a waste bin as well as waste 204’ (i.e. the object to be recognized) being thrown into the waste bin by a person.
  • the waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park.
  • the information on the recognized object i.e. the object being discarded
• System 201 comprises a similar luminescent object recognition system as described in relation to FIG. 2a. Compared with the system of FIG. 2a previously described, system 201 comprises a further processing unit 234’. This allows for shifting tasks requiring high amounts of computing power to a further processing unit not being part of the luminescent object recognition system, thus reducing the costs and energy consumption of the luminescent object recognition system.
• the processor 212’ of the first processing unit 216’ is configured to execute instructions, for example retrieved from memory 214’, and to carry out operations associated with the system 201, namely to
o determine a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and the provided digital representations of pre-defined objects or the provided digital representation of the scene, and
o provide via the communication interface the determined set of object identification hypotheses and optionally data of the scene to the second processing unit.
  • processor 212’ and memory 214’ have been described in relation to FIG. 2a.
  • the digital representations of pre-defined objects or the digital representations of the scene are stored in database 218’ connected via communication interface 226’ to processor 212’ of the first processing unit 216’.
  • the digital representations stored in this database are retrieved by processor 212’ upon determination of a set of object identification hypotheses about the object(s) to be recognized in the scene as previously described.
  • the first processing unit 216’ is connected via communication interface 228’ to the second processing unit 234’ to provide the determined set of object identification hypotheses and optionally data of the scene acquired by the luminescent object recognition system 240’ to the second processing unit 234’.
  • Communication interface 228’ may be a gateway as described in relation to FIG. 5 below and/or the first processing unit 216’ or the second processing unit 234’ may contain a gateway functionality as described in relation to FIG. 5 below.
  • Data of the scene provided to the second processing unit 234’ may have been processed by the first processing unit 216’, for example by determining further object specific luminescence and/or reflectance properties from the acquired data as previously described.
  • the second processing unit 234’ may be located on a stationary local processing device or in a cloud environment.
• the processor 230’ of the second processing unit 234’ is configured to execute instructions, for example retrieved from memory 232’, and to carry out operations associated with the system 201, namely to
o refine the received set of object identification hypotheses to identify at least one object by revising at least certain of said associated confidence scores based on the provided digital representation of the scene or based on the provided data of the scene and the provided digital representations of pre-defined objects, and
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
  • the digital representations of pre-defined objects or the digital representations of the scene are stored in database 236’ connected via communication interface 238’ to processor 230’ of the second processing unit 234’.
  • the digital representations stored in this database are retrieved by processor 230’ upon refining the set of object identification hypotheses received from the first processing unit 216’ as previously described.
  • System 201 may further comprise a database having stored therein actions interrelated with pre-defined objects and/or a display device having a screen (not shown) as described in relation to FIG. 2a.
  • FIG. 3 depicts a non-limiting embodiment of a method 300 for training an object recognition neural network using data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene.
• the data on object specific luminescence and optionally reflectance properties can be acquired and processed using system 200 or 201 described in relation to FIGs. 2a and 2b or the luminescence object recognition systems disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1.
  • routine 301 retrieves data of a scene via a communication interface, said data including image(s) of the scene as well as data on object specific luminescence and optionally reflectance properties of at least one object having said properties and being present in the scene.
• Data on object specific luminescence and optionally reflectance properties can be acquired using system 200 or 201 described in relation to FIGs. 2a and 2b or the luminescence object recognition systems disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1.
  • data of the scene is retrieved by routine 301 via a communication interface from the luminescence object recognition system. This allows to provide the data to the computer processor implementing routine 301 directly after data acquisition.
  • data of the scene is retrieved by the computer processor implementing routine 301 from a data storage medium, such as a database or an internal memory (for example located within the luminescence object recognition system), which is connected via a communication interface to the computer processor implementing routine 301. This may be preferred if the acquired data is stored prior to performing method 300.
  • data of the scene acquired by the sensor unit of the luminescence object recognition system is retrieved by the processor of the object recognition system implementing routine 301.
• routine 301 determines whether differential data has to be generated (i.e. whether the delta calculation previously mentioned has to be performed). Performing the delta calculation is necessary if separation of the luminesced and reflected light acquired upon illumination of the scene with the light source of the object recognition system is not achieved physically, for example by the use of specific filters in front of the illuminants and/or sensors. Routine 301 may be programmed to perform this determination based on data provided from the luminescence object recognition system, for example whether flicker compensation (as described in FIG. 1) has been performed or not. If routine 301 determines that differential data has to be generated, it proceeds to block 306. Otherwise, it proceeds to block 308 described below.
  • routine 301 generates differential data from the provided data of the scene (i.e. performs the delta calculation) as described previously.
  • routine 301 retrieves digital representations of pre-defined objects and optionally digital representations of the scene via a communication interface from a data storage medium, such as a database as described in relation to FIG. 1.
  • Routine 301 may retrieve the digital representation of the scene based on a scene identifier which may be contained in the data of the scene received from the luminescence object recognition system.
  • routine 301 determines regions of luminescence, in particular regions having similar luminescence, in the data of the scene retrieved in block 302 or the differential data generated in block 306 by classifying the pixels associated with the detected regions of luminescence. Determination of regions of luminescence can be performed as previously described in relation to the determination of further object specific luminescence and/or reflectance properties by analysing the brightness of the pixels in the data retrieved in block 302 and classifying the pixels according to their brightness.
  • routine 301 determines the best matching luminescence and optionally reflectance properties for each detected region of luminescence as previously described (see for example block 130 of FIG. 1). In one example, this may include determining luminescence spectral patterns and/or reflective spectral patterns for the determined regions of luminescence prior to determining the best matching luminescence and optionally reflectance properties.
  • routine 301 obtains the object(s) assigned to the best matching luminescence and optionally reflectance properties determined in block 312. Obtaining the object(s) may be performed by retrieving the object(s) from the digital representations of pre-defined objects retrieved in block 308 or from a further database as previously described.
  • routine 301 refines the object(s) obtained in block 314 based on the digital representation of the scene retrieved in block 308, this block being generally optional. Refinement of the list of object(s) obtained after block 314 can be performed as described previously in relation to FIG. 1 , block 132 (option A) and can increase the recognition accuracy in case of ambiguous identification of objects based on the luminescence and optionally reflectance properties.
  • routine 301 annotates each classified pixel of each image with an object specific label based on the object(s) obtained in block 314 or refined in block 316 and the associated regions of luminescence determined in block 310.
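• The per-pixel annotation described in the preceding items can be sketched as follows: pixels are classified by brightness into regions of luminescence, each region is matched to an object, and every classified pixel receives the object-specific label. The connected-component grouping via scipy and the brightness threshold are assumptions made for illustration; the region-to-object matcher is a hypothetical callable standing in for the matching step described above.

    import numpy as np
    from scipy import ndimage

    def annotate_luminescence_regions(luminescence_image, match_region_to_object,
                                      brightness_threshold=0.2):
        """Classify pixels into regions of (similar) luminescence and annotate
        each classified pixel with an object-specific label.

        luminescence_image:      2-D float array, e.g. the differential ("delta") image.
        match_region_to_object:  callable mapping a region's pixel values to an
                                 integer object label (hypothetical matcher).
        Returns a 2-D array of object labels (0 = background).
        """
        mask = luminescence_image > brightness_threshold
        regions, n_regions = ndimage.label(mask)        # connected luminescent regions
        annotation = np.zeros_like(regions)
        for region_id in range(1, n_regions + 1):
            region_mask = regions == region_id
            object_label = match_region_to_object(luminescence_image[region_mask])
            annotation[region_mask] = object_label
        return annotation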
  • routine 301 determines whether to create labelled images using bounding boxes, this step generally being optional. This determination may be made in accordance with the programming of routine 301.
  • the processor implementing routine 301 and performing blocks 320 to 324 may be the same processor which performs blocks 304 to 318 or may be different from this processor.
• If yes, routine 301 proceeds to block 322. Otherwise, routine 301 proceeds to block 324 described later on.
  • the annotated images generated in block 318 are provided to the further processor via a communication interface prior to performing block 322/324.
  • Use of bounding boxes or image segmentation can increase the accuracy of the training data provided to the neural network, thus resulting in a higher accuracy of object detection using the neural network trained with such more accurate training data.
• routine 301 creates labelled images based on the annotated images created in block 320 using bounding boxes, this step generally being optional. Bounding box creation can be performed as previously described using coordinate information associated with the annotated pixels created in block 320.
  • routine 301 creates labelled images by segmenting the annotated images created in block 320, this step being generally optional. Image segmentation can be performed by clustering regions of pixels as previously described.
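• Given such a per-pixel annotation, bounding boxes can be derived directly from the coordinates of the annotated pixels, as sketched below; the (x_min, y_min, x_max, y_max) box format is an assumption chosen for illustration.

    import numpy as np

    def bounding_boxes_from_annotation(annotation):
        """Create one axis-aligned bounding box per object label in a per-pixel
        annotation map (0 = background).
        Returns {label: (x_min, y_min, x_max, y_max)}."""
        boxes = {}
        for label in np.unique(annotation):
            if label == 0:
                continue
            ys, xs = np.nonzero(annotation == label)
            boxes[int(label)] = (int(xs.min()), int(ys.min()),
                                 int(xs.max()), int(ys.max()))
        return boxes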
• Suitable object recognition neural networks include convolutional neural networks (CNNs) known in the state of the art, such as deep convolutional neural networks comprising a plurality of convolutional neural network layers followed by one or more fully connected neural network layers. These deep convolutional neural networks may comprise a pooling layer after each convolutional layer or after a plurality of convolutional layers and/or may comprise a non-linear layer after each convolutional layer, in particular between the convolutional layer(s) and the pooling layer.
  • the labelled images obtained in block 322 or 324 as well as unlabelled images of the scene provided in block 302 and optionally the digital representation of the scene are provided to the neural network selected in block 326.
  • Use of the digital representation of the scene may improve the recognition accuracy in case of ambiguous object identification.
  • the processor performing block 322 or 324 may be connected via a communication interface to the processing unit hosting the neural network, i.e. the processing unit having implemented the neural network selected in block 326.
  • said digital representation is retrieved from a database as previously described in block 308.
  • This block may further include dividing the data provided in this block into a training set, a validation set, and a verification (or “testing”) set prior to providing the data to the neural network in block 330.
• the training set is used to adjust the neural network via a backpropagation algorithm so that the neural network iteratively “learns” how to correctly recognize objects in the input data.
• the validation set is primarily used to minimize overfitting. The validation set typically does not adjust the neural network as the training set does, but rather verifies that any increase in accuracy over the training data set also yields an increase in accuracy over a data set the neural network has not been trained on yet (i.e. the validation data set).
  • the verification set is used for testing the trained neural network in order to confirm the actual predictive power of the trained neural network and is preferably used in block 332 as described later on.
• approximately 70% of the provided data sets of the scene (i.e. labelled images resulting from block 322 or 324 and images of the scene) are used for model training, 15% are used for model validation, and 15% are used for model verification. These approximate divisions can be altered as necessary to reach the desired result.
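• The split into training, validation, and verification sets mentioned in the preceding items can be done, for example, as follows; the 70/15/15 ratio follows the text, while the shuffling and the fixed seed are implementation details assumed for illustration.

    import random

    def split_dataset(samples, train=0.70, val=0.15, seed=0):
        """Shuffle the labelled samples and split them into training, validation
        and verification ("testing") subsets; the remainder after train + val is
        used for verification."""
        samples = list(samples)
        random.Random(seed).shuffle(samples)
        n = len(samples)
        n_train = int(n * train)
        n_val = int(n * val)
        return (samples[:n_train],
                samples[n_train:n_train + n_val],
                samples[n_train + n_val:])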
  • the size and accuracy of the training data set can be very important to the accuracy of the neural network obtained by method 300.
• about 40,000 sets of data of the scene may be collected, each set including images of the scene, labelled images of the scene and a digital representation of the scene.
• the automatic labelling of the collected images of the scene allows to generate training data sets accurately, quickly and efficiently, thus allowing to obtain sufficiently trained neural networks within a short amount of time.
  • the training data set may include samples throughout a full range of expected objects which may occur in the scene.
• the neural network provided in block 326 is trained with the training data set provided in block 328 according to methods well known in the state of the art (see for example Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015). Training is an iterative process that modifies the weights and biases of each layer. In one example, the training is performed using the backpropagation technique to modify the layers' weights and biases. With each iteration of training data to adjust the weights and biases, the validation data is run on the neural network and one or more measures of accuracy are determined by comparison of the recognized objects with the objects being present in the training data.
  • the standard deviation and mean error of the output will improve for the validation data with each iteration and then the standard deviation and mean error will start to increase with subsequent iterations.
  • the iteration for which the standard deviation and mean error is minimized is the most accurate set of weights and biases for that neural network model for that training set of data.
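• The iterative training with validation described in the preceding items amounts to keeping the weights of the iteration with the lowest validation error, as in the generic sketch below; train_one_epoch and evaluate are hypothetical framework-specific callables, and the patience-based stopping criterion is an assumption standing in for "the error starts to increase with subsequent iterations".

    import copy

    def train_with_validation(model, train_data, val_data,
                              train_one_epoch, evaluate,
                              max_epochs=100, patience=5):
        """Generic training loop: after each epoch the validation error is computed;
        the weights of the epoch with the lowest validation error are kept, and
        training stops once the error has not improved for `patience` epochs.

        train_one_epoch(model, data) and evaluate(model, data) -> error are
        hypothetical callables for the chosen framework."""
        best_error = float("inf")
        best_model = copy.deepcopy(model)
        epochs_without_improvement = 0
        for epoch in range(max_epochs):
            train_one_epoch(model, train_data)      # e.g. backpropagation updates
            error = evaluate(model, val_data)       # e.g. mean error on validation data
            if error < best_error:
                best_error, best_model = error, copy.deepcopy(model)
                epochs_without_improvement = 0
            else:
                epochs_without_improvement += 1
                if epochs_without_improvement >= patience:
                    break                           # validation error started to rise
        return best_model, best_error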
  • the results of the neural network trained in block 330 are verified using the verification data set to determine whether the output of the neural network is sufficiently accurate when compared to the objects being present in the scene. If the accuracy is not sufficient, method 300 proceeds to block 336. Otherwise, method 300 ends and the obtained trained neural network can be used to recognize objects in a scene as described in relation to FIGs. 4 to 6 later on.
• In block 336, larger and/or more accurate sets of training data are used to modify the neural network, or a different type and/or different dimensions of the neural network are selected to improve the accuracy using the training data set provided in block 328, and method 300 continues with block 330 previously described.
• the training method 300 allows to automatically label objects being present in the scene based on data generated by luminescence object recognition systems and to use the labelled data to train neural networks commonly used for object recognition. This allows to train neural networks installed in a specific scene, such as a store, with training data acquired from a luminescence object recognition system installed in a further store having a similar layout. Households may have their purchased luminescence object recognition system subsidized or receive other reimbursements in exchange for the labelled data their system generates, which is used for training traditional AI object recognition systems used by similar households (i.e. households being similar in geographical location, income, interests, brand preferences, etc.), or they may opt to keep their data private but have it used to improve other AI object recognition systems used in their household.
  • FIG. 4 depicts a non-limiting embodiment of a method 400 for recognizing at least one object being present in the scene using a trained neural network.
  • the neural network has been trained using method 300 described in relation to FIG. 3.
  • the scene may be located indoors or outdoors.
  • the trained neural network is used in combination with a luminescent object recognition system (for example a system described in relation to FIGs. 2a and 2b).
  • the trained neural network is used without a luminescent object recognition system.
  • routine 401 implementing method 400 retrieves a trained neural network (TNN).
  • the neural network has been trained using automatically labelled training data generated from luminescence object recognition systems (for example by using the training method described in relation to FIG. 3).
  • the trained neural network may be located on a remote server or may be stored on a data storage medium, such as an internal memory of the processing device implementing routine 401. By locating the trained neural network on a remote server or a cloud server, costs of added memory and/or a more complex processor, and associated battery usage in using the neural network to determine the objects present in the scene can be avoided.
  • a remote server may also serve as a central repository storing training and/or collections of operative data sent from various luminescence object recognition systems to be used to train and develop existing neural networks. For example, a growing repository of data can be used to update and improve existing trained neural networks and to provide neural networks for future use.
  • data of the scene is retrieved via a communication interface by routine 401 .
  • Data on the scene can be acquired using a commercially available camera or by using a luminescence object recognition system as previously described (for example as described in relation to FIGs. 2a and 2b).
  • data of the scene includes image(s) of the scene.
• data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object having these properties and being present in the scene.
  • the provided data may further include identifiers being indicative of the scene or the system used to acquire the data, for example a location, serial number, unique system identifier, etc..
  • the data may be stored on a data storage medium, such as a database or internal memory, prior to retrieving the data by routine 401.
  • routine 401 determines whether the data retrieved in block 404 was acquired with a luminescence object recognition system. This may be determined based on the retrieved data. In case routine 401 determines that the data retrieved in block 404 was not acquired using a luminescence object recognition system, routine 401 proceeds to block 408. Otherwise, routine 401 proceeds to block 412 described later on.
  • a digital representation of the scene is retrieved by routine 401 , this step being generally optional. It may be preferable to perform this step to increase recognition accuracy in case of ambiguous object recognition based on the data retrieved in block 404.
  • the digital representations of the scene may be stored on a data storage medium, such as a database, and may be retrieved by routine 401 via a communication interface, for example by using a scene identifier contained in the data retrieved in block 404.
  • At least one object being present in the scene is determined using the trained neural network retrieved in block 402, the data of the scene retrieved in block 404 and optionally the digital representation of the scene retrieved in block 408. This is performed by providing the data on the scene and optionally the digital representation of the scene to the trained neural network and using the trained neural network to identify the objects being present in the scene.
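• The determination described in the preceding item can be pictured as the short inference step below; the predict API of the trained network and the scene-prior re-weighting are hypothetical placeholders, since the description does not prescribe a specific framework.

    def recognize_objects(trained_network, images, scene_representation=None):
        """Run the trained object recognition neural network on image(s) of the scene
        and, if a digital representation of the scene is available, re-weight the
        predicted class scores by the likelihood of each object occurring in the scene.

        trained_network.predict(image) -> dict {object: score} is a hypothetical API.
        scene_representation: optional dict {object: prior likelihood}."""
        results = []
        for image in images:
            scores = trained_network.predict(image)
            if scene_representation is not None:
                scores = {obj: s * scene_representation.get(obj, 1.0)
                          for obj, s in scores.items()}
            results.append(max(scores, key=scores.get))
        return results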
  • routine 401 determines whether the data from the luminescence object recognition system retrieved in block 404 is to be processed by determining further object specific luminescence and/or reflectance properties as previously described, for example in relation to block 124 of FIG. 1. The determination may be made based on the programming and may depend, for example, on the retrieved data of the scene (e.g. the measurement conditions and/or the hardware configuration of the luminescence object recognition system). If routine 401 determines in block 412 that further processing is to be performed, routine 401 proceeds to block 414, otherwise routine 401 proceeds to block 416 described later on.
  • further object specific luminescence and/or reflectance properties are determined as previously described, for example as described in relation to block 124.
  • the determination may be made with the computer processor performing block 412 or remotely, i.e. with a second computer processor as described in relation to FIG. 1.
  • the data retrieved in block 404 is provided to the second computer processor prior to determining the further object specific luminescence and/or reflectance properties.
  • the determined further properties may be provided to the first computer processor via a communication interface if desired.
  • digital representations of pre-defined objects and optionally a digital representation of the scene are retrieved by routine 401 as described in relation to block 128 of FIG. 1.
  • the computer processor may be the first or the second computer processor previously described.
  • the computer processor determines a set of object identification hypothesis based on
  • routine 401 proceeds to block 420 and refines the set of object identification hypothesis determined in block 418 as described in relation to block 132 of FIG. 1.
  • routine 401 is programmed to perform the refinement of the determined object based on: the digital representation of the scene retrieved in block 416 in case the set of object identification hypothesis has been determined according to option A previously described or the data retrieved in block 404 or the further object specific luminescence and/or reflectance properties determined in block 414 and the digital representations of predefined objects retrieved in block 416 in case the set of object identification hypothesis has been determined according to option B previously described.
  • routine 401 provides the determined object(s) to a display device as described in relation to block 134 of FIG. 1. Routine 401 may then proceed to block 404 and repeat the object recognition process or the method may be ended.
• routine 401 determines actions associated with the determined objects and may display these determined actions to the user on the screen of the display device, this step being generally optional.
  • the determined actions may be pre-defined actions as previously described. In one example, the determined actions may be performed automatically by the computer processor without user interaction. However, routine 401 may provide information about the status of the initiated action to the user on the screen of the display device. In another example, a user interaction is required after displaying the determined actions on the screen of the display device prior to initiating any action by the computer processor as previously described.
  • Routine 401 may be programmed to control the initiated actions and to inform the user on the status of the initiated actions. After the end of block 426, routine 401 may return to block 402 or the method may be ended.
• FIG. 5 illustrates a non-limiting embodiment of a system 500 comprising a luminescent object recognition system 501 and a visual AI recognition system 503.
  • System 500 may be used to implement the training method 300 described in relation to FIG. 3 or the object recognition method 400 described in relation to FIG. 4.
  • system 500 monitors scene 502 comprising a waste bin as well as waste 504 (i.e. the object to be recognized) being thrown into the waste bin by a person.
  • the waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park.
  • System 500 comprises a similar luminescent object recognition system 501 as described in relation to FIG. 2b.
  • database 516 contains digital representations of pre-defined objects and digital representations of the scene.
  • the digital representations of pre-defined objects and the digital representations of the scene may be stored in different databases.
  • the processing unit 516 of system 501 comprises a processor 512 which is configured to execute instructions, for example retrieved from memory 514, and to carry out operations associated with the system 500.
  • processor 512 is programmed to perform blocks 302 to 324 described in relation to method 300 of FIG. 3, namely to create labelled images based on the data acquired by the sensor unit 508 upon illumination of scene 502 with the light source 506. These labelled images are then used to train neural network 538 which is implemented on processor 530.
• processor 512 is programmed to perform blocks 404, 406 and 412 to 418 or blocks 404, 406 and 412 to 416 and 420 described in relation to method 400 of FIG. 4, namely to determine a set of object identification hypotheses or refine the set of object identification hypotheses determined by the processing unit 534 of the visual AI system 503.
  • the set of object identification hypothesis may be determined as described previously in relation to block 418 of FIG. 4 by processor 512 based on
  • the determined set of object identification hypothesis is then provided via communication interface 528 to processor 530 of the visual Al system 503 for refinement.
  • Refinement of the set of object identification hypotheses provided by visual Al system 503 via communication interface 528 may be performed by processor 512 as described previously in relation to block 420 of FIG. 4 based on
  • the object(s) resulting from the refinement process may then be provided to a display device to display the determined object(s) on the screen of said display device.
• the visual AI recognition system 503 comprises a processing unit 534 which is connected to the processing unit 516 of the luminescent object recognition system 501 via communication interface 528 to allow data exchange between systems 501 and 503.
  • the communication interface 528 represents a gateway.
  • the luminescence object recognition system 501 is coupled directly to visual Al recognition system 503.
• system 501 or 503 may be configured with any of the gateway functionality and components described herein and treated like a gateway by system 503 or 501, at least in some respects.
  • Each gateway may be configured to implement any of the network communication technologies described herein so the gateway may remotely communicate with, monitor, and manage system 501 or 503.
  • Each gateway may be configured with one or more capabilities of a gateway and/or controller as known in the state of the art and may be any of a plurality of types of devices configured to perform the gateway functions defined herein.
  • each gateway may include a Trusted Platform Module (TPM) (for example in a hardware layer of a controller).
  • the TPM may be used, for example, to encrypt portions of communications from/to system 501/503 to/from gateways, to encrypt portions of such information received at a gateway unencrypted, or to provide secure communications between system 501 , gateway 528 and system 503.
• TPMs or other components of the system 500 may be configured to implement Transport Layer Security (TLS) for HTTPS communications and/or Datagram Transport Layer Security (DTLS) for datagram-based applications.
  • one or more security credentials associated with any of the foregoing data security operations may be stored on a TPM.
  • a TPM may be implemented within any of the gateways, system 501 or system 503, for example, during production, and may be used to personalize the gateway or system 501/503.
  • Such gateways, system 501 and/or system 503 may be configured (e.g., during manufacture or later) to implement cryptographic technologies known in the state of the art, such as a Public Key Infrastructure (PKI) for the management of keys and credentials.
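• For the TLS-secured communication mentioned above, a minimal client-side sketch using Python's standard ssl module could look as follows; the host, port and certificate handling are placeholders, since the description leaves the concrete communication stack open.

    import socket
    import ssl

    def send_securely(payload: bytes, host: str = "gateway.example", port: int = 443):
        """Open a TLS-protected connection to a (hypothetical) gateway and send a
        payload; certificate verification uses the system's default trust store."""
        context = ssl.create_default_context()
        with socket.create_connection((host, port)) as raw_sock:
            with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
                tls_sock.sendall(payload)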
  • gateway 528 connecting system 501 to system 503 or each gateway present within system 501 may be configured to process data received from system 501 or 503, including analyzing data that may have been generated or received by system 501 or 503, and providing instructions to system 501 , for example concerning acquisition of data of the scene.
  • each gateway may be configured to provide one or more functions pertaining to triggering pre-defined actions as described in more detail in relation to FIG. 1.
  • each gateway may be configured with software encapsulating such capability.
  • system 501 connected via a communication interface directly to system 503 may be configured to process data and perform further functions described above.
  • system 501 may be configured with software encapsulating such capability.
  • processing unit 534 is connected via communication interface 540 to database 536 comprising digital representations of the scene and neural network 538 is implemented and running on processor 530 of processing unit 534.
  • neural network 538 is a neural network which has been trained according to the method described in relation to FIG. 3.
  • neural network 538 is an untrained neural network which is trained with data received via communication interface 528 from processing unit 516 of the luminescence object recognition system 501.
• neural network 538 is a commercially available trained image recognition neural network, such as Inception/GoogLeNet, ResNet-50, ResNet-34, MobileNet V2 or VGG-16.
  • Processor 530 of processing unit 534 is configured to execute instructions, for example retrieved from memory 532, and to carry out operations associated with system 500.
  • processor 530 is configured to train neural network 538 with labelled images received via communication interface 528 from the luminescence object recognition system 501 and optionally a digital representation of the scene retrieved from database 536 via communication interface 540 based on the data provided from system 501 as described in relation to blocks 330 and 332 of FIG. 3.
  • processor 530 is configured to determine a set of object identification hypothesis or refine the set of object identification hypotheses determined by the processing unit 516 of luminescence object recognition system 501 using trained neural network 538 and optionally the digital representations of the scene stored in database 536.
  • Processor 530 may be further programmed to optionally provide the progress of the training or the result of the refinement to a display device.
  • the set of object identification hypothesis may be determined by processor 530 using trained neural network 538 based on the data acquired by the luminescence object recognition system 501 , in particular images of scene 502, which is provided to processor 530 via communication interface 528, and optionally the digital representation of the scene which is stored in database 536 and retrieved by processor 530 via communication interface 540.
  • the set of object identification hypotheses received via communication interface 528 from the luminescence object recognition system may be refined by processor 530 using trained neural network 538 based on the data acquired by the luminescence object recognition system 501 , in particular images of scene 502, which is provided to the processor 530 via communication interface 528 and optionally the digital representation of the scene which is stored in database 536 and retrieved by processor 530 via communication interface 540.
  • system 500 further comprises a database (not shown) containing actions associated with pre-defined objects which is connected to the processing unit 516 or 534 via a communication interface as described in relation to FIG. 2a.
  • system 500 further comprises a display device having a screen (not shown) and being connected to processing unit 516 or 534 via a communication interface as described in relation to FIG. 2a.
  • FIG. 6 illustrates a non-limiting embodiment of a system 600 for recognizing at least one object in a scene and for remotely managing object recognition system(s).
  • system 600 may be used to implement blocks 402 to 410 and 422 to 426 of method 400 described in relation to FIG. 4.
• system 600 comprises a visual AI recognition system 601 comprising a trained object recognition neural network 614, a cloud 616 and a display device 626.
• several visual AI recognition systems 601.1 to 601.n are connected via communication interfaces with cloud 616.
• luminescence object recognition system(s) (such as described in FIGs. 2a, 2b) are connected to cloud 616 instead of or in addition to visual AI recognition system 601 (not shown).
• the visual AI system 601 comprises a processing unit 612 housing computer processor 608 and internal memory 610 which is connected via communication interface 626 with a sensor unit comprising sensor 606 and via communication interface 630 with cloud 616.
• Communication interfaces 626, 630, 632 may represent gateways and/or the visual AI system 601 may comprise gateway functionalities as described in relation to FIG. 5.
  • the sensor unit comprises exactly one sensor 606.
• the sensor unit comprises more than one sensor. Suitable sensors may include, for example, cameras, such as commercially available video cameras.
  • Sensor 606 of system 601 is used to monitor scene 602 comprising a waste bin as well as waste 604 (i.e. the object to be recognized) being thrown into the waste bin by a person.
• the waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park.
  • the acquired sensor data is provided via communication interface 626 to processing unit 612.
  • Object recognition neural network 614 which has been trained according to the method described in relation to FIG. 3 is implemented and running on processor 608 of processing unit 612.
• the processor 608 is configured to execute instructions, for example retrieved from memory 610, and to carry out operations associated with the system 600, namely to
o determine the at least one object in the scene based on the acquired sensor data using the trained object recognition neural network 614 and optionally the provided digital representation of the scene,
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
  • Suitable processors and internal memories are the ones previously described in relation to FIG. 2a.
  • the provided digital representation of the scene is stored in the service layer 624 of cloud 616 described below and retrieved by processor 608 via communication interface 630 prior to determining the object(s) being present in the scene.
  • the provided digital representation of the scene is stored in a database which is connected via a communication interface with processor 608 of processing unit 612 of system 601.
  • System 600 further comprises cloud 616 having two layers, namely an application layer 620 containing one or more applications 616 and a service layer 624 containing one or more databases 622.
• the application layer 620 as well as the service layer 624 may each be implemented using one or more servers in the cloud 616.
• the cloud 616 comprises more or fewer layers.
  • the service layer 624 may include, for example, at least one of the following databases 622: a geographic database, an object recognition system database, an environment database, a legal database, a history database.
• the geographic database may include geographic information involving visual AI object recognition systems managed by the system 600.
  • geographic information may include the GPS location of the environment, address information, etc. of system(s) 601.
  • the object recognition system database may include information about system(s) 601 managed by system 600 such as, for example, hardware configuration, date of creation, maintenance intervals, last inspection, and other information.
  • the environment database may include information about the environment in which managed system(s) 601 are installed in such as, for example, location of system(s) 601 in the scene, store identifier, store information, number of shelves in store, store size, preferences of store clients, number of people in the household, age of people in the household, preferences of people in the household, purchase orders, stock information and other information.
  • the legal database may store information about legal restrictions concerning the distribution of items, the commercial availability of items in a specific region or other information.
  • the history database may store information about the purchase history of articles, history of sold articles, etc..
  • Information stored in the service layer 624 may be retrieved by system(s) 601 via communication interface 630 prior to determining the objects present in the scene from the acquired sensor data.
• the information may be retrieved using a scene identifier which is unique for every household in which system 601 is installed or for every system 601.
• the application layer 620 may include any of a variety of applications that utilize information and services related to item management, including any of the information and services made available from the service layer 624.
• the application layer 620 may include: a system inventory application, an item inventory management application, an order management application, further applications, or any suitable combination of the foregoing.
  • the system inventory application may provide an inventory of system(s) 601 managed within system 600, including properties (e.g., characteristics) about each system 601.
• the inventory of systems may be a group (e.g., "fleet") of systems owned, leased, controlled, managed, and/or used by an entity, such as a vendor of visual AI recognition systems, a store owner, etc.
  • the item inventory application may provide an inventory of items being present in the store or household based on the information stored in the environment database or the history database.
  • the order management application may manage item orders of the shop/household.
  • the order management application may maintain information about all past and current item orders and process such orders.
  • the order management application may be configured to automatically order items based on the objects detected by system(s) 601 and information contained in the environment database. For example, the application may have one or more predefined thresholds, e.g., of number of remaining items, etc., after which being reached or surpassed (e.g., going below a number of remaining items) additional items should be ordered.
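• The threshold-based re-ordering described in the preceding item can be reduced to a check of the remaining item count against a pre-defined threshold, as in this sketch; the inventory source, the thresholds and the order call are hypothetical placeholders for the order management application.

    REORDER_THRESHOLDS = {"milk_carton": 1, "paper_towel": 2}   # assumed thresholds

    def maybe_reorder(item, remaining_count, place_order):
        """Order additional items once the remaining count reaches or falls below
        the pre-defined threshold for that item. `place_order(item)` is a
        hypothetical call into the order management application."""
        threshold = REORDER_THRESHOLDS.get(item)
        if threshold is not None and remaining_count <= threshold:
            place_order(item)
            return True
        return False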
  • the applications may be configured via interfaces to interact with other applications within the application layer 620, including each other. These applications or portions thereof may be programmed into gateways and/or system 601 of the system 600 as well.
  • System 600 further comprises a display device 626 connected via communication interface 632 with cloud 616.
  • the display device is a smartphone.
  • the display device may be a stationary display device or may include further mobile display devices.
  • the display device can be used to display the objects recognized with system 601 or further information, such as determined actions, on the screen of the display device.
  • the display device can also be used to access the application layer 620 in cloud 616 and to manage system(s) 601 connected to cloud 616.
• Information may be communicated between components of the system 600, including system(s) 601, gateways, and components of the cloud 616, in any of a variety of ways.
  • Such techniques may involve the transmission of information in transaction records, for example using blockchain technology.
  • Such transaction records may include public information and private information, where public information can be made more generally available to parties, and more sensitive information can be treated as private information made available more selectively, for example, only to certain users, owners of system(s) 601.
  • the information in the transaction record may include private data that may be encrypted using a private key specific to a system 601 and may include public data that is not encrypted.
  • the public data may also be encrypted to protect the value of this data and to enable the trading of the data, for example, as part of a smart contract.
  • the distinction between public data and private data may be made depending on the data and the use of the data.
  • the number of communications between components of the system 600 may be minimized, which in some embodiments may include communicating transactions (e.g., detected objects) to servers within the cloud 616 according to a predefined schedule, in which gateways are allotted slots within a temporal cycle during which to transmit transactions (e.g., transmit data from system 601 to cloud 616 or instructions from cloud 616 to system(s) 601) to/from one or more servers.
  • Data may be collected over a predetermined period of time and grouped into a single transaction record prior to transmittal.
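By way of illustration only, the following minimal sketch (Python) shows the threshold-based reordering described for the order management application above; the item names, thresholds and the place_order stand-in are hypothetical assumptions and not part of the disclosure.

```python
# Minimal sketch of threshold-based reordering as described for the order
# management application. Item names, thresholds, and the order callback
# are illustrative assumptions, not part of the disclosure.

REORDER_THRESHOLDS = {"pasta": 2, "detergent": 1}  # hypothetical items


def place_order(item: str, quantity: int) -> None:
    # Stand-in for the real ordering back end (e.g., a purchase order API).
    print(f"ordering {quantity} x {item}")


def check_inventory(detected_counts: dict) -> None:
    """Order more of an item when its detected count reaches or falls below
    the predefined threshold, as described for the order management app."""
    for item, threshold in REORDER_THRESHOLDS.items():
        remaining = detected_counts.get(item, 0)
        if remaining <= threshold:
            place_order(item, quantity=threshold + 1)


check_inventory({"pasta": 1, "detergent": 3})  # orders pasta only
```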

Abstract

Aspects described herein generally relate to methods and systems for object recognition utilizing luminescence identification and/or machine learning. More specifically, aspects described herein relate to methods and systems for recognition of at least one luminescent object being present in a scene using data generated by a luminescent object recognition system and further data on the environment of the scene. This allows to boost the accuracy of the object recognition to near 100% in cases with ambiguous luminescence identification. Moreover, aspects described herein relate to a method for training an object recognition neural network using data generated by a luminescent object recognition system as well as methods and systems for object recognition using the trained neural network. Since the luminescent objects are recognized by the luminescent object recognition system at pixel level, highly accurate bounding boxes or segmentation can be created automatically from the output data of the luminescent object recognition system and the labelled images can be directly used for training of the object recognition neural network to improve its performance. This renders time-consuming manual labeling of images superfluous and allows to generate neural networks which are specifically trained on the respective scene, thus reducing the number of objects each neural network must recognize to the number of objects occurring at the scene. This allows to reduce the necessary training time as well as the computing power necessary for object recognition using the trained neural network.

Description

System and method for object recognition utilizing color identification and/or machine learning
FIELD
Aspects described herein generally relate to methods and systems for object recognition utilizing luminescence identification and/or machine learning. More specifically, aspects described herein relate to methods and systems for recognition of at least one luminescent object being present in a scene using data generated by a luminescent object recognition system and further data on the scene. This allows to boost the accuracy of the object recognition to near 100% in cases with ambiguous luminescence identification. Moreover, aspects described herein relate to a method for training an object recognition neural network using data generated by a luminescent object recognition system as well as methods and systems for object recognition using the trained neural network. Since the luminescent objects are recognized by the luminescent object recognition system at pixel level, highly accurate bounding boxes or segmentation can be created automatically from the output data of the luminescent object recognition system and the labelled images can be directly used for training of the object recognition neural network to improve its performance. This renders time-consuming manual labeling of images superfluous and allows to generate neural networks which are specifically trained on the respective scene, thus reducing the number of objects each neural network must recognize to the number of objects occurring at the scene. This allows to reduce the necessary training time as well as the computing power necessary for object recognition using the trained neural network.
BACKGROUND
Computer vision is a field in rapid development due to abundant use of electronic devices capable of collecting information about their surroundings via sensors such as cameras, distance sensors such as LIDAR or radar, and depth camera systems based on structured light or stereo vision to name a few. These electronic devices provide raw image data to be processed by a computer processing unit and consequently develop an understanding of a scene using artificial intelligence and/or computer assistance algorithms. There are multiple ways in which this understanding of the scene can be developed. In general, 2D or 3D images and/or maps are formed, and these images and/or maps are analysed for developing an understanding of the scene and the objects in that scene. The object identification process has been termed remote sensing, object identification, classification, authentication, or recognition over the years. While shape and appearance of objects in the scene acquired as 2D or 3D images can be used to develop an understanding of the scene, these techniques have some shortcomings. One prospect for improving computer vision is to identify objects based on the chemical components present on the objects in the scene.
A number of techniques have been developed for recognition of an object in computer vision systems and include, for example, the use of image-based physical tags (e.g. barcodes, QR codes, serial numbers, text, patterns, holograms etc.) or scan-/close contact-based physical tags (e.g. viewing angle dependent pigments, upconversion pigments, metachromics, colors (red/green), luminescent materials). However, the use of image-based physical tags is associated with some drawbacks including (i) reduced readability in case the object comprising the image-based physical tag is occluded, only a small portion of the object is in view or the image-based physical tag is distorted, and (ii) the necessity to furnish the image-based physical tag on all sides of the object in large sizes to allow recognition from all sides and from a distance. Scanning and close contact-based tags also have drawbacks. Upconversion pigments are usually opaque and have large particle sizes, thus limiting their use in coating compositions. Moreover, they require strong light probes because they only emit low levels of light due to their small quantum yields. Many upconversion pigments have unique response times that are used for object recognition and classification, however, the measurement of the response time requires knowing the distance between the probe and the sample in order to calculate the time of flight for the light probe. Additionally, the upconversion response is much slower than the fluorescence and light reflection, thus requiring that the time of flight distance for that sensor/object system is known in advance to use the upconversion response for object recognition. This distance is, however, rarely known in computer vision applications. Similarly, viewing angle dependent pigment systems only work in close range and require viewing at multiple angles. Also, the color is not uniform for visually pleasant effects. The spectrum of incident light must be managed to get correct measurements. Within a single image/scene, an object that has angle dependent color coating will have multiple colors visible to the camera along the sample dimensions. Color-based recognitions are difficult because the measured color depends partly on the ambient lighting conditions. Therefore, there is a need for reference samples and/or controlled lighting conditions for each scene. Different sensors will also have different capabilities to distinguish different colors and will differ from one sensor type/maker to another, necessitating calibration files for each sensor. Luminescence based recognition under ambient lighting is a challenging task, as the reflective and luminescent components of the object are added together. Typically, luminescence-based recognition will instead utilize a dark measurement condition and a priori knowledge of the excitation region of the luminescent material so the correct light probe/source can be used. Another technique utilized for recognition of an object in computer vision is the use of passive or active electronic tags. Passive electronic tags are devices which are attached to objects to be recognized without requiring to be visible or to be supplied with power, and include, for example, RFID tags. Active electronic tags are powered devices attached to the object(s) to be recognized which emit information in various forms, such as wireless communications, light, radio, etc.
Use of passive electronic tags, such as RFID tags, requires the attachment of a circuit, power collector, and antenna to the item/object to be recognized or the object recognition system to retrieve information stored on the tag, adding cost and complication to the design. To determine a precise location when using passive electronic tags, multiple sensors have to be used in the scene, thus further increasing the costs. Use of active electronic tags requires the object to be recognized to be connected to a power source, which is cost-prohibitive for simple items like a soccer ball, a shirt, or a box of pasta and is therefore not practical.
Yet another technique utilized for recognition of an object in computer vision is the image-based feature detection relying on known geometries and shapes stored in a database or image-based deep learning methods using algorithms which have been trained by numerous labelled images comprising the objects to be recognized. A frequent problem associated with image-based feature detection and deep learning methods is that the accuracy depends largely on the quality of the image and the position of the camera within the scene, as occlusions, different viewing angles, and the like can easily change the results. Moreover, detection of flexible objects that can change their shape is problematic as each possible shape must be included in the database to allow recognition. Furthermore, the visual parameters of the object must be converted to mathematical parameters at great effort to allow usage of a database of known geometries and shapes. Additionally, logo type images present a challenge since they can be present in multiple places within the scene (i.e., a logo can be on a ball, a T-shirt, a hat, or a coffee mug) and the object recognition is by inference. Finally, there is always inherent ambiguity as similarly shaped objects may be misidentified as the object of interest. In case of image-based deep learning methods, such as CNNs, the accuracy of the object recognition is dependent on the quality of the training data set and large amounts of training material are needed for each object to be recognized/classified.
Moreover, object tracking methods are used for object recognition. In such methods, items in a scene are organized in a particular order and labelled. Afterwards, the objects are followed in the scene with known color/geometry/3D coordinates. However, “recognition” is lost if the object leaves the scene and re-enters. Apart from the above-mentioned shortcomings, these methods all lack the possibility to identify as many objects as possible within each scene with high accuracy and low latency using a minimum amount of resources in sensors, computing capacity, light probe etc.
Finally, luminescent object recognition systems relying on the illumination invariant luminescence of materials and the use of special light sources and/or sensor arrays to separate the reflected light from the luminesced light are known in the state of the art. However, the number of recognizable objects is limited by the number of distinct luminescent colors and the same or a similar luminescent color cannot be used for more than one object. To solve this problem, the information on luminescence obtained by the luminescent recognition system is combined with traditional visual AI systems. This combination allows to use similar luminescent materials as identification tags for objects having different shapes because the traditional AI system will be able to distinguish between the objects having similar luminescence based on the different shape. However, use of traditional AI systems requires training of such systems with huge amounts of training data which needs to be labelled manually.
It would therefore be desirable to provide a training method for an object recognition neural network as well as computer-implemented methods and systems for recognition of object(s), in particular luminescent and non-luminescent objects, in a scene which are not associated with the aforementioned drawbacks. More specifically, the computer-implemented methods and systems for recognition of luminescent object(s) in a scene should result in a near 100% accuracy in cases with ambiguous luminescence identification, i.e. in case the same or a similar luminescent material is used as a tag for different objects, preferably without the use of visual object recognition systems. The method for training an object recognition neural network should allow automatic labelling of objects with high accuracy in acquired images of the scene as well as scene specific further training of the implemented trained neural network to prevent exhaustive training prior to implementation and to ensure flexible adaption of the implemented neural network to the occurrence of new objects in the scene. The trained neural network should be used to recognize objects in a scene with high accuracy.
DEFINITIONS
“Object recognition” refers to the capability of a system to identify an object in a scene, for example by using any of the aforementioned methods, such as analysing a picture with a computer and identifying/labelling a ball in that picture, sometimes with even further information such as the type of a ball (basketball, soccer ball, baseball), brand, the context, etc. “Luminescent object recognition system” refers to a system which is capable of identifying an object being present in the scene by detecting the luminescence and optionally reflectance of the object upon illumination of the scene with a suitable light source.
A “scene” refers to the field of view of the object recognition system. An object recognition system may have multiple cameras to expand its field of view and therefore the scene it covers. An object recognition system may be comprised of multiple subsystems, each with multiple cameras, to cover increasingly large areas of space in sufficient detail. Systems may be located in fixed locations, or be mobile and transported via human or robotic means. Each subsystem may be further comprised of subsystems. For example, a household may be covered by a kitchen subsystem, comprised of a kitchen pantry subsubsystem and a kitchen waste canister subsubsystem, a garage subsystem, a living room subsystem, and a basement subsystem. Scenes from each of these subsystems may overlap and information from one subsystem may be informative to the other related subsystems, as each scene is located in an overall related environment. The location of each scene within a system may be indicative of the status of an item in that scene, for example, its placement in a waste or recycling bin scene may signal to the system that the item is being disposed of and a replacement should be ordered.
“Ambient lighting” (also known as general lighting in the trade) refers to sources of light that are already available naturally (e.g. the sun, the moon) or artificial light being used to provide overall illumination in an area utilized by humans (e.g. to light a room). In this context, “ambient light source” refers to an artificial light source that affects all objects in the scene and provides a visually pleasant lighting of the scene to the eyes of an observer without having any negative influences on the health of the observer. The artificial light source may or may not be part of the artificial ambient lighting in a room. If it is part of the artificial ambient lighting in the room, it may act as the primary or secondary artificial ambient light source in a room.
“Digital representation” may refer to a representation of an object class, e.g. a known object, and to a representation of the scene in a computer readable form. In particular, the digital representation of object classes may, e.g. be data on object specific reflectance and/or luminescence properties of known objects. Such data may comprise RGB values, rg chromaticity values, spectral luminescence patterns, reflectance patterns or a combination thereof. The data on object specific reflectance and/or luminescence properties may be interrelated with data on the respective object, such as object name, object type, bar code, QR code, article number, object dimensions, such as length, width, height, object volume, object weight or a combination thereof, to allow identification of the object upon determining the object specific reflectance and/or luminescence properties. The digital representation of the scene may, e.g. be data being indicative of the geographic location of the scene, data being indicative of the household, data on stock on hand, data on preferences, historical data of the scene, data being indicative of legal regulations and/or commercial availability valid for the scene or geographic location, dimensions of the scene or a combination thereof, said data being optionally interrelated with a scene identifier. Data being indicative of the geographic location of the scene may include GPS coordinates, IP address data, address data or a combination thereof. Historical data of the scene may include the order history. The scene identifier may be, for example, a user identity.
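By way of illustration only, the following sketch (Python) shows one possible way to structure the two kinds of digital representation defined above; all field names are assumptions chosen to mirror the listed examples, not part of the disclosure.

```python
# Illustrative sketch of the two kinds of digital representation described
# above; field names are assumptions mirroring the listed examples.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ObjectRepresentation:
    name: str                                  # object name / object type
    article_number: str                        # bar code, QR code or article number
    luminescence_pattern: list = field(default_factory=list)  # spectral luminescence pattern
    rg_chromaticity: Optional[tuple] = None    # (r, g) chromaticity values


@dataclass
class SceneRepresentation:
    scene_id: str                              # scene identifier, e.g. a user identity
    gps: Optional[tuple] = None                # geographic location
    order_history: list = field(default_factory=list)
    legal_restrictions: list = field(default_factory=list)


rep = ObjectRepresentation(name="soccer ball", article_number="4006381333931",
                           luminescence_pattern=[0.1, 0.8, 0.3])
print(rep)
```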
"Communication interface" may refer to a software and/or hardware interface for establishing communication such as transfer or exchange or signals or data. Software interfaces may be e. g. function calls, APIs. Communication interfaces may comprise transceivers and/or receivers. The communication may either be wired, or it may be wireless. Communication interface may be based on or it supports one or more communication protocols. The communication protocol may a wireless protocol, for example: short distance communication protocol such as Bluetooth®, or WiFi, or long distance communication protocol such as cellular or mobile network, for example, second-generation cellular network ("2G"), 3G, 4G, Long-Term Evolution ("LTE"), or 5G. Alternatively, or in addition, the communication interface may even be based on a proprietary short distance or long distance protocol. The communication interface may support any one or more standards and/or proprietary protocols.
"Computer processor" refers to an arbitrary logic circuitry configured to perform basic operations of a computer or system, and/or, generally, to a device which is configured for performing calculations or logic operations. In particular, the processing means, or computer processor may be configured for processing basic instructions that drive the computer or system. As an example, the processing means or computer processor may comprise at least one arithmetic logic unit ("ALU"), at least one floating-point unit ("FPU)", such as a math coprocessor or a numeric coprocessor, a plurality of registers, specifically registers configured for supplying operands to the ALU and storing results of operations, and a memory, such as an L1 and L2 cache memory. In particular, the processing means, or computer processor may be a multicore processor. Specifically, the processing means, or computer processor may be or may comprise a Central Processing Unit ("CPU"). The processing means or computer processor may be a (“GPU”) graphics processing unit, (“TPU”) tensor processing unit, ("CISC") Complex Instruction Set Computing microprocessor, Reduced Instruction Set Computing ("RISC") microprocessor, Very Long Instruction Word ("VLIW') microprocessor, or a processor implementing other instruction sets or processors implementing a combination of instruction sets. The processing means may also be one or more special-purpose processing devices such as an Application-Specific Integrated Circuit ("ASIC"), a Field Programmable Gate Array ("FPGA"), a Complex Programmable Logic Device ("CPLD"), a Digital Signal Processor ("DSP"), a network processor, or the like. The methods, systems and devices described herein may be implemented as software in a DSP, in a micro-controller, or in any other side-processor or as hardware circuit within an ASIC, CPLD, or FPGA. It is to be understood that the term processing means or processor may also refer to one or more processing devices, such as a distributed system of processing devices located across multiple computer systems (e.g., cloud computing), and is not limited to a single device unless otherwise specified.
"Neural network" refers to a collection of connected units or nodes called neurons. Each connection (also called edge) can transmit a signal to other neurons. An artificial neuron that receives a signal then processes it and can signal neurons connected to it. The "signal" at a connection is a real number, and the output of each neuron is computed by some non-linear function of the sum of its inputs. Neurons and edges typically have a weight that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection. Neurons may have a threshold such that a signal is sent only if the aggregate signal crosses that threshold. Typically, neurons are aggregated into layers. Different layers may perform different transformations on their inputs. Signals travel from the first layer (the input layer), to the last layer (the output layer), possibly after traversing the layers multiple times.
“Data storage medium” may refer to physical and other computer-readable media for carrying or storing computer-executable instructions and/or data structures. Such computer-readable media can be any available media that can be accessed by a general-purpose or special-purpose computer system. Computer-readable media may include physical storage media that store computer-executable instructions and/or data structures. Physical storage media include computer hardware, such as RAM, ROM, EEPROM, solid state drives ("SSDs"), flash memory, phase-change memory ("PCM"), optical disk storage, magnetic disk storage or other magnetic storage devices, or any other hardware storage device(s) which can be used to store program code in the form of computer-executable instructions or data structures, which can be accessed and executed by a general-purpose or special-purpose computer system to implement the disclosed functionality of the invention.
“Database” may refer to a collection of related information that can be searched and retrieved. The database can be a searchable electronic numerical, alphanumerical, or textual document; a searchable PDF document; a Microsoft Excel® spreadsheet; or a database commonly known in the state of the art. The database can be a set of electronic documents, photographs, images, diagrams, data, or drawings, residing in a computer readable storage media that can be searched and retrieved. A database can be a single database or a set of related databases or a group of unrelated databases. “Related database” means that there is at least one common information element in the related databases that can be used to relate such databases.
SUMMARY
To address the above-mentioned problems, the following is proposed: a computer-implemented method for recognizing at least one object having object specific luminescence properties in a scene, the method comprising:
(i) providing to a computer processor via a communication interface data of the scene, said data of the scene including data on object specific reflectance and/or luminescence properties of at least one object being present in the scene;
(ii) providing to the computer processor via a communication interface digital representations of pre-defined objects and a digital representation of the scene;
(iii) determining - with the computer processor - the at least one object in the scene based on
• the provided data of the scene,
• the provided digital representations of pre-defined objects, and
• the provided digital representation of the scene;
(iv) optionally providing via the communication interface the at least one identified object and/or triggering with the computer processor an action associated with the identified object(s).
It is an essential advantage of the method according to the present invention that data besides luminescence and/or reflectance information, such as data on the scene, can be used to boost the accuracy of the object identification to near 100% in use cases with ambiguous luminescence-only identification. For example, geographic regions may only have distribution of one item per unique luminescent material used as tag, and items identified in that region are assumed to be from the set of items distributed in that area. Alternatively, household or user data may be used to know or infer likely items based on a search or purchase history and to resolve ambiguity in favor of the previously searched-for or purchased item. Some items may be legally restricted or not commercially available in a jurisdiction where identification occurs, and the ambiguity is resolved in favor of legal or commercially available items in the area.
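By way of illustration only, the following minimal sketch (Python) shows how such scene data could be used to resolve an ambiguous luminescence-only match; the candidate and scene fields are hypothetical assumptions.

```python
# Minimal sketch of resolving an ambiguous luminescence match with scene
# data, as described above. The candidate lists and scene fields are
# hypothetical illustrations.

def resolve_ambiguity(candidates, scene):
    """Reduce a list of candidate objects sharing the same luminescent tag
    using regional distribution, legal/commercial availability, and
    purchase history from the digital representation of the scene."""
    # Keep only items distributed and not restricted in the scene's region.
    remaining = [c for c in candidates
                 if scene["region"] in c["distributed_regions"]
                 and c["name"] not in scene.get("restricted_items", [])]
    # Prefer items that appear in the household's purchase history.
    preferred = [c for c in remaining if c["name"] in scene.get("order_history", [])]
    return preferred or remaining


candidates = [
    {"name": "soda A", "distributed_regions": {"EU"}},
    {"name": "soda B", "distributed_regions": {"US"}},
]
scene = {"region": "EU", "order_history": ["soda A"]}
print(resolve_ambiguity(candidates, scene))  # -> only "soda A" remains
```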
Further disclosed is: a system for recognizing at least one object having object specific luminescence properties in a scene. According to a first embodiment, said system comprises:
• a light source comprising at least one illuminant for illuminating the scene;
• a sensor unit for acquiring data of the scene including object specific reflectance and/or luminescence properties of at least one object being present in the scene upon illumination of the scene with the light source;
• a data storage medium comprising digital representations of pre-defined objects and digital representations of the scene;
• at least one communication interface for providing the acquired data of the scene, the digital representations of pre-defined objects and the digital representations of the scene;
• a processing unit in communication with the sensor unit and the communication interface, the processing unit programmed to
o determine the at least one object in the scene based on
■ the provided data of the scene,
■ the provided digital representations of pre-defined objects, and
■ the provided digital representation of the scene,
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
The inventive systems combine a luminescence object recognition system based on the detection of object specific luminescence and optionally reflectance properties and further data, such as data on the scene, to boost the accuracy of the object identification to near 100% in use cases with ambiguous luminescence-only identification.
Further disclosed is: a method for training an object recognition neural network, the method comprising:
a) providing via a communication interface to a computer processor data of a scene, said data of the scene including image(s) of the scene and data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene;
b) calculating with the computer processor for each provided image of the scene a labelled image of the scene by
b1) annotating each classified pixel of each image with an object specific label based on the data on object specific luminescence and optionally reflectance properties, and
b2) optionally creating bounding boxes around the objects determined in the images in step b1) based on the annotated pixels or segmenting the images obtained after step b1) based on the annotated pixels;
c) providing via a communication interface the calculated labelled images of the scene and optionally a digital representation of the scene to the object recognition neural network; and
d) training the object recognition neural network with the provided calculated labelled images of the scene and optionally with the provided digital representation of the scene as input, wherein the neural network is trained to recognize each labelled object in the calculated labelled images.
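By way of illustration only, the following minimal sketch (Python) shows steps b1)/b2) in code: deriving bounding boxes from a per-pixel object label mask such as the one produced by the luminescent object recognition system; the label values used are illustrative.

```python
import numpy as np

# Minimal sketch of steps b1)/b2): turning a per-pixel classification from
# the luminescent object recognition system into bounding boxes.

def bounding_boxes(label_mask: np.ndarray) -> dict:
    """Return one (x_min, y_min, x_max, y_max) box per non-zero label in a
    2D array of per-pixel object labels (0 = background)."""
    boxes = {}
    for label in np.unique(label_mask):
        if label == 0:
            continue
        ys, xs = np.nonzero(label_mask == label)
        boxes[int(label)] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes


mask = np.zeros((8, 8), dtype=int)
mask[2:5, 3:7] = 1                 # pixels classified as object 1
print(bounding_boxes(mask))        # {1: (3, 2, 6, 4)}
```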
It is an essential advantage of the training method according to the present invention that data provided by a luminescence object recognition system and containing information on the object specific luminescence properties on a pixel level is used because this allows for highly accurate bounding boxes or segmentation to be created automatically. The labelled images are then used to train the object recognition neural network to improve the performance of the neural network automatically. Further data, such as data on the scene, can be included during calculation of the labelled images to further increase identification accuracy during calculation of the labelled images and to improve the quality of the neural network training. The inventive training method therefore allows to provide a trained object recognition neural network, wherein the training is performed automatically by using the data provided by the luminescence object recognition system, thus reducing the amount of user interaction normally required during training of visual AI recognition systems, for example by manually labelling the training images. To obtain data from luminescence object recognition systems, incentives may be used. For example, households may have their purchased luminescence object recognition system subsidized or receive other reimbursements in exchange for the labelled data their system generates. To increase the amount of training data, items can be coated with luminescent tags. For example, purveyors of object recognition neural networks may coat or tag items that are not normally coated with the luminescent tags in order to collect training data. This allows to easily and automatically collect training data on a huge number of items and to improve the object recognition of trained object recognition neural networks by continuously training such systems with unknown items. Moreover, this allows to create object recognition systems which are tailored to the specific scene by using highly defined training data automatically generated from the luminescence object recognition system used in said specific scene.
Further disclosed is: a computer-implemented method for recognizing at least one object in a scene, said method comprising:
(A) providing via a communication interface to a computer processor an object recognition neural network, in particular an object recognition neural network which has been trained according to the inventive method for training an object recognition neural network,
(B) providing via a communication interface to the computer processor data of the scene, said data of the scene including image(s) of the scene and/or data on object specific reflectance and/or luminescence properties of at least one object being present in the scene;
(C) optionally providing via a communication interface to the computer processor digital representations of pre-defined objects and/or a digital representation of the scene,
(D) determining - with the computer processor - at least one object in the scene based on the provided trained object recognition neural network, the provided data of the scene and optionally the provided digital representation of the scene; and
(E) optionally providing via a communication interface the at least one identified object and/or triggering at least one action associated with the identified object(s).
The trained neural network can either be used in combination with a luminescent object recognition system or can be used independently from the luminescent object recognition system in a similar scene. Use of the trained neural network in combination with the luminescent object recognition system results in an iterative improvement of the performance of each system, because the output of each system can be used to reduce the ambiguity in the object recognition of the other system, for example by performing sanity checks or to distinguish between objects with similar luminescence but different shapes or vice versa. Use of the trained object recognition neural network in combination with a luminescent object recognition system allows to improve identification accuracy because information devoid of shape, i.e. the information on the object derived from object specific reflectance and/or luminescence properties, is combined with information on the shape, i.e. information obtained from the trained object recognition neural network. This allows to identify an object in case the information on reflectance and/or luminescence in combination with the information on the scene is not sufficient to clearly identify the object. Independent use of the trained neural network may be preferred if the use of a luminescent object recognition system is not possible or not desirable due to the higher costs associated with the use of the luminescence object recognition system. For example, a retailer may install the more expensive fluorescent color identification system in one of its stores and use the data it collects to train the traditional AI visual system used in its other stores with similar layouts and product offerings.
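By way of illustration only, the following minimal sketch (Python) shows the kind of cross-check (sanity check) between the luminescence-based identification and the prediction of the trained neural network described above; the field names and class labels are hypothetical assumptions.

```python
# Minimal sketch of the cross-check between the luminescent identification
# and the trained neural network described above; class names and the
# prediction fields are illustrative assumptions.

def combine_predictions(luminescence_id, nn_prediction):
    """Return a final label only when both subsystems agree; otherwise flag
    the detection for the slower disambiguation path (scene data, retraining)."""
    if luminescence_id is None:
        return nn_prediction["label"]            # shape-only recognition
    if nn_prediction["label"] == luminescence_id:
        return luminescence_id                   # both systems agree
    return None                                  # ambiguous: defer / log for training


print(combine_predictions("soccer ball", {"label": "soccer ball", "score": 0.93}))
```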
Further disclosed is: a system for recognizing at least one object in a scene, said system comprising:
• a sensor unit for acquiring data of the scene;
• a data storage medium comprising an object recognition neural network, in particular an object recognition neural network which has been trained according to the inventive method for training an object recognition neural network, and optionally digital representations of pre-defined objects and/or a digital representation of the scene;
• at least one communication interface for providing the acquired data, the object recognition neural network, and optionally the digital representations of pre-defined objects and/or the digital representation of the scene;
• a processing unit in communication with the sensor unit and data storage medium, the processing unit programmed to
o determine the at least one object in the scene based on
■ the provided data of the scene,
■ the provided object recognition neural network, and
■ optionally the provided digital representations of pre-defined objects and/or the digital representation of the scene,
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
Further disclosed is:
A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform the steps according to the computer-implemented methods described herein.
The disclosure applies to the methods, systems and non-transitory computer-readable storage media disclosed herein alike. Therefore, no differentiation is made between methods, systems and non-transitory computer-readable storage media. All features disclosed in connection with the methods are also valid for the systems and non-transitory computer-readable storage media disclosed herein.
Further disclosed is a system comprising a scene and at least one identified object, wherein the object was recognized using the system or the methods disclosed herein.
Further disclosed is the use of the system or the methods disclosed herein for identifying objects in a scene.
EMBODIMENTS
Embodiments of the inventive computer-implemented object recognition method:
The inventive method is used to recognize at least one object having object specific reflectance and/or luminescence properties which is present in the scene. Luminescence is the property of light being emitted from a material without heat. A variety of luminescence mechanisms, such as chemiluminescence, mechanoluminescence, and electroluminescence are known. Photoluminescence is the emission of light/photons due to the absorption of other photons. Photoluminescence includes fluorescence, phosphorescence, upconversion, and Raman scattering. Photoluminescence, fluorescence and phosphorescence are able to change the color appearance of an object under ordinary light conditions. While there is a difference between the chemical mechanisms and time scales of fluorescence and phosphorescence, for most computer vision systems they will appear identical.
Some objects are naturally luminescent and can therefore be directly recognized with the proposed system and/or method without further modification of the object.
In case the object is not naturally luminescent, the luminescence has to be imparted. Such objects having object specific luminescence and reflectance properties comprise at least one luminescence material, each luminescence material having a predefined luminescence property. The object can be imparted with the at least one luminescence material by a variety of methods. In one example, luminescent material(s) are dispersed in a coating material which is applied by spray coating, dip coating, coil coating, roll-to-roll coating and other application methods. After optional drying, the applied coating material is cured to form a solid and durable luminescence coating layer on the object surface. In another example, the luminescence material(s) are printed onto the surface of the object. In yet another example, the luminescence material(s) are dispersed into a composition and the composition is afterwards extruded, molded, or cast to obtain the respective object. Other examples include genetic engineering of biological materials (vegetables, fruits, bacteria, tissue, proteins, etc.) or the addition of luminescent proteins in any of the ways mentioned herein. Since the luminescence spectral patterns of the luminescence material(s) are known, these luminescent material(s) can be used as an identification tag by interrelating the object comprising said luminescence material(s) with the respective luminescence spectral pattern(s). By using the luminescent chemistry of the object as a tag, object recognition is possible irrespective of the shape of the object or partial occlusions.
Suitable luminescent materials are commercially available, and their selection is mainly limited by the durability of the fluorescent materials and compatibility with the material of the object to be recognized. Preferred examples of luminescence materials include fluorescent materials, for example the BASF Lumogen® F series of dyes, such as, for example, yellow 170, orange 240, pink 285, red 305, a combination of yellow 170 and orange 240 or any other combination thereof. Another example of suitable fluorescent materials are the Clariant Hostasol® fluorescent dyes Red GG, Red 5B, and Yellow 3G. Optical brighteners are a class of fluorescent materials that are often included in object formulations to reduce the yellow color of many organic polymers. They function by fluorescing invisible ultraviolet light into visible blue light, thus making the produced object appear whiter. Many optical brighteners are commercially available, including BASF Tinopal® SFP and Tinopal® NFW and Clariant Telalux® KSI and Telalux® OB1.
Step (i):
In step (i) of the inventive method, data of the scene is provided. Said data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object being present in the scene. The data may be acquired with an object recognition system able to acquire the luminescence and/or reflectance properties of objects being present in the scene. Such systems are commonly known in the state of the art and normally include a light source and a sensor unit.
In an aspect, the data on the scene is acquired with a sensor unit upon illumination of the scene with at least one light source comprising at least one illuminant and the acquired data is provided via the communication interface to the computer processor. The acquired data may be stored on a data storage medium prior to providing the acquired data to the processor. The data storage medium may be an internal data storage medium of the object recognition system or may be a database connected to the object recognition system via a communication interface. The data on the scene can be acquired continuously, at pre-defined time intervals or upon the detection of a triggering event. Pre-defined time intervals may be based on the scene, such as location, preferences of persons present in the scene etc. Triggering events may include detection of a motion in the scene, switching on/off a light in the scene, detection of a sound in the scene or the vicinity of the scene or a combination thereof.
In one example, the light source comprises at least two different illuminants, preferably 2 to 20 different illuminants, more preferably 3 to 12 different illuminants, in particular 4 to 10 different illuminants. The at least one illuminant, in particular all illuminants, may have a peak center wavelength from 385 to 700 nm. Use of illuminants having the aforementioned peak center wavelength renders it possible to use the light source of the inventive system as a primary or secondary ambient light source in a room. This allows to perform object recognition under ambient lighting conditions without the necessity to use defined lighting conditions (such as dark rooms) and to easily integrate the object recognition system into the ambient lighting system already present in the room without resulting in unpleasant lighting conditions in the room. In principle, the illuminant(s) of the light source can be commonly known illuminants, such as illuminants comprising at least one solid-state lighting system (LED illuminants), illuminants comprising at least one incandescent illuminant (incandescent illuminants), illuminants comprising at least one fluorescent illuminant (fluorescent illuminants) or a combination thereof. According to a preferred embodiment, the at least one illuminant is an illuminant comprising at least one solid-state lighting system, in particular at least one narrowband LED. With particular preference, all illuminants of the light source are illuminants comprising at least one solid-state lighting system, in particular at least one narrowband LED. “Narrowband LED” may refer to an individual color LED (i.e. an LED not having a white output across the entire spectrum) having a full-width-half-max (FWHM) - either after passing through a bandpass filter or without the use of a bandpass filter - of 5 to 60 nm, preferably of 3 to 40 nm, more preferably of 4 to 30 nm, even more preferably of 5 to 20 nm, very preferably of 8 to 20 nm. The FWHM of each illuminant is obtained from the emission spectrum of each illuminant and is the difference of each wavelength at half of the maximum values of the emission spectrum. Use of LED illuminants reduces the adverse effects on the health which can be associated with the use of fluorescent lights as previously described. Moreover, use of LED illuminants also has various advantages over the use of illuminants comprising incandescent lights: firstly, they allow fast switching between the illuminants of the light source, thus allowing faster acquisition times of the scene under various illumination conditions and therefore also faster object recognition. Secondly, LED illuminants require less energy compared to incandescent illuminants for the same amount of in-band illumination, thus allowing to use a battery driven object recognition system. Thirdly, LED illuminants require less time to achieve a consistent light output and a steady state operating temperature, thus the object recognition system is ready faster. Fourthly, the lifetime of LED illuminants is much higher, thus requiring reduced maintenance intervals. Fifthly, the FWHM of the LED illuminants is narrow enough such that the use of a bandpass filter is not necessary, thus reducing the complexity of the system and therefore the overall costs.
In one example, the light source is configured to project at least one light pattern on the scene. Suitable light sources are disclosed, for example, in WO 2020/245441A1. In another example, the light source is illuminating the scene without the use of a light pattern.
In one example, the light source is a switchable light source. “Switchable light source” refers herein to a light source comprising at least 2 illuminants, wherein the light source is configured to switch between the at least 2 illuminants. The illuminant(s) and/or the solid state lighting system(s) of the illuminant(s) may be switched on sequentially and the switching of the sensor(s) present in the sensor unit may be synchronized to the switching of the illuminant(s) and/or the solid state lighting system(s) of the illuminant(s) such that each sensor acquires data when each illuminant and/or each solid state lighting system of each illuminant is switched on. The switching of the illuminant(s) of the light source and the sensors of the sensor unit may be set to allow acquisition of object specific luminescence and/or reflectance properties under ambient lighting conditions as described later on with respect to the determination of further object specific luminescence and/or reflectance properties. In another example, the light source may be a non-switchable light source.
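By way of illustration only, the following minimal sketch (Python) shows sequential switching of illuminants synchronized with sensor acquisition for the switchable case; the illuminant names and the hardware stand-ins (set_illuminant, capture_frame) are hypothetical assumptions, not an actual driver API.

```python
import time

# Minimal sketch of sequentially switching illuminants and triggering the
# sensor for each one, as described for the switchable light source. The
# hardware calls (set_illuminant, capture_frame) are hypothetical stand-ins.

ILLUMINANTS = ["LED_405nm", "LED_450nm", "LED_520nm", "LED_630nm"]  # assumed set


def set_illuminant(name, on):
    print(f"{name} {'on' if on else 'off'}")      # stand-in for driver call


def capture_frame(tag):
    print(f"capture frame: {tag}")                # stand-in for sensor trigger
    return tag


def acquisition_cycle(exposure_s=0.01):
    frames = {}
    # One frame under ambient light only, used later for the delta-calculation.
    frames["ambient"] = capture_frame("ambient")
    for led in ILLUMINANTS:
        set_illuminant(led, True)
        time.sleep(exposure_s)                    # let the output stabilise
        frames[led] = capture_frame(led)
        set_illuminant(led, False)
    return frames


acquisition_cycle()
```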
In one example, the at least one light source comprises at least one light source filter positioned optically intermediate the illuminant(s) of the light source and the scene. Suitable light source filters include bandpass filters or dynamic light filters or notch filters or linear polarizers. The light source may comprise a single filter for all illuminants of the light source or may comprise a filter for each illuminant of the light source. Bandpass filters may be used to narrow the emitted spectral light to obtain the previously described FWHM. Dynamic light filters are configured to continuously operate over the light spectral range of interest and to provide blocking of at least one band of interest on demand, particularly at wavelengths covered by the luminescence spectral pattern of the at least one object. If a plurality of dynamic light filters are used, they are preferably configured to be synchronized with each other to block the same spectral band or bands simultaneously. Notch filters are configured to block light entering the scene from a window at at least one distinct spectral band within the spectral range of light continuously. In one example, the linear polarizer is coupled with a quarter waveplate and the quarter waveplate is oriented with its fast and slow axes at an angle in the range of 40 to 50 degrees, preferably of 42 to 48 degrees, more preferably of 44 to 46 degrees relative to the linear polarizer.
The light source may further include diffuser and/or focusing optics. In one example, the light source comprises separate diffuser and/or focusing optics for each illuminant of the light source. In case of LED illuminants, single focusing and diffuser optics may be used for all LEDs of the LED illuminant. Suitable focusing optics comprise an individual frosted glass for each illuminant of the light source. In another example, the light source comprises a single diffuser and/or focusing optic for all illuminants of the light source.
The at least one sensor of the sensor unit can be an optical sensor with photon counting capabilities, in particular a monochrome camera, an RGB camera, a multispectral camera, a hyperspectral camera or a combination thereof. The sensor unit can comprise at least one sensor filter, in particular at least one multi-bandpass filter or at least one multi-dichroic beamsplitter or a linear polarizer. Each sensor filter may be matched to spectral light emitted by the illuminant(s) of the light source to block the reflective light originating from illuminating the scene with the respective illuminant from the fluorescent light originating from illuminating the scene with the respective illuminant. In one example, each sensor comprises a sensor filter, such as multi-bandpass filters having complementary transmission valleys and peaks or a linear polarizer. The linear polarizer may be coupled with a quarter waveplate which is oriented with its fast and slow axes at an angle in the range of 40 to 50 degrees, preferably of 42 to 48 degrees, more preferably of 44 to 46 degrees relative to the linear polarizer. In another example, the sensor unit comprises a single sensor filter for all sensors present in the sensor unit. Suitable single camera filters include multi-dichroic beam splitters.
In one example, the light source filter and the sensor filter are configured as separate filters. In another example, the light source filter and the sensor filter are configured as a single filter, such as a single bandpass filter. Use of a single filter for the light source and the sensor unit allows to physically separate the luminescence from the reflectance upon illumination of the scene with the light source.
The sensor unit may further contain collection optics positioned optically intermediate the sensor filter and each sensor of the sensor unit or positioned optically intermediate the sensor filter of each sensor of the sensor unit and the scene. The collection optics enable efficient collection of the reflected and fluorescent light upon illumination of the scene with the light source.
In an aspect, data of the scene further includes an at least partial 3D map of the scene. The at least partial 3D map of the scene may be obtained from a scene mapping tool, for example by time of flight measurements or the usage of structured light. When knowing the distances from the light source used to produce the structured light to objects in the scene, a 3D map of the scene can be formed, thus giving information about specific coordinates of the respective objects within the scene.
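By way of illustration only, the following worked example (Python) shows the time-of-flight relation underlying such a partial 3D map: distance = (speed of light x round-trip time) / 2.

```python
# Minimal worked example of the time-of-flight relation behind the partial
# 3D map: distance = (speed of light x round-trip time) / 2.

C = 299_792_458.0  # speed of light in m/s


def distance_from_round_trip(t_seconds: float) -> float:
    return C * t_seconds / 2.0


print(distance_from_round_trip(20e-9))  # ~3.0 m for a 20 ns round trip
```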
Suitable object recognition systems which are able to acquire data on object specific luminescence and/or reflectance properties are, for example, disclosed in applications WO 2020/178052A1, WO 2020/245443A2, WO 2020/245442A1, WO 2020/245441A1, WO 2020/245439A1 and WO 2020/245444A1.
Step (ii):
In step (ii) of the inventive method, digital representations of pre-defined objects and a digital representation of the scene are provided via a communication interface to the computer processor. In an aspect, the digital representations of the pre-defined objects each comprise object specific reflectance and/or luminescence properties optionally interrelated with object data. Object specific reflectance and/or luminescence properties are, for example, RGB values, rg chromaticity values, spectral luminescence patterns, reflectance patterns or a combination thereof. Object data may include the object name, the object type, a bar code, a QR code, an article number, object dimensions, such as length, width, height, object volume, object weight or a combination thereof. Interrelation of the object specific luminescence and/or reflectance properties with the object data allows to identify the object upon determining the object specific reflectance and/or luminescence properties as described later on.
In an aspect, the digital representation of the scene comprises data being indicative of the geographic location of the scene, data being indicative of the household, data on stock on hand, data on preferences, historical data of the scene, data being indicative of legal regulations and/or commercial availability valid for the scene or geographic location, dimensions of the scene or a combination thereof, said data being optionally interrelated with a scene identifier. Data being indicative of the geographic location of the scene may include GPS coordinates, IP address data, address data or a combination thereof. Historical data of the scene may include the order history. The scene identifier may be, for example, a user identity.
In an aspect, step (ii) includes providing at least one data storage medium having stored thereon the digital representations of pre-defined objects and/or the digital representation of the scene, obtaining the digital representations of pre-defined objects and/or the digital representation of the scene and providing the obtained digital representation(s). In one example, the digital representation of the scene is obtained by searching the data stored on the data storage medium based on the scene identifier and retrieving the digital representation of the scene interrelated with the scene identifier from the data storage medium.
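By way of illustration only, the following minimal sketch (Python) shows retrieval of the digital representation of the scene via the scene identifier; the in-memory mapping stands in for the data storage medium and its contents are hypothetical.

```python
# Minimal sketch of retrieving the digital representation of the scene via
# its scene identifier, as described in step (ii); an in-memory mapping
# stands in for the data storage medium and its hypothetical contents.

SCENE_STORE = {
    "household-42": {"region": "EU", "order_history": ["soda A"], "shelves": 6},
}


def get_scene_representation(scene_id: str):
    return SCENE_STORE.get(scene_id)  # None if the identifier is unknown


print(get_scene_representation("household-42"))
```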
The order of steps (i) and steps (ii) of the inventive method may be reversed, i.e. step (ii) may be performed prior to step (i).
Step (iii):
In step (iii) of the inventive method, at least one object in the scene is determined with the computer processor based on the data provided in step (i) and the digital representations provided in step (ii). In an aspect, step (iii) further includes - prior to determining the at least one object in the scene - determining further object specific reflectance and/or luminescence properties from the provided data of the scene, in particular from the provided data on object specific reflectance and/or luminescence properties. The further object specific reflectance and/or luminescence properties may be determined with the computer processor used in step (iii) or may be determined with a further processing unit. The further processing unit may be located in a cloud environment or may be a stationary processing unit. “Cloud environment” may refer to the on-demand availability of computer system resources, especially data storage (cloud storage) and computing power, without direct active management by the user and may include at least one of the following service modules: infrastructure as a service (IaaS), platform as a service (PaaS), software as a service (SaaS), mobile "backend" as a service (MBaaS) and function as a service (FaaS). The stationary processing unit may be located within the object recognition system comprising the previously described light source and sensor unit.
Determining further object specific reflectance and/or luminescence properties is generally optional but may result in a higher accuracy in determining the object, especially under ambient lighting conditions as described hereinafter. Determining further reflectance and/or luminescence properties may include at least one of: generating differential data by subtracting data of the scene acquired by at least one sensor under ambient lighting from data of the scene acquired by at least one sensor under ambient lighting and illumination by the light source and optionally converting the differential data (also called delta-calculation hereinafter), determining the regions of luminescence in the generated differential data or in the data of the scene, determining the luminescence spectral pattern and/or the reflective spectral pattern.
Generating differential data may be necessary in case a physical separation of luminesced light and reflected light upon illumination of the scene with the light source is used to obtain data on object specific luminescence and/or reflectance properties because the filters used to achieve the physical separation are only able to block the reflective light from the illuminators of the light source and the corresponding portions of the ambient lighting but not all of the reflective light from a white light source used as artificial ambient light source in the scene. Thus, the object specific reflectance and/or luminescence properties caused by the use of the light source cannot be detected directly. This problem may be circumvented by performing the so-called delta-calculation, i.e. subtracting data collected under the ambient lighting from data collected under ambient lighting and illumination with the light source. The data necessary for performing the delta-calculation can be obtained, for example, by synchronizing the illuminant(s) of the light source and the sensor(s) of the sensor unit such that the acquisition duration (i.e. the time each color sensitive sensor is switched on) of at least one sensor of the sensor unit and the illumination duration (i.e. the time each illuminant is switched on) of each illuminant of the light source only overlap partially, i.e. at least one sensor is switched on during a time where no illuminant of the light source is switched on, thus allowing data of the scene to be acquired under illumination conditions devoid of the illumination contributed by the light source. The delta-calculation, i.e. data (light source illumination + ambient lighting conditions) - data (ambient lighting conditions), results in data only containing information on the object specific reflectance and/or luminescence properties which is due to the illumination of the scene with the light source. However, for this data to be accurate, both sets of data must be recorded with the same contribution from ambient lighting. Flickering (i.e. the variation of brightness of a light source depending on the type of lighting, the duty cycle of the lighting, and the type of electrical power supplied to the lighting) of light sources, which is commonly observed, is therefore a problem, especially if the sensor’s acquisition duration (exposure time) is short. In the worst case, when the acquisition duration is very short compared with the flicker cycle and the flicker goes from bright (100% on) to fully dark (0% on), the ambient light contribution can vary by 100% depending on when in the flicker cycle the acquisition begins. To mitigate this effect, the illuminant(s) and sensor(s) used to acquire the data provided in step (i) are preferably synchronized as described later on in relation with the inventive system. This allows the data provided in step (i) to be acquired in combination with white light sources (e.g. ambient light sources), i.e. under real-world conditions, because the accuracy of the determination of the object in step (iii) is no longer dependent on the use of highly defined lighting conditions (such as dark rooms).
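A minimal sketch of the delta-calculation described above, assuming that the sensor unit delivers one frame captured under ambient light only and one frame captured under ambient light plus light-source illumination as NumPy arrays; the function and argument names and the clipping of negative values are illustrative assumptions, not part of the disclosure.

    import numpy as np

    def delta_image(frame_ambient_plus_source, frame_ambient_only):
        """Subtract the ambient-only frame from the frame captured under ambient
        light plus light-source illumination, keeping only the contribution that
        is due to the light source of the system."""
        diff = (frame_ambient_plus_source.astype(np.int32)
                - frame_ambient_only.astype(np.int32))
        # Negative values can occur because of noise or residual flicker; clip to zero.
        return np.clip(diff, 0, None).astype(np.uint16)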
The differential data may be converted, for example if an RGB color camera is used as sensor of the sensor unit. In this case, rg chromaticity values can be obtained from the RGB values of the differential data by using the following equations (1) and (2):

r = R / (R + G + B)   (1)

g = G / (R + G + B)   (2)
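A minimal sketch of the conversion according to equations (1) and (2), assuming the differential data is an array whose last axis holds the R, G and B values; the small epsilon added to the denominator is an assumption to avoid division by zero in completely dark pixels.

    import numpy as np

    def rg_chromaticity(rgb, eps=1e-6):
        """Convert RGB differential data to rg chromaticity values according to
        equations (1) and (2): r = R/(R+G+B), g = G/(R+G+B)."""
        rgb = np.asarray(rgb, dtype=np.float64)
        total = rgb.sum(axis=-1, keepdims=True) + eps  # eps avoids division by zero
        r = rgb[..., 0:1] / total
        g = rgb[..., 1:2] / total
        return np.concatenate([r, g], axis=-1)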
In one example, regions of luminescence may be determined after generating the differential image. In another example, regions of luminescence may be determined directly from the acquired sensor data. Determination of regions of luminescence identifies the regions to be analyzed and classified as containing luminescent object(s). In one example, this is performed by analyzing the brightness of the pixels acquired with the luminescence channel (in case physical separation of luminesced and reflected light is used) because non-luminescent regions are black while luminescent regions, when illuminated by a suitable illuminant of the light source, will have some degree of brightness. The analysis can be performed by using a mask to block out black (i.e. non-luminescent) regions, an edge detector to mark any region above a certain brightness under any illuminant as being part of the luminescent region or a combination thereof.
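A minimal sketch of the masking approach described above, assuming the luminescence channel is available as an array; the brightness threshold is an arbitrary assumption, and connected-component labelling is used here as one possible way of grouping the remaining pixels into regions.

    import numpy as np
    from scipy import ndimage

    def luminescent_regions(luminescence_channel, brightness_threshold=10):
        """Mask out black (non-luminescent) pixels and group the remaining pixels
        into connected regions of luminescence."""
        channel = np.asarray(luminescence_channel)
        if channel.ndim == 3:              # multi-channel data: use the per-pixel maximum
            channel = channel.max(axis=-1)
        mask = channel > brightness_threshold
        labels, n_regions = ndimage.label(mask)   # connected-component labelling
        return mask, labels, n_regions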
The luminescence spectral pattern and/or the reflective spectral pattern for the determined regions of luminescence can be determined if a multispectral or hyperspectral camera is used as sensor. In case differential data is calculated, the luminescence spectral pattern can be determined from the spectral pattern acquired by the luminescence channel (i.e. the sensor of the sensor unit only acquiring luminescence of the object upon illumination of the scene with the light source) and the reflective spectral pattern and the luminescence spectral pattern can be determined from the spectral pattern acquired by the reflectance and luminescence channel (i.e. the sensor of the sensor unit acquiring reflectance and luminescence of the object upon illumination of the scene with the light source). These spectral patterns can be magnitude normalized to give a measurement of chroma similar to the rg chromaticity values from the RGB color cameras. In case no differential data is calculated, the luminescence spectral pattern may be calculated based on the acquired radiance data of the scene at different wavelengths, such as the acquired radiance data of the scene within the spectral bands that are omitted/blocked/filtered (e.g. based on the spectral distribution of the light filter).
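A minimal sketch of the magnitude normalization mentioned above, assuming the spectral pattern is available as a one-dimensional array of per-band intensities; dividing by the Euclidean norm is one possible choice of "magnitude" and is used here purely for illustration.

    import numpy as np

    def normalize_spectrum(spectrum, eps=1e-9):
        """Magnitude-normalize a luminescence or reflectance spectral pattern so
        that only its shape (a chroma-like quantity) is compared, analogous to
        the rg chromaticity values obtained from RGB color cameras."""
        spectrum = np.asarray(spectrum, dtype=np.float64)
        return spectrum / (np.linalg.norm(spectrum) + eps)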
In an aspect, determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene.
In one example, the object identification hypotheses correspond to the best matching luminescence and/or reflectance properties. In another example, the object identification hypotheses correspond to the best matching objects being associated with the best matching luminescence and/or reflectance properties. The associated confidence score preferably corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties as described hereinafter. Determining the set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects may include calculating the best matching reflectance and/or luminescence properties and obtaining the object(s) assigned to the best matching reflectance and/or luminescence properties.
In one example, the best matching reflectance and/or luminescence properties are calculated by applying any number of matching algorithms on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
In another example, the best matching reflectance and/or luminescence properties are calculated by providing a data driven model of light spectral distribution and intensity on the at least one object to be recognized by analyzing an at least partial 3D map of the scene, merging the analyzed data with light source specific radiance values, calculating the radiance of light incident at points on the at least one object, and combining the calculated radiance of light incident at the points on the at least one object with the measured radiance of light returned to the at least one sensor of the sensor array from points on the at least one object, calculating the object specific reflectance and/or luminescence properties using the provided data driven model, and applying any number of matching algorithms on the calculated object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
The radiance of light incident at a specific point in the scene can be formulated via the light intensity function I(x, y, z), with (x, y, z) designating the space coordinates of the specific point within the scene. The light intensity I(x, y, z) may be obtained in the simplest case as the sum of the light intensities of all light sources at the specific point (x, y, z) according to formula (3):

I(x, y, z) = Σ_k I_k(x, y, z)   (3)

wherein (x, y, z) are the space coordinates of the specific point in the scene, I_k(x, y, z) is the light intensity contributed by light source k at that point, and the index k runs over the light sources present in the scene.
The calculated radiance of light incident at the points in the scene is combined with a measured radiance of light returned to the sensor(s) of the sensor unit from points in the scene, particularly from points on the object to be recognized. Based on such combination of calculated radiance and measured radiance, a model of light spectral distribution and intensity at the object in the scene is formed.
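A minimal sketch of formula (3) and of the subsequent combination of calculated incident radiance with measured returned radiance. Representing each light source as a callable and combining the two quantities as a simple ratio are illustrative assumptions; they stand in for, and are much simpler than, the data driven model described above.

    def incident_intensity(point, light_sources):
        """Formula (3): total light intensity at a point as the sum of the
        contributions of all light sources present in the scene. Each entry of
        light_sources is assumed to be a callable I_k(x, y, z)."""
        x, y, z = point
        return sum(I_k(x, y, z) for I_k in light_sources)

    def estimated_object_property(measured_radiance, calculated_incident_radiance, eps=1e-9):
        """Illustrative ratio model: the object specific property estimated as the
        fraction of the incident radiance returned to the sensor."""
        return measured_radiance / (calculated_incident_radiance + eps)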
Suitable matching algorithms include lowest root mean squared error, lowest mean absolute error, highest coefficient of determination, matching of maximum wavelength value, nearest neighbors, nearest neighbors with neighborhood component analysis, trained machine learning algorithms or a combination thereof.
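A minimal sketch of the lowest-root-mean-squared-error matching listed above, assuming the digital representations of pre-defined objects are available as a dictionary mapping object names to stored spectral patterns; the mapping of the error to a confidence score is an illustrative assumption.

    import numpy as np

    def best_match_rmse(measured_pattern, reference_patterns):
        """Return the pre-defined object whose stored spectral pattern has the
        lowest root mean squared error to the measured pattern, together with a
        simple confidence score derived from that error."""
        measured = np.asarray(measured_pattern, dtype=np.float64)
        best_name, best_rmse = None, float("inf")
        for name, reference in reference_patterns.items():
            rmse = np.sqrt(np.mean((measured - np.asarray(reference, dtype=np.float64)) ** 2))
            if rmse < best_rmse:
                best_name, best_rmse = name, rmse
        confidence = 1.0 / (1.0 + best_rmse)  # illustrative error-to-confidence mapping
        return best_name, confidence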
In one example, the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by retrieving the object(s) associated with the best matching reflectance and/or luminescence properties from the provided digital representations of the pre-defined objects. This may be preferred if the digital representations of pre-defined objects contain reflectance and/or luminescence properties interrelated with the respectively assigned object. In another example, the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by searching a database for said object(s) based on the determined best matching reflectance and/or luminescence properties. This may be preferred if the digital representation of pre-defined objects only contains reflectance and/or luminescence properties but no further information on the object assigned to these properties. The further database may be connected to the computer processor via a communication interface.
Refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene may include determining - based on the digital representation of the scene - confidence score(s) for the determined set of object identification hypotheses and using the determined confidence scores to refine the confidence scores associated with the determined set of object identification hypotheses to identify the at least one object. Determining the confidence score(s) for the set of determined object identification hypotheses based on the provided digital representation of the scene may include determining the likelihood of the presence of objects associated with the determined object identification hypotheses in the scene based on the data contained in the provided digital representation of the scene and associating higher confidence score(s) with objects having a higher probability of being present in the scene based on that data. For example, a geographic region may only have distribution of one item per unique luminescent color; items identified in that region are assumed to be from the set of items distributed in that area and are thus associated with the highest confidence score. Alternatively, a search or purchase history may be used to associate previously searched for or purchased items with the highest confidence score. Some items may be legally restricted in the jurisdiction where identification occurs, and the items not legally restricted in that jurisdiction may be associated with the highest confidence score. The object can be determined from the two determined confidence scores by a number of different algorithms. In one example, the object(s) present in the scene may be determined by adding the two confidence scores together and selecting the object associated with the highest sum value. In another example, the object(s) present in the scene may be determined by multiplying the two confidence scores and selecting the object associated with the highest product value. In yet another example, the object(s) present in the scene may be determined by raising one confidence score to an integer power, multiplying the obtained value by the other confidence score, and selecting the object associated with the highest product value. The suitable algorithm may be determined empirically, using historical data to decide the proper weighting of the two determined confidence scores. For example, it may be beneficial to use a certain algorithm when both confidence scores are high, but a different one when the confidence score associated with the object identification hypotheses is high and the confidence score determined using the digital representation of the scene is low, and yet a different algorithm when the confidence score associated with the object identification hypotheses is low and the confidence score determined using the digital representation of the scene is high. Various machine learning models may be used to determine the best weightings or to account for different regimes of the two confidence scores using historical data. Refinement of the determined object identification hypotheses results in an increase in the accuracy of the object recognition in cases with ambiguous object identification based on the acquired object specific luminescence and/or reflectance properties.
This allows the object recognition accuracy of the inventive method to be boosted to near 100%.
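A minimal sketch of the three combination rules described above (sum, product, power-weighted product); the function name, the default exponent and the example scores are assumptions, and the appropriate rule and weighting would in practice be chosen empirically as discussed above.

    def combine_confidences(hypothesis_score, scene_score, mode="product", power=2):
        """Combine the confidence score of an object identification hypothesis with
        the confidence score derived from the digital representation of the scene."""
        if mode == "sum":
            return hypothesis_score + scene_score
        if mode == "product":
            return hypothesis_score * scene_score
        if mode == "power":
            # Raise one score to an integer power to weight it more heavily.
            return (hypothesis_score ** power) * scene_score
        raise ValueError(f"unknown combination mode: {mode}")

    # Example: select the hypothesis with the highest combined score.
    # hypotheses = {"object A": (0.8, 0.9), "object B": (0.85, 0.2)}   # assumed scores
    # best = max(hypotheses, key=lambda name: combine_confidences(*hypotheses[name]))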
In one example, refining the set of determined object identification hypotheses about the object(s) to be recognized is performed by a computer processor being different from the computer processor determining the set of object identification hypotheses about the object(s) to be recognized. In this case, the determined object identification hypotheses may be provided to the further computer processor via a communication interface. The further computer processor may be present in a stationary processing unit or located in a cloud environment.
In an alternative aspect, determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
Determining the set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene may include determining the likelihood of the presence of object(s) in the scene based on the provided digital representation of the scene and generating a set of object identification hypotheses and associated confidence scores based on the determined likelihood. The set of object identification hypotheses may include a list of objects wherein each object present in the list is associated with a confidence score, i.e. the likelihood of the occurrence of the object in the respective scene based on the data contained in the digital representation of the scene.
The set of determined object identification hypotheses about the object(s) to be recognized is then refined by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representations of pre-defined objects. This may include determining confidence scores associated with object(s) present in the scene by calculating the best matching reflectance and/or luminescence properties as previously described and using the determined confidence score(s) to refine the confidence scores associated with the determined object identification hypotheses to identify the at least one object. Refinement of the confidence score obtained using the digital representation of the scene with the confidence score(s) obtained by calculating the best matching reflectance and/or luminescence properties may be performed as previously described.
Optional step (iv):
In optional step (iv), the at least one identified object is provided via the communication interface and/or an action associated with the identified object(s) is triggered by the computer processor. In an aspect, this includes displaying at least part of the identified objects on the screen of a display device, optionally in combination with further data on the recognized object(s) and/or with at least one message. In one example, the at least one message on the screen of the display device is displayed by retrieving - with the computer processor - the at least one message from a data storage medium, in particular a database, based on the identified object(s) and optionally the provided digital representation of the scene. Further data on recognized object(s) may include the provided data on object specific reflectance and/or luminescence properties, the determined luminescence spectral pattern and/or reflectance pattern, the calculated best matching reflectance and/or luminescence properties, the position of the object(s) in the scene or a combination thereof.
In an aspect, the action is at least one pre-defined action associated with the detected object(s) or associated with the detected object(s) and data contained in the provided digital representation of the scene. The at least one pre-defined action may include ordering of a new product, storing data on recognized object(s) on at least one storage medium, removing recognized object(s) stored on at least one storage medium from said storage medium, updating stock keeping records, creating a list of recognized object(s), providing information on recognized object(s) or created list(s) via a communication interface to a further computer processor, or a combination thereof.
In one example, the action is triggered automatically, for example after detection of the respective object in the scene. The triggering therefore does not require any user interaction. However, the processor may provide a message to the user informing the user about the triggered actions and updating the user on the status of the action to be performed. In another example, the action is triggered after user interaction, for example after the user has approved the determined pre-defined action.
Embodiments of the inventive system for recognizing at least one object having object specific luminescence properties in a scene:
Suitable sensor units include the ones previously described in relation to step (i) of the inventive method.
In an aspect, the system further comprises at least one light source configured to illuminate the scene upon acquisition of data of the scene with the at least one sensor of the sensor unit. Suitable light sources are the ones previously described in connection with step (i) of the inventive method.
The light source may be synchronized with the sensor unit. In an aspect, at least one processing unit of the inventive system is configured to determine the synchronization or to synchronize the light source and the sensor unit based on a synchronization which was determined using a further processing unit (i.e. a processing unit being present separate from the inventive system). In case the separation of luminesced and reflected light is performed physically by the inventive system, the synchronization of the light source and the sensor unit allows the inventive systems to operate under real world conditions using ambient lighting by subtracting data of the scene acquired under ambient light from data of the scene acquired under ambient light and illumination from the light source (i.e. by performing the previously described delta calculation). Synchronization of the light source and the sensor unit mitigates the problems encountered with flickering of the light sources in the scene when the acquisition duration of each sensor is very short compared with the flicker period. In this case, the ambient light contribution can vary by 100% depending on when in the flicker cycle the acquisition begins. When the acquisition duration is much larger than the flicker cycle time, small changes in the phase between the flicker and the acquisition (i.e. the point of the flicker cycle at which the acquisition starts) lead to only small differences between the acquired data because the difference in brightness due to the starting phase is divided by the total number of flicker cycles recorded. However, as the acquisition duration approaches the flicker cycle time, the total number of flicker cycles recorded decreases while the difference in flicker cycle phase recorded remains the same, so the difference increases. Thus, the result of the delta-calculation is only accurate if the same ambient lighting contribution is present during the capture of the images which are to be subtracted, i.e. the accurate determination of the contribution of each illuminant to the measured luminescence and reflectance is highly dependent on the acquisition duration of each sensor as well as its timing with respect to the flicker cycle of the light sources being present in the scene. Using a highly defined synchronization allows the inventive systems to compensate for the changes occurring in the acquired images due to the ambient light changes, thus rendering object recognition possible under ambient lighting instead of using highly defined illumination conditions, such as dark rooms, unpleasant lighting conditions, such as IR lighting conditions, or lighting conditions with adverse health effects or that are detrimental to common items, such as significant levels of UV lighting.
In one example, the synchronization of the at least one illuminant of the light source and the at least one sensor of the sensor unit may be determined according to the method described in unpublished patent application US 63/139,299. Briefly, this method includes the following steps:
i. providing a digital representation of the light source and the sensor unit via a communication interface to the computer processor,
ii. determining - with a computer processor - the flicker cycle of all illuminants present in the scene or providing via a communication interface a digital representation of the flicker cycle to the computer processor,
iii. determining - with the computer processor - the illumination durations for each illuminant of the light source based on the provided digital representations,
iv. determining - with the computer processor - the acquisition durations for each sensor of the sensor unit based on the provided digital representations, the determined illumination durations and optionally the determined flicker cycle,
v. determining - with the computer processor - the illumination time points for each illuminant of the light source and the acquisition time points for each sensor of the sensor unit based on the data determined in step (iv) and optionally in step (ii), and
vi. optionally providing the data determined in step (v) via a communication interface.
Use of this method to determine the synchronization is especially preferred if delta-calculation is performed, i.e. if the separation of luminesced and reflected light used for object detection is performed physically.
In another example, the synchronization is determined by determining the illumination duration of each illuminant or each solid state lighting system of each illuminant required to obtain sufficient sensor exposure as previously described, adapting the acquisition duration to the illumination duration and defining the switching order of the illuminants. This is preferred if separation of luminesced and reflected light used for object detection is performed computationally.
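A very simplified sketch of the second example above for the computational-separation case: the illumination duration of each illuminant is taken as given (the required exposures are assumed inputs), the acquisition duration is set equal to the illumination duration and the illuminants are switched on back-to-back in a defined order. The helper name and the schedule format are assumptions for illustration only.

    def build_schedule(required_exposure_s, switching_order=None):
        """Return one (illuminant, start, duration) entry per illuminant, with the
        acquisition duration set equal to the illumination duration and the
        illuminants switched on back-to-back in the given switching order."""
        order = switching_order or list(required_exposure_s)  # default: listed order
        schedule, t = [], 0.0
        for illuminant in order:
            duration = required_exposure_s[illuminant]
            schedule.append({"illuminant": illuminant, "start_s": t, "duration_s": duration})
            t += duration
        return schedule

    # build_schedule({"blue": 0.002, "green": 0.004, "red": 0.003},
    #                switching_order=["blue", "green", "red"])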
The determined synchronization can be provided to the control unit described later on for controlling the light source and sensors according to the determined synchronization. The processing unit may be configured to adjust the determined synchronization based on the acquired data as described below, for example by determining the flicker cycle and/or sensitivity of each sensor during regular intervals and adjusting the durations and/or start points if needed.
In an aspect, the system further comprises a control unit configured to control the light source and/or the sensor unit. Suitable control units include Digilent Digital Discovery controllers providing ~1 microsecond level control or microcontrollers, such as PJRC Teensy® USB Development Boards. Microcontrollers or microprocessors refer to semiconductor chips that contain a processor as well as peripheral functions. In many cases, the working and program memory is also located partially or completely on the same chip. The control unit may either be present within the processing unit, i.e. it is part of the processing unit, or it may be present as a separate unit, i.e. it is not part of the processing unit.
The control unit is preferably configured to control the light source by switching on and off the at least one illuminant and/or at least one solid state lighting system of the at least one illuminant at at least one defined illumination time point for a defined illumination duration. The time points for switching on and off each illuminant and/or each solid state lighting system of each illuminant and each sensor are received by the control unit from the processing unit. The control unit is therefore preferably connected via a communication interface with the processing unit. In one example, the determined synchronization is provided to the control unit and is not adjusted after providing the synchronization to the control unit. In this case, a fixed synchronization is used during object recognition.
In another example, the determined synchronization can be dynamically adjusted based on real time evaluation of the sensor readings to ensure that different levels of ambient lighting or different distances from the system to the object are considered, thus increasing the accuracy of object recognition. This may be performed, for example, by determining the flicker cycle and/or the sufficient exposure of each sensor and adjusting the acquisition duration and/or the illumination duration and/or the defined time points for each sensor and/or each illuminant accordingly. The flicker cycle and the adjustments may be determined by the processing unit and the determined adjustments are provided via the communication interface to the control unit.
In one example, the control unit is configured to switch on the illuminant(s) or the solid lighting system(s) of the illuminant(s) according to their respective wavelength (i.e. from the shortest to the longest or vice versa) and to switch on each sensor of the sensor device sequentially. In another example, the control unit is configured to switch on the illuminant(s) or the solid lighting system(s) of the illuminant(s) in an arbitrary order, i.e. not sorted according to their wavelength, and to switch on the corresponding sensor associated with the respective illuminant or the respective solid lighting system of the respective illuminant. In case the light source comprises multiple illuminants with the same color or illuminants comprising multiple solid lighting systems with the same color (for example two blue, two green and two red illuminants or solid lighting systems), the control unit may be configured to cycle through each color twice, i.e. by switching on blue1, green1, red1, blue2, green2, red2, to achieve a more uniform white balance over time.
In one example, the control unit is configured to switch on each sensor without switching on any illuminant and/or any solid lighting system of any illuminant after each illuminant and/or each solid lighting system of each illuminant has been switched on (i.e. after one cycle is complete) to acquire the background data (i.e. data without the light source of the inventive system being switched on) required for delta-calculation. Measurement of the background data is performed using the same defined time points and defined duration(s) for each sensor as used during the cycling through the illuminants/solid lighting systems of the illuminants (i.e. if defined durations of 1/60, 2/60, 3/60 and 4/60 of a second were used during data acquisition with the illuminants/solid lighting systems being switched on, the same durations are used for acquisition of the background data). In another example, the background measurements are made at different intervals, such as for every sensor capture or between multiple cycles, depending on the dynamism of the scene, the desired level of accuracy and the desired acquisition time per cycle. The acquired background data is subtracted from the data acquired with the illuminant/solid lighting system switched on, using the corresponding acquisition duration, to yield the differential image as previously described. This accounts for common sources of indoor lighting flicker and thus allows the inventive systems to be used under real-life conditions with a high accuracy of object recognition.
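A minimal sketch of one such acquisition cycle with a trailing background capture. The camera and light_source driver objects and their capture, on and off calls are hypothetical and stand in for whatever control unit interface is actually used; the sketch re-uses the delta_image helper sketched earlier and assumes the schedule format of the build_schedule sketch.

    def acquisition_cycle(camera, light_source, schedule):
        """One cycle: capture a frame per illuminant, then capture background frames
        with all illuminants off using the same acquisition durations, and return
        the differential images (delta-calculation)."""
        lit_frames, background_frames = {}, {}
        for entry in schedule:
            light_source.on(entry["illuminant"])
            lit_frames[entry["illuminant"]] = camera.capture(entry["duration_s"])
            light_source.off(entry["illuminant"])
        for entry in schedule:  # same durations, light source switched off
            background_frames[entry["illuminant"]] = camera.capture(entry["duration_s"])
        return {name: delta_image(lit_frames[name], background_frames[name])
                for name in lit_frames}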
The control unit may be configured to add extra illumination to the scene by switching on an illuminant/solid lighting system of an illuminant at a time when all sensors of the sensor unit are switched off to achieve better color balance between the illuminants and/or the solid lighting systems of the illuminant and to improve the white balance of the overall illumination.
In an aspect, the system comprises a scene mapping tool configured to map the scene to obtain an at least partial 3D map of the scene as previously described in relation to the inventive method.
In an aspect, the processing unit comprises a first processing unit in communication with the sensor unit, the communication interface and a second processing unit, the first processing unit programmed to
- determine a set of object identification hypotheses about the object(s) to be recognized in the scene based on
  - the provided data of the scene and the provided digital representations of pre-defined objects or
  - the provided digital representation of the scene, and
- provide via the communication interface the determined set of object identification hypotheses and optionally data of the scene to the second processing unit;
and a second processing unit in communication with the communication interface and the first processing unit, the second processing unit programmed to
- refine the received set of object identification hypotheses to identify at least one object by revising at least certain of said associated confidence scores based on the provided digital representation of the scene or based on the provided data of the scene and the provided digital representations of pre-defined objects, and
- optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
In an aspect, the system further comprises a display unit configured to display the determined object(s) and optionally further data. The display unit may be a display device having a screen on which the determined objects and optionally further data may be displayed to the user. Suitable display units include stationary display devices (e.g. personal computers, television screen, screens of smart home systems being installed within a wall/on a wall) or mobile display devices (e.g. smartphones, tablets, laptops). The display device can be connected with the processing unit via a communication interface which may be wired or wireless. The further data may include data acquired on the object specific reflectance and/or luminescence properties, determined further object specific reflectance and/or luminescence properties, data from the control unit, such as switching cycles of illuminant(s) and sensor(s), used matching algorithms, results obtained from the matching process and any combination thereof.
Embodiments of the inventive training method:
Step a):
In step a) of the inventive training method, data of a scene is provided. Said data includes image(s) of the scene and data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene. Such data may be obtained using a luminescence object recognition system, for example a luminescence object recognition system described previously, which comprises at least one light source and at least one sensor unit configured to acquire object specific luminescence and optionally reflectance properties upon illuminating the scene with the light source. The data on the scene is provided via a communication interface to a computer processor. The computer processor may be located within the luminescence object recognition system or may be located in a further processing unit which is present separate from the luminescence object recognition system. In one example, steps b1) and optionally b2) are performed with the same processor. In another example, steps b1) and optionally b2) are performed with separate processors. For example, step b1) is performed with the processor of the luminescence object recognition system while optional step b2) is performed with a further processor being located separate from the luminescence object recognition system.
Step b):
In step b), a labelled image is calculated with the computer processor. This includes annotating each classified pixel of each image with an object specific label based on the provided data on object specific luminescence and optionally reflectance properties in a first step b1). The object specific label indicates the name of the object which was identified by the computer processor based on the provided object specific luminescence and optionally reflectance properties as described in the following. In an aspect, step b1) includes providing via a communication interface to the computer processor digital representations of pre-defined objects and optionally a digital representation of the scene, detecting, using the computer processor, for each image of the scene regions of luminescence by classifying the pixels associated with the detected regions of luminescence, determining the object(s) associated with the detected regions of luminescence and being present in each image by determining the best matching reflectance and/or luminescence properties for each detected region of luminescence, obtaining the object(s) assigned to the best matching reflectance and/or luminescence properties and optionally refining the obtained object(s) based on the provided digital representation of the scene, and annotating each classified pixel of each image with an object specific label based on the determined object(s) and the associated detected regions of luminescence.
The digital representations of pre-defined objects contain data on object specific luminescence and optionally reflectance properties which may be interrelated with the respective object name. The digital representation of the scene contains the data previously described in relation to the inventive object recognition method. The digital representations can be provided to the computer processor by retrieving them from a data storage medium, such as a database as previously described, for example by using a scene identifier to retrieve the appropriate digital representation of the scene.
Detection of regions of luminescence, in particular regions having similar luminescence to ensure that overlapping objects with different luminescence are identified as different objects, can be accomplished as previously described in relation to the determination of further object specific luminescence and/or reflectance properties. In one example, differential data is generated prior to detecting regions of luminescence as previously described in relation to the determination of further object specific luminescence and/or reflectance properties.
In one example, luminescence spectral patterns and/or reflective spectral patterns are determined for the determined regions of luminescence prior to determining the best matching luminescence and optionally reflectance properties. The luminescence spectral patterns and/or reflective spectral patterns can be determined as previously described in relation to the inventive method for object recognition.
Afterwards, the object(s) associated with the detected regions of luminescence are determined with the computer processor by determining the best matching luminescence and optionally reflectance properties for each detected region of luminescence as previously described in relation to the inventive method for object recognition. After determining the best matching luminescence and optionally reflectance properties, object(s) assigned to these best matching properties are retrieved from the provided digital representations of pre-defined objects or from a database as previously described. The list of obtained object(s) may be refined as previously described based on the provided digital representation of the scene, for example if the list of obtained object(s) contains more than one object. This ensures a higher recognition accuracy in cases with ambiguous identification based on the provided luminescence and optionally reflectance properties.
Finally, each classified pixel in each image is annotated with an object specific label based on the determined object(s) and the associated detected regions of luminescence.
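A minimal sketch of the pixel-level annotation of step b1), re-using the luminescent_regions and best_match_rmse helpers sketched earlier; spectra_per_region is an assumed callable that returns the measured spectral pattern of a detected region, and the use of the object name as label and of the empty string for background are illustrative assumptions.

    import numpy as np

    def label_image(luminescence_channel, reference_patterns, spectra_per_region):
        """Detect regions of luminescence, determine the best matching object for
        each region and annotate every classified pixel with an object specific
        label (here: the object name, '' for background)."""
        mask, labels, n_regions = luminescent_regions(luminescence_channel)
        annotation = np.full(labels.shape, "", dtype=object)
        for region_id in range(1, n_regions + 1):
            pattern = spectra_per_region(labels, region_id)
            object_name, _confidence = best_match_rmse(pattern, reference_patterns)
            annotation[labels == region_id] = object_name
        return annotation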
In optional step b2), bounding boxes are created around the objects determined in step b1) based on the annotated pixels or the images obtained in step b1) are segmented based on the annotated pixels. Bounding boxes and image segmentation based on the annotated pixels can be done using methods commonly known in the art for image segmentation and bounding box creation. Bounding boxes describe the spatial location of the objects determined in step b1). In one example, the bounding box is rectangular, and is defined by the x and y coordinates of the upper-left corner and the x and y coordinates of the lower-right corner of the rectangle. In another example, a bounding box representation using the (x, y) coordinates of the bounding box center and the width and height of the box is used. Bounding boxes can, for example, be created based on coordinate information of the annotated pixels. Image segmentation is a commonly used technique in digital image processing and analysis to partition an image into multiple parts or regions based on the characteristics of the pixels in the image. In this case, image segmentation may involve clustering regions of pixels according to contiguous matching chromaticities.
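A minimal sketch of deriving rectangular bounding boxes from the coordinate information of the annotated pixels, assuming the label map produced by the label_image sketch above (object names as labels, empty string for background); the corner-based box format is one of the two representations mentioned above.

    import numpy as np

    def bounding_boxes(annotation):
        """Create one axis-aligned bounding box per labelled object, defined by the
        coordinates of the upper-left and lower-right corners of its annotated pixels."""
        boxes = {}
        for name in set(annotation.ravel()) - {""}:
            ys, xs = np.nonzero(annotation == name)
            boxes[name] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
        return boxes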
Step c):
In step c) of the inventive training method, the labelled images calculated in step b) are provided - optionally in combination with the digital representation of the scene - to an object recognition neural network. Suitable object recognition neural networks are convolutional neural networks (CNNs) known in the state of the art, such as Inception/GoogLeNet, ResNet-50, ResNet-34, MobileNet v2, VGG-16, MobileNet v2-SSD and YoloV3, V4 or V5. Each layer of the CNN is known as a feature map. The feature map of the input layer is a 3D matrix of pixel intensities for different color channels (e.g. RGB). The feature map of any internal layer is an induced multi-channel image, whose ‘pixel’ can be viewed as a specific feature. Every neuron is connected with a small portion of adjacent neurons from the previous layer (receptive field). Different types of transformations can be conducted on feature maps, such as filtering and pooling. The filtering (convolution) operation convolutes a filter matrix (learned weights) with the values of a receptive field of neurons and applies a non-linear function (such as sigmoid or ReLU) to obtain the final responses. The pooling operation, such as max pooling, average pooling, L2-pooling or local contrast normalization, summarizes the responses of a receptive field into one value to produce more robust feature descriptions. By interleaving convolution and pooling, an initial feature hierarchy is constructed, which can be fine-tuned in a supervised manner by adding several fully connected (FC) layers to adapt to different visual tasks. Depending on the task involved, a final layer with a suitable activation function is added to obtain a specific conditional probability for each output neuron, and the whole network can be optimized on an objective function (e.g. mean squared error or cross-entropy loss) via the stochastic gradient descent (SGD) method. A typical CNN performing object recognition has a total of 13 convolutional (conv) layers, 3 fully connected layers, 3 max-pooling layers and a softmax classification layer. The conv feature maps are produced by convoluting 3×3 filter windows, and feature map resolutions are reduced with stride-2 max-pooling layers.
In an aspect, the object recognition neural network is a deep convolutional neural network comprising a plurality of convolutional neural network layers followed by one or more fully connected neural network layers. The deep convolutional neural network may comprise a pooling layer after each convolutional layer or after a plurality of convolutional layers and/or may comprise a non-linear layer after each convolutional layer, in particular between the convolutional layer and the pooling layer.
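A much reduced PyTorch sketch of the architecture pattern described above (convolutional layers, each followed by a non-linearity and pooling, then fully connected layers and a final classification layer). It is far smaller than the 13-layer example given above, and the layer widths and the number of classes are arbitrary assumptions.

    import torch.nn as nn

    class SmallObjectRecognitionCNN(nn.Module):
        """Illustrative deep CNN: conv -> ReLU -> max-pool blocks followed by
        fully connected layers; softmax is applied inside the loss function."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Sequential(
                nn.Flatten(),
                nn.LazyLinear(256), nn.ReLU(),   # fully connected layers
                nn.Linear(256, num_classes),
            )

        def forward(self, x):
            return self.classifier(self.features(x))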
Step d):
In step d) of the inventive training method, the provided object recognition neural network is trained using the provided calculated labelled images of the scene and optionally the provided digital representation of the scene as input to recognize each labelled object in the inputted calculated labelled images.
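A minimal sketch of step d) as a supervised training loop using stochastic gradient descent and a cross-entropy objective, as mentioned in the CNN description above. The dataloader yielding batches of (image tensor, class index) pairs derived from the calculated labelled images, as well as the epoch count, learning rate and momentum, are assumptions for illustration.

    import torch
    import torch.nn as nn

    def train(model, dataloader, epochs=10, lr=0.01):
        """Optimize the object recognition neural network on the labelled images."""
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
        model.train()
        for _ in range(epochs):
            for images, targets in dataloader:
                optimizer.zero_grad()
                loss = criterion(model(images), targets)
                loss.backward()   # back-propagation of the error
                optimizer.step()
        return model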
In an aspect, training the object recognition neural network includes verifying the accuracy of the object recognition neural network by providing images of a scene comprising known objects, comparing the produced output values with expected output values, and modifying the object recognition neural network using a back-propagation algorithm in case the received output values do not correspond to the known objects. Modifying the object recognition neural network using a back-propagation algorithm can be performed, for example, as described in Ian Goodfellow et al., “Deep Learning”, Chapter 6.5 “Back-Propagation and Other Differentiation Algorithms”, MIT Press, 2016, pages 200 to 220 (ISBN: 9780262035613).
Embodiments of the inventive computer-implemented method for recognizing at least one object in a scene:
Step (A):
In step (A) of the inventive method for object recognition, a trained object recognition neural network is provided via a communication interface to the computer processor. In a preferred embodiment, the provided object recognition neural network has been trained according to the training method described previously, i.e. by automatic labelling of data generated by a luminescence object recognition system. The computer processor may be located on a remote computing device or may be located in a cloud environment. The trained object recognition neural network may be stored on a data storage medium, such as an internal medium of the remote computing device or in a cloud environment, and may be accessed by the computer processor via a communication interface. Storage of the trained object recognition neural network in a cloud environment allows the latest version of the trained object recognition neural network to be accessed without performing any firmware updates.
Step (B) :
In step (B) of the inventive method, data of the scene is provided via a communication interface to the computer processor. In one example, data of the scene includes image(s) of the scene. Images of the scene may be acquired, for example, by use of a camera, such as a commercially available video camera. In another example, data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object having these properties and being present in the scene. Such data can be acquired, for example, by use of a luminescent object recognition system comprising a light source and a sensor unit as described previously. In yet another example, data of the scene includes image(s) of the scene as well as data on object specific reflectance and/or luminescence properties of at least one object being present in the scene.
Optional step (C):
In optional step (C), digital representations of pre-defined objects and/or a digital representation of the scene are provided to the computer processor via a communication interface. The digital representation of the scene contains the previously described data and may improve the recognition accuracy of the object recognition because its use allows an object to be identified in case of ambiguous object identification based on the provided data of the scene.
Step (D):
In step (D), at least one object being present in the scene is determined with the computer processor based on the provided trained object recognition neural network, the provided data of the scene and optionally the provided digital representation of the scene. In an aspect, step (D) includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene, the provided digital representations of pre-defined objects and optionally the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores using the provided object recognition neural network and optionally the provided digital representation of the scene.
In one example, the object identification hypotheses about the object(s) to be recognized in the scene may include the best matching luminescence and/or reflectance properties. These best matching luminescence and/or reflectance properties can be determined as previously described from the provided data on object specific luminescence and/or reflectance properties or from further object specific luminescence properties determined from said data as described previously. In another example, the object identification hypotheses about the object(s) to be recognized in the scene may include the best matching objects obtained using the best matching luminescence and/or reflectance properties. This aspect is preferred if the trained object recognition neural network is used in combination with a luminescence object recognition system and allows object recognition to be performed using information devoid of shape (i.e. the data on object specific luminescence and/or reflectance properties) in combination with information on the shape (i.e. images of the scene) to improve the recognition accuracy in case of ambiguous object recognition when only using information devoid of shape or information on the shape. The recognition accuracy can be further boosted by using information on the scene to remove any remaining object recognition ambiguity.
In an alternative aspect, step (D) includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene, the provided object recognition neural network and optionally the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores using the provided data of the scene, the provided digital representations of pre-defined objects and optionally the provided digital representation of the scene.
The object identification hypotheses about the object(s) to be recognized in the scene may be the object(s) identified using the trained object recognition neural network. Refining the set of object identification hypotheses about the object(s) to be recognized includes determining best matching luminescence and/or reflectance properties as previously described, obtaining the objects associated with the best matching luminescence and/or reflectance properties and comparing the obtained objects to the determined object identification hypotheses as previously described to identify the object(s) present in the scene.
In another alternative aspect, step (D) includes inputting the provided data of the scene, in particular the provided image(s) of the scene, and optionally the provided digital representation of the scene into the provided object recognition neural network. This aspect may be preferred if the trained object recognition neural network is used without the luminescence object recognition system, for example if the use of such a luminescence object recognition system is not preferred or not possible.
Optional step (E):
In optional step (E) of the inventive method, the at least one identified object is provided via a communication interface and/or at least one action associated with the identified object(s) is triggered. The identified object may be provided by displaying the identified object(s) on the screen of a display device as previously described. The at least one action which may be triggered may be an action as previously described.
Embodiments of the inventive system for recognizing at least one object in a scene:
Suitable sensor units include commercially available video cameras as well as cameras described in relation to the inventive system for recognizing at least one object having object specific luminescence properties in a scene. The system may further comprise at least one light source, for example at least one light source previously described. The system may further comprise at least one control unit which is configured to synchronize the light source and the sensor unit as previously described. Suitable data storage media and processing units include the ones previously described. Determination of the at least one object in the scene with the processing unit is performed as described previously in combination with the inventive method for recognizing at least one object in the scene. Embodiments of the inventive use of the inventive methods and systems:
In an aspect of the inventive use, the system comprising the neural network trained according to the inventive training method is used in a different scene than the scene used to generate the training data set, with the proviso that the same objects to be detected are expected to occur in the different scene as in the scene used to generate the training data set. The training data set is preferably generated as described in relation to the inventive training method. This allows the inventive system comprising the appropriately trained neural network to be used in similar scenes without having to train the neural network for each specific scene, thus allowing the inventive system to be transferred to a different scene where the same objects to be recognized are expected to occur without negatively influencing the recognition accuracy.
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the present invention are more fully set forth in the following description of exemplary embodiments of the invention. To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure number in which that element is first introduced. The description is presented with reference to the accompanying drawings in which:
Fig. 1 is a block diagram of a computer-implemented method for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
Fig. 2a is a first example of a system for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
Fig. 2b is a second example of a system for recognizing at least one object having object specific luminescence properties in a scene according to the invention described herein;
Fig. 3 is an example of a method for training an object recognition neural network using data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene according to the invention described herein;
Fig. 4 is an example of a computer-implemented method for recognizing at least one object in a scene using a trained object recognition neural network according to the invention described herein;
Fig. 5 is an example of a system for training an object recognition neural network according to the method of FIG. 3 or for recognizing at least one object in a scene using a trained object recognition neural network in combination with a luminescent object recognition system according to the invention described herein;
Fig. 6 is an example of a system for recognizing at least one object in a scene using a trained object recognition neural network according to the invention described herein.
DETAILED DESCRIPTION
The detailed description set forth below is intended as a description of various aspects of the subject-matter and is not intended to represent the only configurations in which the subject-matter may be practiced. The appended drawings are incorporated herein and constitute a part of the detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject-matter. However, it will be apparent to those skilled in the art that the subject-matter may be practiced without these specific details.
FIG. 1 depicts a non-limiting embodiment of a method 100 for recognizing at least one object having object specific luminescence and/or reflectance properties in a scene. In this example, the object to be recognized is imparted with luminescence by use of a fluorescent coating on the surface of the object and the scene is located indoors. In another example, the scene may be located outdoors. In this example, a display device is used to display the determined objects on the screen, in particular via a GUI. Suitable luminescence object recognition systems which can be used to perform method 100 are described, for example, in relation to Figures 2a and 2b, in unpublished patent application US 63/139,299 and in published patent applications US 2020/279383 A1, WO 2020/245443 A2, WO 2020/245442 A1, WO 2020/245441 A1, WO 2020/245439 A1 and WO 2020/245444 A1.
In block 102 of method 100, routine 101 determines whether ambient light compensation (ALC) is to be performed, i.e. whether the amount of ambient light in the scene is above a defined threshold. This will normally be the case if method 100 is performed in a scene lit by ambient light sources, such as sunlight, other natural lighting or artificial ambient lighting. In contrast, no ALC will be required if the object recognition is performed in the absence of ambient light, for example in a dark environment, or if detection of fluorescence is performed in filtered regions, such as described in published patent applications WO 2020/245443 A2 and WO 2020/245442 A1. In case routine 101 determines that ambient light compensation (ALC) is to be performed, routine 101 proceeds to block 104, otherwise routine 101 proceeds to block 116. In block 104, routine 101 determines whether flicker of the ambient light requires the flickering to be compensated or not. Flicker compensation is normally necessary if object recognition is performed indoors and separation of luminesced and reflected light upon illumination of the scene with the light source is achieved physically, for example by use of a filter before each sensor of the sensor unit, such as described in US application number 63/139,299. If it is determined that flicker compensation is to be performed, routine 101 proceeds to block 106, otherwise routine 101 proceeds to block 116 described later on, for example if separation of luminesced and reflected light is achieved computationally, such as described in published patent applications US 2020/279383 A1 and WO 2020/245444 A1, or if the ambient light present in the scene is exclusively resulting from sunlight or other natural light sources.
In block 106, routine 101 determines whether the flicker compensation is to be performed using phase-locking (i.e. setting the switch-on of each sensor to a pre-defined time point) or is to be performed using a multiple of the flicker cycle. This determination may be made according to the programming of the processor implementing routine 101. In one example, a pre-defined programming is used, for example if the illumination setup of the scene is known prior to installation of the luminescence object recognition system. In another example, the processor determines the configuration and type of illuminants present in the scene, for example by connecting the illuminants via Bluetooth to the processor such that the processor is able to retrieve their configuration and type. In case routine 101 determines in block 106 that phase-locking is to be performed, it proceeds to block 108, otherwise it proceeds to block 112.
In block 108, routine 101 determines and sets the phase-lock for each sensor of the sensor unit. This may be accomplished by determining the light variation or the line voltage fluctuation present in the scene using commonly known methods. Normally, the flicker cycle of commonly used illuminants depends on the utility frequency present at the scene. If a 60 Hz utility frequency is used, the frequency of the flicker cycle will be 120 Hz. If a 50 Hz utility frequency is used, the flicker cycle will be 100 Hz. In one example, phase lock is performed relative to the light variation or relative to the line voltage fluctuation.
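By way of non-limiting illustration, the relationship between utility frequency, flicker frequency and phase-locked acquisition time points may be sketched in Python as follows; the function names, the fixed phase offset and the number of acquisitions are assumptions made purely for this example and are not prescribed above.

```python
# Illustrative sketch only: flicker frequency and phase-locked acquisition time points.
def flicker_frequency(utility_hz: float) -> float:
    # Mains-powered illuminants typically flicker at twice the utility frequency.
    return 2.0 * utility_hz

def phase_locked_time_points(utility_hz: float, phase_offset_s: float, n_acquisitions: int):
    # Acquisition start times locked to the same phase of each flicker cycle.
    period_s = 1.0 / flicker_frequency(utility_hz)
    return [phase_offset_s + i * period_s for i in range(n_acquisitions)]

print(flicker_frequency(60.0))                   # 120.0 Hz for a 60 Hz utility frequency
print(phase_locked_time_points(60.0, 0.002, 3))  # e.g. lock 2 ms into each flicker cycle
```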
After the phase-lock is set for each sensor (i.e. after defined acquisition time points for switching on each sensor have been determined), routine 101 proceeds to block 110.
In block 110, routine 101 determines and sets the acquisition duration for each sensor and the illumination duration for each illuminant. The acquisition and illumination durations may be determined as previously described, for example by using the method described in unpublished US application 63/139,299. The setting may be performed according to pre-defined values which may be provided to routine 101 from an internal storage or a database. In case the method is repeated, the determination may be made based on previously acquired sensor data and object recognition accuracy. In case two sensors are used, each illuminant may be switched on when each sensor is switched on. If each sensor is switched on sequentially, then each illuminant may be switched on twice during each lighting cycle. The illumination duration is set to achieve a reasonable measurement within the range of the respective sensor, while leaving room for the effect of the additional ambient lighting. Typically, a shorter illumination duration is needed for the sensor measuring reflectance + luminescence than for the sensor measuring luminescence only, as the measurement for the reflectance + luminescence contains the reflected light from the illuminator(s), and reflection is typically much stronger than luminescence. In case each illuminant is switched on twice, the illumination duration for each switch-on may vary.
In block 112, routine 101 determines and sets fixed acquisition durations for each sensor. The acquisition durations may be determined as previously described, for example by using the method described in unpublished US application 63/139,299. The fixed acquisition durations may be adapted to the flicker cycle present in the scene. For a 60 Hz utility frequency having a flicker of 120 Hz, acquisition durations of 1/60, 2/60, 3/60 and 4/60 of a second may be used. For a 50 Hz utility frequency having a flicker of 100 Hz, acquisition durations of 1/50, 2/50, 3/50 and 4/50 of a second may be used. The defined acquisition durations may either be preprogrammed or may be retrieved by routine 101. Retrieving the defined acquisition durations may include determining the utility frequency used in the scene, the type of sensors present in the sensor unit and the type of illuminants of the light source and retrieving the defined acquisition durations associated with the determined utility frequency, the determined type of sensors and the determined type of illuminants from a storage medium, such as the internal storage or a database.
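A corresponding non-limiting Python sketch for the fixed acquisition durations, assuming (as in the examples above) that the durations are chosen as integer multiples of the utility-frequency period so that each exposure spans whole flicker cycles:

```python
# Illustrative sketch only: fixed acquisition durations as multiples of the mains period.
def fixed_acquisition_durations(utility_hz: float, multiples=(1, 2, 3, 4)):
    # Each duration covers a whole number of flicker cycles (flicker = 2 x utility frequency).
    return [m / utility_hz for m in multiples]

print(fixed_acquisition_durations(60.0))  # [1/60, 2/60, 3/60, 4/60] of a second
print(fixed_acquisition_durations(50.0))  # [1/50, 2/50, 3/50, 4/50] of a second
```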
In block 114, routine 101 determines and sets the defined acquisition time points to switch on each sensor and the illumination duration for each illuminant. This determination may be made as previously described in relation to block 110.
In block 116, routine 101 determines and sets the sequence of each illuminant and each sensor (i.e. in which order each illuminant and each sensor is switched on and off). Routine 101 may determine the sequence based on pre-defined criteria, such as a specific order based on the wavelength of the illuminants, or it may arbitrarily select the order. Based on the order of the illuminants, routine 101 may either determine the order of each sensor or may use a pre-defined order, for example sequential order of the sensors. In block 118, routine 101 instructs the light source to illuminate the scene with the illuminants and to acquire data on object specific luminescence and/or reflectance properties according to the settings determined in blocks 108, 110 and 116 or 112, 114, 116 or in block 116 (in case ALC is not required). In one example, this is performed by providing the settings determined in blocks 108, 110 and 116 or 112, 114, 116 or in block 116 to a control unit connected with the sensor unit and the light source. In another example, the light source and the sensor unit are controlled directly using the processor implementing routine 101. The acquired data may be stored on an internal memory of the sensor unit or may be stored in a database which is connected to the sensor unit via a communication interface.
In block 120, routine 101 determines whether further processing of the acquired data, for example delta calculation and/or identification of luminescence regions and/or conversion of the data resulting from the delta calculation and/or determination of luminescence/reflectance patterns is to be performed. If this is the case, routine 101 proceeds to block 122, otherwise routine 101 proceeds to block 126 described later on. The determination may be made based on the programming and may depend, for example, on the data contained in the digital representations of pre-defined objects used to determine the object(s) present in the scene and/or on the conditions present upon acquisition of data of the scene (i.e. if ALC is required or not) and/or on the specific hardware configuration of the luminescence object recognition system (i.e. if separation of luminesced and reflected light upon illumination of the scene with the light source is achieved computationally or physically).
In block 122, routine 101 determines whether the further processing is to be performed remotely, i.e. with a further processing device being present separately from the processor implementing routine 101. This may be preferred if the further processing requires a large computing power. If routine 101 determines in block 122 that the further processing is to be done remotely, it proceeds to block 140 described later on, otherwise it proceeds to block 124.
In block 124, routine 101 determines further luminescence and/or reflectance properties as previously described by determining differential data (i.e. performing the delta-calculation previously described) and optionally converting the differential data and/or by identifying luminescence regions in the acquired or differential data and/or by determining the luminescence and/or reflectance spectral patterns from the acquired or differential data. The processed data may be stored on a data storage medium, such as the internal storage or a database prior to performing the blocks described in the following. In block 126, routine 101 determines whether to perform a flicker analysis or flicker measurement. If this is the case, routine 101 proceeds to block 150, otherwise it proceeds to block 128.
In block 128, routine 101 retrieves digital representations of pre-defined objects and the digital representation of the scene from at least one data storage medium, such as a database. In one example, the digital representation of pre-defined objects and the digital representations of the scene are stored on different data storage media. In another example, the digital representation of pre-defined objects and the digital representations of the scene are stored on the same data storage medium. The data storage medium/media is/are connected to the processor implementing routine 101 via a communication interface. The digital representation of the scene may be retrieved by routine 101 using a scene identifier which is indicative of the scene for which data is acquired in block 118. The scene identifier may be provided to routine 101 upon installation of the luminescence object recognition system and is preferably a unique identifier of the scene. The scene identifier can be a name, a number or a combination thereof.
In block 130, routine 101 determines a set of object identification hypotheses based on
- the data acquired in block 118 or the further object specific luminescence and/or reflectance properties determined in block 124/142 and the digital representations of pre-defined objects retrieved in block 128 (option A) or
- the digital representation of the scene retrieved in block 128 (option B).
Routine 101 may choose the respective option according to its programming.
In one example of option A, the object identification hypotheses correspond to the best matching luminescence and/or reflectance properties and the associated confidence score corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties. In another example of option A, the object identification hypotheses correspond to the objects interrelated with the best matching luminescence and/or reflectance properties and the associated confidence score corresponds to the degree of matching obtained during calculation of the best matching reflectance and/or luminescence properties. The best matching reflectance and/or luminescence properties may be calculated by applying any number of matching algorithms as previously described or by using a data-driven model of light spectral distribution and intensity on the at least one object to be recognized to calculate the object specific reflectance and/or luminescence properties and applying any number of matching algorithms as previously described. In one example, the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by retrieving the object(s) associated with the best matching reflectance and/or luminescence properties from the provided digital representations of the pre-defined objects. This may be preferred if the digital representations of pre-defined objects contain reflectance and/or luminescence properties interrelated with the respectively assigned object. In another example, the object(s) assigned to the best matching reflectance and/or luminescence properties are obtained by searching a further database for said object(s) based on the determined best matching reflectance and/or luminescence properties. The further database may be connected to the computer processor via a communication interface.
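As one non-limiting example of such a matching algorithm (the description above leaves the choice of algorithm open), the best matching properties could be ranked by cosine similarity between a measured spectrum and the reference spectra contained in the digital representations; the object names and spectra in the following Python sketch are purely hypothetical.

```python
import numpy as np

def best_matches(measured: np.ndarray, library: dict, top_k: int = 3):
    # 'library' maps an object name to a reference spectrum on the same wavelength grid.
    scores = {}
    for name, reference in library.items():
        denominator = float(np.linalg.norm(measured) * np.linalg.norm(reference)) or 1.0
        scores[name] = float(np.dot(measured, reference)) / denominator  # cosine similarity
    # The similarity can serve as the confidence score of the corresponding hypothesis.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]

library = {"bottle_A": np.array([0.1, 0.8, 0.3]), "carton_B": np.array([0.7, 0.2, 0.1])}
print(best_matches(np.array([0.12, 0.75, 0.28]), library))
```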
In option B, the object identification hypotheses and the associated confidence scores correspond to the likelihood of the presence of the object(s) in the scene. The object identification hypotheses may be determined as previously described by determining the likelihood of the presence of object(s) in the scene based on the provided digital representation of the scene.
After routine 101 has determined a set of object identification hypotheses in block 130, routine 101 proceeds to block 132 and refines the set of object identification hypotheses determined in block 130. Depending on the option used in block 130 to determine the set of object identification hypotheses, routine 101 is programmed to perform the refinement based on:
- the digital representation of the scene retrieved in block 128 in case the set of object identification hypotheses has been determined according to option A previously described, or
- the data acquired in block 118 or the further object specific luminescence and/or reflectance properties determined in block 124/142 and the digital representations of pre-defined objects retrieved in block 128 in case the set of object identification hypotheses has been determined according to option B previously described.
Refining the set of object identification hypotheses determined in block 130 for option A may include determining - based on the digital representation of the scene - confidence score(s) for the determined set of object identification hypotheses and using the determined confidence scores to refine the confidence scores associated with the set of object identification hypotheses determined in block 130 (option A) to identify the at least one object. Determining the confidence score(s) for the set of determined object identification hypotheses based on the provided digital representation of the scene may include determining the likelihood of the presence of objects associated with the determined object identification hypotheses in the scene based on the data contained in the provided digital representation of the scene and associating higher confidence score(s) with objects having a higher probability of being present in the scene based on the data contained in the digital representation of the scene. The object can be determined from the two determined confidence scores by a number of different algorithms as previously described.
Refining the set of object identification hypotheses determined in block 130 for option B may include revising at least certain of the confidence scores associated with the object identification hypotheses determined in block 130 based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the retrieved digital representation of pre-defined objects to identify the at least one object. This may include determining the confidence scores associated with object(s) present in the scene by calculating the best matching reflectance and/or luminescence properties as previously described and using the determined confidence score(s) to refine the confidence score(s) associated with the object identification hypotheses determined in block 130 to identify the at least one object as previously described. Refinement of the confidence score obtained using the digital representation of the scene with the confidence score(s) obtained by calculating the best matching reflectance and/or luminescence properties may be performed as previously described. The refinement step makes it possible to increase the object recognition accuracy in case of ambiguous object identification based on the object specific luminescence and/or reflectance properties, thus boosting the overall object recognition accuracy of the inventive method.
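One possible (but in no way prescribed) refinement algorithm combines the confidence scores from the spectral matching with the scene-based likelihoods by a normalised product, as sketched below in Python; the object names and score values are hypothetical.

```python
def refine_confidences(spectral_scores: dict, scene_likelihoods: dict) -> dict:
    # Multiply each spectral-match confidence by the scene-based likelihood and renormalise.
    fused = {obj: score * scene_likelihoods.get(obj, 0.05)  # small floor for unlisted objects
             for obj, score in spectral_scores.items()}
    total = sum(fused.values()) or 1.0
    return {obj: value / total for obj, value in fused.items()}

# An ambiguous spectral match is resolved by the digital representation of the scene.
spectral_scores = {"milk_carton": 0.48, "juice_carton": 0.47, "shampoo_bottle": 0.05}
scene_likelihoods = {"milk_carton": 0.6, "juice_carton": 0.1, "shampoo_bottle": 0.3}
refined = refine_confidences(spectral_scores, scene_likelihoods)
print(max(refined, key=refined.get))  # 'milk_carton'
```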
In block 134, routine 101 provides the determined object(s) to a display device. The display device is connected via a communication interface to the processor implementing routine 101. The processor may provide further data associated with the determined object(s) for display on the screen, such as further data contained in the retrieved digital representation or further data retrieved from a database based on the determined object(s). Routine 101 may then proceed to block 102 or block 106 or block 118 and repeat the object recognition process according to its programming. Monitoring intervals of the scene may be pre-defined based on the situation used for object recognition or may be triggered by pre-defined events, such as entering or leaving of the room.
In block 136, the display device displays the data received from the processor in block 134 on the screen, in particular within a GUI.
In block 138, routine 101 determines actions associated with the determined objects and may display these determined actions to the user on the screen of the display device, this step being generally optional. The determined actions may be pre-defined actions as previously described. In one example, the determined actions may be performed automatically by routine 101 without user interaction. However, routine 101 may provide information about the status of the initiated action to the user on the screen of the display device. In another example, a user interaction is required after displaying the determined actions on the screen of the display device prior to initiating any action by routine 101 as previously described. Routine 101 may be programmed to control the initiated actions and to inform the user on the status of the initiated actions. After the end of block 138, routine 101 may return to block 102, 106 or 118 as previously described or may end method 100. For example, routine 101 may return to block 102 or 106 in case of low determined confidence scores to allow reexamination of the system settings and adjustment of the system settings to improve the determined confidence scores. In case of higher confidence scores, routine 101 may return to block 118 since the settings of the system seem to be appropriate for the monitored scene. Routine 101 may be programmed to return to block 102/106 or block 118 based on a threshold value for the determined confidence scores.
Block 140 of method 100 is performed in case routine 101 determines in block 122 that further processing of the data acquired in block 118 is performed remotely, i.e. by a processor being different from the processor implementing routine 101. In block 140, routine 101 provides the data acquired in block 118 to the further processing device which is connected with the processor implementing routine 101 via a communication interface. The further processing device may be a stationary processing device or may be located in a cloud environment and may implement method 100 using routine 101'.
In block 142, routine 101' determines further object specific luminescence and/or reflectance properties as described in relation to block 124.
In block 144, routine 101' determines whether a flicker analysis is to be performed as described in relation to block 126. If yes, routine 101' proceeds to block 150 described later on. Otherwise, routine 101' proceeds to block 146.
In block 146, routine 101' determines whether the object is to be determined with the further processor implementing routine 101'. If yes, routine 101' proceeds to block 128 and performs blocks 128 to 138 as described previously. Otherwise, routine 101' proceeds to block 148.
In block 148, routine 101' provides the further object specific luminescence and/or reflectance properties determined in block 142 to the computer processor implementing routine 101, i.e. the computer processor performing steps 102 to 122 previously described. Afterwards, method 100 proceeds with block 128 as previously described. In block 150, routine 101 or routine 101' determines the effectiveness of flicker mitigation, for example by comparing background images acquired at different measurement times or determining whether background images are brighter than the raw images obtained upon illumination of the scene with the light source and measuring the resulting fluorescence (i.e. the raw data before the delta calculation is performed). If the background images are brighter than the raw images, it is likely that the flicker mitigation timing is not appropriate, and that the background images are acquired at a brighter period of the flicker cycle than the raw images.
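A minimal Python sketch of such a check, under the assumption that the background and raw frames are available as numpy arrays with comparable exposure settings (the margin parameter and the example pixel values are assumptions made for this illustration):

```python
import numpy as np

def flicker_timing_suspect(background: np.ndarray, raw: np.ndarray, margin: float = 0.0) -> bool:
    # Flag the flicker-mitigation timing as suspect when the ambient-only background
    # frame is, on average, brighter than the frame captured under active illumination.
    return float(background.mean()) > float(raw.mean()) + margin

raw_frame = np.array([[0.42, 0.55], [0.48, 0.51]])          # hypothetical raw frame
background_frame = np.array([[0.58, 0.66], [0.61, 0.63]])   # hypothetical background frame
print(flicker_timing_suspect(background_frame, raw_frame))  # True
```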
In block 152, routine 101 or routine 101' determines whether the flicker mitigation is satisfactory, for example by determining the ambient flicker contribution in the images and comparing the determined ambient flicker contribution to a pre-defined threshold value stored on a data storage medium. If the mitigation is satisfactory, routine 101 or 101' proceeds to block 128, otherwise routine 101 or 101' proceeds to block 154.
In block 154, routine 101 or routine 101' determines new phase-locking or multiples of the flicker cycle based on the results of block 150. The new phase-locking or multiples of the flicker cycle are then used upon repeating blocks 108 or 112.
FIG. 2a illustrates a non-limiting embodiment of a system 200 for recognizing at least one object having object specific luminescence and/or reflectance properties in a scene in accordance with a first embodiment of the invention, which may be used to implement method 100 described in relation to FIG. 1. In this example, system 200 monitors scene 202 comprising a waste bin as well as waste 204 (i.e. the object to be recognized) being thrown into the waste bin by a person. The waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park. The information on the recognized object (i.e. the object being discarded) may be used for reordering purposes, for waste separation planning, etc.
System 200 is a luminescent object recognition system comprising a light source 206, a sensor unit 208, a control unit 210, a processing unit 216 and databases 218, 220. The light source 206 and sensor unit 208 are each connected via communication interfaces 222, 224 to the control unit 210. In this example, the light source 206 comprises two illuminants, such as LEDs, fluorescent illuminants, incandescent illuminants or a combination thereof. In another example, the light source 206 comprises only one illuminant or more than two illuminants. In this example, the sensor unit 208 comprises one sensor, such as a camera. In another example, the sensor unit 208 comprises at least two sensors. The light source 206 and/or the sensor unit 208 may comprise filters (not shown) in front of each illuminant and/or each sensor. Suitable combinations of light source and sensor unit for luminescent object recognition are, for example, disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1.
Control unit 210 is connected to processor 212 of processing unit 216 via communication interface 226 and is configured to control the illuminants of the light source and/or the sensors of the sensor unit by switching on at least one illuminant of the light source and/or at least one sensor of the sensor unit at pre-defined time point(s) for a pre-defined duration. To ensure that each sensor of sensor unit 208 acquires data upon illumination of the scene with at least one illuminant of light source 206, control unit 210 preferably synchronizes the switching of the illuminants of light source 206 and the sensors of sensor unit 208 as previously described, for example as described in relation to the ALC mentioned with respect to FIG. 1. Control unit 210 may receive instructions concerning the synchronization from processor 212. Suitable control units include Digilent Digital Discovery controllers or microcontrollers. In this example, control unit 210 is present separately from processing unit 216. In another example, processing unit 216 comprises control unit 210. In another example of system 200, processor 212 of processing unit 216 is used to control the illuminants of light source 206 and the sensors of sensor unit 208 and a control unit is not present.
The processing unit 216 houses computer processor 212 and internal memory 214 and is connected via communication interfaces 226, 228, 230 to the control unit 210 and databases 218 and 220. In this example of system 200, the processing unit 216 is part of the luminescent object recognition system 200. In another example (not shown), the processing unit 216 may be located on a cloud environment and system 200 may transfer the acquired data to said processing unit for further processing. This may reduce the costs of system 200 but requires cloud access to perform object recognition and may be preferable if a large amount of computing power is required to determine the objects in the scene using the processing unit. The processor 212 is configured to execute instructions, for example retrieved from memory 214, and to carry out operations associated with the system 200, namely to
o determine the at least one object in the scene based on
■ the provided data of the scene,
■ the provided digital representations of pre-defined objects, and
■ the provided digital representation of the scene,
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
The processor 212 can be a single-chip processor or can be implemented with multiple components. In most cases, the processor 212 together with an operating system operates to execute computer code and produce and use data. In this example, the computer code and data reside within memory 214 that is operatively coupled to the processor 212. Memory 214 generally provides a place to hold data that is being used by the system 200. By way of example, memory 214 may include Read-Only Memory (ROM), Random-Access Memory (RAM), a hard disk drive and/or the like. In another example, computer code and data could also reside on a removable storage medium and be loaded or installed onto the computer system when needed. Removable storage mediums include, for example, CD-ROM, PC-CARD, floppy disk, magnetic tape, and a network component.
Database 218 comprises digital representations of pre-defined objects and is connected via communication interface 228 to processing unit 216. In one example, the digital representations of pre-defined objects stored in database 218 are used by processor 212 of processing unit 216 to determine a set of object identification hypotheses by calculating best matching luminescence and/or reflectance properties based on the retrieved digital representations and the provided data of the scene or the processed data of the scene. In another example, the digital representations of pre-defined objects stored in database 218 are used by processor 212 of processing unit 216 to refine the determined set of object identification hypotheses which have been determined using the digital representation of the scene.
Database 220 comprises digital representations of the scene and is connected via communication interface 230 to processing unit 216. In one example, the digital representations of the scene stored in database 220 are used by processor 212 of processing unit 216 to determine a set of object identification hypotheses based on the retrieved digital representations. In another example, the digital representations of the scene stored in database 220 are used by processor 212 of processing unit 216 to refine the set of object identification hypotheses which have been determined using the digital representations of predefined objects.
In one example, system 200 further comprises a database (not shown) containing actions associated with pre-defined objects, which is connected to the processing unit 216 via a communication interface. This makes it possible to trigger an action, for example re-ordering consumed items, upon detection of an object in the scene (for example upon detection of an object being thrown into the waste bin). For this purpose, the action may be retrieved from the database by the processor 212 and may be triggered automatically and/or may be shown on the screen of a display device (for example, if user interaction is required prior to triggering the action).
In one example, system 200 further comprises a display device having a screen (not shown) and being connected to processing unit 216 via a communication interface. The display device displays the at least one object determined and provided by processing unit 216, in particular via a graphical user interface (GUI), to the user. The display device may be a stationary display device, such as a peripheral monitor, or a portable display device, such as a smartphone, tablet, laptop, etc. By way of example, the screen of the display device may be a monochrome display, color graphics adapter (CGA) display, enhanced graphics adapter (EGA) display, variable-graphics-array (VGA) display, Super VGA display, liquid crystal display (e.g., active matrix, passive matrix and the like), cathode ray tube (CRT), plasma display and the like. In another example, system 200 may not comprise a display device. In this case, the recognized objects may be stored in a database (not shown) or used as input data for a further processing unit (not shown).
FIG. 2b illustrates a non-limiting embodiment of a system 201 for recognizing at least one object having object specific luminescence and/or reflectance properties in scene 202' in accordance with a second embodiment of the invention, which may be used to implement method 100 described in relation to FIG. 1. In this example, the luminescence object recognition system 201 monitors scene 202' comprising a waste bin as well as waste 204' (i.e. the object to be recognized) being thrown into the waste bin by a person. The waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park. The information on the recognized object (i.e. the object being discarded) may be used for reordering purposes, for waste separation planning, etc.
System 201 comprises a similar luminescent object recognition system as described in relation to FIG. 2a. In contrast to the system described in relation to FIG. 2a, system 201 comprises a further processing unit 234'. This allows for shifting tasks requiring high amounts of computing power to a further processing unit not being part of the luminescent object recognition system, thus reducing the costs and energy consumption of the luminescent object recognition system.
The processor 212' of the first processing unit 216' is configured to execute instructions, for example retrieved from memory 214', and to carry out operations associated with the system 201, namely to
o determine a set of object identification hypotheses about the object(s) to be recognized in the scene based on
■ the provided data of the scene and the provided digital representations of pre-defined objects or
■ the provided digital representation of the scene, and
o provide via the communication interface the determined set of object identification hypotheses and optionally data of the scene to the second processing unit.
Suitable examples of processor 212’ and memory 214’ have been described in relation to FIG. 2a.
The digital representations of pre-defined objects or the digital representations of the scene are stored in database 218’ connected via communication interface 226’ to processor 212’ of the first processing unit 216’. The digital representations stored in this database are retrieved by processor 212’ upon determination of a set of object identification hypotheses about the object(s) to be recognized in the scene as previously described.
The first processing unit 216' is connected via communication interface 228' to the second processing unit 234' to provide the determined set of object identification hypotheses and optionally data of the scene acquired by the luminescent object recognition system 240' to the second processing unit 234'. Communication interface 228' may be a gateway as described in relation to FIG. 5 below and/or the first processing unit 216' or the second processing unit 234' may contain a gateway functionality as described in relation to FIG. 5 below. Data of the scene provided to the second processing unit 234' may have been processed by the first processing unit 216', for example by determining further object specific luminescence and/or reflectance properties from the acquired data as previously described. The second processing unit 234' may be located on a stationary local processing device or in a cloud environment. The processor 230' of the second processing unit 234' is configured to execute instructions, for example retrieved from memory 232', and to carry out operations associated with the system 201, namely to
o refine the received set of object identification hypotheses to identify at least one object by revising at least certain of said associated confidence scores based on the provided digital representation of the scene or based on the provided data of the scene and the provided digital representations of pre-defined objects, and
o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
The digital representations of pre-defined objects or the digital representations of the scene are stored in database 236’ connected via communication interface 238’ to processor 230’ of the second processing unit 234’. The digital representations stored in this database are retrieved by processor 230’ upon refining the set of object identification hypotheses received from the first processing unit 216’ as previously described.
System 201 may further comprise a database having stored therein actions interrelated with pre-defined objects and/or a display device having a screen (not shown) as described in relation to FIG. 2a.
FIG. 3 depicts a non-limiting embodiment of a method 300 for training an object recognition neural network using data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene. The data on object specific luminescence and optionally reflectance properties can be acquired and processed using system 200 or 201 described in relation to FIGs. 2a and 2b or the luminescence object recognition systems disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1.
In block 302 of method 300, routine 301 retrieves data of a scene via a communication interface, said data including image(s) of the scene as well as data on object specific luminescence and optionally reflectance properties of at least one object having said properties and being present in the scene. Data on object specific luminescence and optionally reflectance properties can be acquired using system 200 or 201 described in relation to FIGs. 2a and 2b or the luminescence object recognition systems disclosed in unpublished patent application US 63/139,299 and published patent applications US 2020/279383 A1, WO 2020/245442 A1, WO 2020/245441 A1 and WO 2020/245444 A1. In one example, data of the scene is retrieved by routine 301 via a communication interface from the luminescence object recognition system. This makes it possible to provide the data to the computer processor implementing routine 301 directly after data acquisition. In another example, data of the scene is retrieved by the computer processor implementing routine 301 from a data storage medium, such as a database or an internal memory (for example located within the luminescence object recognition system), which is connected via a communication interface to the computer processor implementing routine 301. This may be preferred if the acquired data is stored prior to performing method 300. In one example, data of the scene acquired by the sensor unit of the luminescence object recognition system is retrieved by the processor of the object recognition system implementing routine 301. In another example, data of the scene acquired by the sensor unit of the luminescence object recognition system is retrieved by a further processor implementing routine 301 and being present separate from the processor of the luminescence object recognition system. In block 304, routine 301 determines whether differential data has to be generated (i.e. the delta calculation previously mentioned has to be performed). Performing the delta calculation is necessary if separation of the luminesced and reflected light acquired upon illumination of the scene with the light source of the object recognition system is achieved physically, for example by the use of specific filters in front of the illuminants and/or sensors. Routine 301 may be programmed to perform this determination based on data provided from the luminescence object recognition system, for example if flicker compensation (as described in FIG. 1) has been performed or not. If routine 301 determines that differential data has to be generated, it proceeds to block 306. Otherwise, it proceeds to block 308 described below.
In block 306, routine 301 generates differential data from the provided data of the scene (i.e. performs the delta calculation) as described previously.
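A minimal Python sketch of one possible delta calculation is given below; it assumes (without limitation) that each measurement consists of a frame captured with an illuminant switched on and an ambient-only background frame, whose difference isolates the light contributed by the illuminant. The pixel values are hypothetical.

```python
import numpy as np

def differential_image(frame_illuminated: np.ndarray, frame_ambient: np.ndarray) -> np.ndarray:
    # Subtract the ambient-only background frame from the actively illuminated frame.
    delta = frame_illuminated.astype(np.float64) - frame_ambient.astype(np.float64)
    return np.clip(delta, 0.0, None)  # negative residuals are treated as noise

illuminated = np.array([[120, 200], [90, 60]], dtype=np.uint8)
ambient = np.array([[100, 110], [80, 70]], dtype=np.uint8)
print(differential_image(illuminated, ambient))  # [[20. 90.] [10.  0.]]
```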
In block 308, routine 301 retrieves digital representations of pre-defined objects and optionally digital representations of the scene via a communication interface from a data storage medium, such as a database as described in relation to FIG. 1. Routine 301 may retrieve the digital representation of the scene based on a scene identifier which may be contained in the data of the scene received from the luminescence object recognition system.
In block 310, routine 301 determines regions of luminescence, in particular regions having similar luminescence, in the data of the scene retrieved in block 302 or the differential data generated in block 306 by classifying the pixels associated with the detected regions of luminescence. Determination of regions of luminescence can be performed as previously described in relation to the determination of further object specific luminescence and/or reflectance properties by analysing the brightness of the pixels in the data retrieved in block 302 and classifying the pixels according to their brightness.
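The following Python sketch illustrates one way (among others) in which such regions could be obtained, assuming the luminescence channel is available as a numpy array and that SciPy is available for connected-component labelling; the threshold value and pixel data are assumptions made for the example.

```python
import numpy as np
from scipy import ndimage  # assumed to be available for connected-component labelling

def luminescence_regions(luminescence: np.ndarray, threshold: float):
    mask = luminescence > threshold           # per-pixel classification by brightness
    labels, n_regions = ndimage.label(mask)   # group classified pixels into regions
    return labels, n_regions

channel = np.array([[0.0, 0.9, 0.8],
                    [0.0, 0.0, 0.7],
                    [0.6, 0.0, 0.0]])
labels, n_regions = luminescence_regions(channel, threshold=0.5)
print(n_regions)  # 2 regions for this hypothetical channel
```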
In block 312, routine 301 determines the best matching luminescence and optionally reflectance properties for each detected region of luminescence as previously described (see for example block 130 of FIG. 1). In one example, this may include determining luminescence spectral patterns and/or reflective spectral patterns for the determined regions of luminescence prior to determining the best matching luminescence and optionally reflectance properties.
In block 314, routine 301 obtains the object(s) assigned to the best matching luminescence and optionally reflectance properties determined in block 312. Obtaining the object(s) may be performed by retrieving the object(s) from the digital representations of pre-defined objects retrieved in block 308 or from a further database as previously described. In block 316, routine 301 refines the object(s) obtained in block 314 based on the digital representation of the scene retrieved in block 308, this block being generally optional. Refinement of the list of object(s) obtained after block 314 can be performed as described previously in relation to FIG. 1 , block 132 (option A) and can increase the recognition accuracy in case of ambiguous identification of objects based on the luminescence and optionally reflectance properties.
In block 318, routine 301 annotates each classified pixel of each image with an object specific label based on the object(s) obtained in block 314 or refined in block 316 and the associated regions of luminescence determined in block 310.
In block 320, routine 301 determines whether to create labelled images using bounding boxes, this step generally being optional. This determination may be made in accordance with the programming of routine 301. The processor implementing routine 301 and performing blocks 320 to 324 may be the same processor which performs blocks 304 to 318 or may be different from this processor. In case the routine 301 determines that labelled images are to be created using bounding boxes, routine 301 proceeds to block 322. Otherwise, routine 301 proceeds to block 324 described later on. In case a different processor is used for performing blocks 320 and 322/324, the annotated images generated in block 318 are provided to the further processor via a communication interface prior to performing block 322/324. Use of bounding boxes or image segmentation can increase the accuracy of the training data provided to the neural network, thus resulting in a higher accuracy of object detection using the neural network trained with such more accurate training data.
In block 322, routine 301 creates labelled images based on the annotated images created in block 318 using bounding boxes, this step generally being optional. Bounding box creation can be performed as previously described using coordinate information associated with the pixels annotated in block 318.
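A minimal Python sketch of such a bounding box creation from a pixel-wise annotation mask (the mask layout and the object identifier are assumptions made for this example):

```python
import numpy as np

def bounding_boxes(label_mask: np.ndarray) -> dict:
    # 'label_mask' holds 0 for background and an integer object identifier per annotated pixel.
    boxes = {}
    for object_id in np.unique(label_mask):
        if object_id == 0:
            continue
        rows, cols = np.nonzero(label_mask == object_id)
        boxes[int(object_id)] = (int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max()))
    return boxes  # (row_min, col_min, row_max, col_max) per object identifier

mask = np.zeros((5, 5), dtype=int)
mask[1:3, 1:4] = 7             # hypothetical object with identifier 7
print(bounding_boxes(mask))    # {7: (1, 1, 2, 3)}
```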
In block 324, routine 301 creates labelled images by segmenting the annotated images created in block 318, this step being generally optional. Image segmentation can be performed by clustering regions of pixels as previously described.
In block 326, the neural network and optionally its dimensions are selected. Suitable object recognition neural networks include convolution neural networks (CNNs) known in the state of the art, such as deep convolutional neural networks comprising a plurality of convolutional neural network layers followed by one or more fully connected neural network layers. These deep convolutional neural networks may comprise a pooling layer after each convolutional layer or after a plurality of convolutional layers and/or may comprise a non-linear layer after each convolutional layer, in particular between the convolutional layer(s) and the pooling layer.
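Purely as a non-limiting illustration of such a network, the following sketch defines a small deep convolutional neural network in PyTorch; the framework, layer sizes and number of classes are assumptions made for the example and are not prescribed above.

```python
import torch.nn as nn

def build_object_recognition_cnn(n_classes: int, in_channels: int = 3) -> nn.Module:
    # Convolutional layers, each followed by a non-linear layer and a pooling layer,
    # followed by fully connected layers, as outlined above.
    return nn.Sequential(
        nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, n_classes),
    )

model = build_object_recognition_cnn(n_classes=10)  # hypothetical number of object classes
```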
In block 328, the labelled images obtained in block 322 or 324 as well as unlabelled images of the scene provided in block 302 and optionally the digital representation of the scene are provided to the neural network selected in block 326. Use of the digital representation of the scene may improve the recognition accuracy in case of ambiguous object identification. For this purpose, the processor performing block 322 or 324 may be connected via a communication interface to the processing unit hosting the neural network, i.e. the processing unit having implemented the neural network selected in block 326. In case the digital representation of the scene has not yet been retrieved by the processor performing block 328, said digital representation is retrieved from a database as previously described in block 308. This block may further include dividing the data provided in this block into a training set, a validation set, and a verification (or "testing") set prior to providing the data to the neural network in block 330. The training set is used to adjust the neural network via a back-propagation algorithm so that the neural network iteratively "learns" how to correctly recognize objects in the input data. The validation set, however, is primarily used to minimize overfitting. The validation set typically does not adjust the neural network as does the training set, but rather verifies that any increase in accuracy over the training data set yields an increase in accuracy over a data set that has not been applied to the neural network previously, or on which the neural network has at least not been trained yet (i.e. the validation data set). If the accuracy over the training data set increases, but the accuracy over the validation data set remains the same or decreases, the process is often referred to as "overfitting" the neural network and training should cease. The verification set is used for testing the trained neural network in order to confirm the actual predictive power of the trained neural network and is preferably used in block 332 as described later on.
In one example, approximately 70% of the provided data sets of the scene (i.e. labelled images resulting from block 322 or 324 and images of the scene) are used for model training, 15% are used for model validation, and 15% are used for model verification. These approximate divisions can be altered as necessary to reach the desired result. The size and accuracy of the training data set can be very important to the accuracy of the neural network obtained by method 300. For example, for an illustrative embodiment of method 300, about 40,000 sets of data of the scene may be collected, each set including images of the scene, labelled images of the scene and a digital representation of the scene. The automatic labelling of the collected images of the scene makes it possible to generate training data sets accurately, quickly and efficiently, so that sufficiently trained neural networks can be obtained within a short amount of time. The training data set may include samples throughout the full range of objects expected to occur in the scene.
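A minimal Python sketch of the 70/15/15 division described above (the random seed and the sample list are hypothetical):

```python
import random

def split_dataset(samples: list, seed: int = 42):
    # Shuffle reproducibly, then divide into training, validation and verification sets.
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(0.70 * len(shuffled))
    n_val = int(0.15 * len(shuffled))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

train_set, validation_set, verification_set = split_dataset(list(range(100)))
print(len(train_set), len(validation_set), len(verification_set))  # 70 15 15
```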
In block 330, the neural network provided in block 326 is trained with the training data set provided in block 328 according to methods well known in the state of the art (see, for example, Michael A. Nielsen, "Neural Networks and Deep Learning", Determination Press, 2015). Training is an iterative process that modifies the weights and biases of each layer. In one example, the training is performed using the backpropagation technique to modify each layer's weights and biases. With each iteration of training data used to adjust the weights and biases, the validation data is run on the neural network and one or more measures of accuracy are determined by comparing the recognized objects with the objects being present in the training data. For example, the standard deviation and mean error of the output will generally improve for the validation data with each iteration and then start to increase with subsequent iterations. The iteration for which the standard deviation and mean error are minimized yields the most accurate set of weights and biases for that neural network model and that training set of data.
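The training and validation procedure described above could, for example, be realised as in the following PyTorch sketch; the optimiser, learning rate and the use of DataLoaders yielding (image, label) batches are assumptions made for the example, not requirements of the method.

```python
import torch
import torch.nn as nn

def train_network(model: nn.Module, train_loader, val_loader, max_epochs: int = 50, lr: float = 1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    best_val_loss, best_state = float("inf"), None
    for epoch in range(max_epochs):
        model.train()
        for images, labels in train_loader:          # iterative weight and bias adjustment
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                           # backpropagation
            optimizer.step()
        model.eval()
        with torch.no_grad():                         # run the validation data each iteration
            val_loss = sum(criterion(model(x), y).item() for x, y in val_loader)
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            break                                     # validation error rises: stop (overfitting)
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```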
In block 332, the results of the neural network trained in block 330 are verified using the verification data set to determine whether the output of the neural network is sufficiently accurate when compared to the objects being present in the scene. If the accuracy is not sufficient, method 300 proceeds to block 336. Otherwise, method 300 ends and the obtained trained neural network can be used to recognize objects in a scene as described in relation to FIGs. 4 to 6 later on.
In block 336, larger and/or more accurate sets of training data are used to modify the neural network, or a different type and/or different dimensions of the neural network are selected to improve the accuracy using the training data set provided in block 328, and method 300 continues with block 330 as previously described.
The training method 300 makes it possible to automatically label objects being present in the scene based on data generated by luminescence object recognition systems and to use the labelled data to train neural networks commonly used for object recognition. This makes it possible to train neural networks installed in a specific scene, such as a store, with training data acquired from a luminescence object recognition system installed in a further store having a similar layout. Households may have their purchased luminescence object recognition system subsidized or receive other reimbursements in exchange for the labelled data their system generates, which is used for training traditional AI object recognition systems used by similar households (i.e. households being similar in geographical location, income, interests, brand preferences, etc.), or they may opt to keep their data private but have it used to improve other AI object recognition systems used in their household.
FIG. 4 depicts a non-limiting embodiment of a method 400 for recognizing at least one object being present in the scene using a trained neural network. In one example, the neural network has been trained using method 300 described in relation to FIG. 3. The scene may be located indoors or outdoors. In one example of method 400, the trained neural network is used in combination with a luminescent object recognition system (for example a system described in relation to FIGs. 2a and 2b). In another example of method 400, the trained neural network is used without a luminescent object recognition system.
In block 402, routine 401 implementing method 400 retrieves a trained neural network (TNN). In one example, the neural network has been trained using automatically labelled training data generated from luminescence object recognition systems (for example by using the training method described in relation to FIG. 3). The trained neural network may be located on a remote server or may be stored on a data storage medium, such as an internal memory of the processing device implementing routine 401. By locating the trained neural network on a remote server or a cloud server, costs of added memory and/or a more complex processor, and associated battery usage in using the neural network to determine the objects present in the scene can be avoided. Additionally, continuous, or periodic improvement of the neural network can more easily be done on a centralized server and avoid data costs, battery usage, and risks of pushing out a firmware update of the neural network. A remote server may also serve as a central repository storing training and/or collections of operative data sent from various luminescence object recognition systems to be used to train and develop existing neural networks. For example, a growing repository of data can be used to update and improve existing trained neural networks and to provide neural networks for future use.
In block 404, data of the scene is retrieved via a communication interface by routine 401. Data on the scene can be acquired using a commercially available camera or by using a luminescence object recognition system as previously described (for example as described in relation to FIGs. 2a and 2b). In one example, data of the scene includes image(s) of the scene. In another example, data of the scene includes data on object specific reflectance and/or luminescence properties of at least one object having these properties and being present in the scene. The provided data may further include identifiers being indicative of the scene or the system used to acquire the data, for example a location, serial number, unique system identifier, etc. The data may be stored on a data storage medium, such as a database or internal memory, prior to being retrieved by routine 401.
In block 406, routine 401 determines whether the data retrieved in block 404 was acquired with a luminescence object recognition system. This may be determined based on the retrieved data. In case routine 401 determines that the data retrieved in block 404 was not acquired using a luminescence object recognition system, routine 401 proceeds to block 408. Otherwise, routine 401 proceeds to block 412 described later on.
In block 408, a digital representation of the scene is retrieved by routine 401, this step being generally optional. It may be preferable to perform this step to increase recognition accuracy in case of ambiguous object recognition based on the data retrieved in block 404. The digital representations of the scene may be stored on a data storage medium, such as a database, and may be retrieved by routine 401 via a communication interface, for example by using a scene identifier contained in the data retrieved in block 404.
In block 410, at least one object being present in the scene is determined using the trained neural network retrieved in block 402, the data of the scene retrieved in block 404 and optionally the digital representation of the scene retrieved in block 408. This is performed by providing the data on the scene and optionally the digital representation of the scene to the trained neural network and using the trained neural network to identify the objects being present in the scene.
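A minimal sketch of this step, assuming the trained PyTorch model from the training method above, a batch of scene images provided as a tensor and a hypothetical list mapping output indices to object labels:

```python
import torch

def recognise_objects(model, images: torch.Tensor, class_names: list) -> list:
    # Return an (object label, confidence) pair for each image in the batch.
    model.eval()
    with torch.no_grad():
        probabilities = torch.softmax(model(images), dim=1)
        confidences, indices = probabilities.max(dim=1)
    return [(class_names[i], float(c)) for i, c in zip(indices.tolist(), confidences.tolist())]
```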
In block 412, routine 401 determines whether the data from the luminescence object recognition system retrieved in block 404 is to be processed by determining further object specific luminescence and/or reflectance properties as previously described, for example in relation to block 124 of FIG. 1. The determination may be made based on the programming and may depend, for example, on the retrieved data of the scene (e.g. the measurement conditions and/or the hardware configuration of the luminescence object recognition system). If routine 401 determines in block 412 that further processing is to be performed, routine 401 proceeds to block 414, otherwise routine 401 proceeds to block 416 described later on.
In block 414, further object specific luminescence and/or reflectance properties are determined as previously described, for example as described in relation to block 124. The determination may be made with the computer processor performing block 412 or remotely, i.e. with a second computer processor as described in relation to FIG. 1. In case a second computer processor is used, the data retrieved in block 404 is provided to the second computer processor prior to determining the further object specific luminescence and/or reflectance properties. The determined further properties may be provided to the first computer processor via a communication interface if desired.
In block 416, digital representations of pre-defined objects and optionally a digital representation of the scene are retrieved by routine 401 as described in relation to block 128 of FIG. 1. The computer processor may be the first or the second computer processor previously described.
In block 418, the computer processor determines a set of object identification hypotheses based on
- the data retrieved in block 404 or the further object specific luminescence and/or reflectance properties determined in block 414 and the digital representations of pre-defined objects retrieved in block 416 (option A) or
- the digital representation of the scene retrieved in block 416 (option B) as described in relation to block 130 of FIG. 1.
After the computer processor has determined a set of object identification hypotheses in block 418, routine 401 proceeds to block 420 and refines the set of object identification hypotheses determined in block 418 as described in relation to block 132 of FIG. 1. Depending on the option used in block 418 to determine the set of object identification hypotheses, routine 401 is programmed to perform the refinement based on:
- the digital representation of the scene retrieved in block 416 in case the set of object identification hypotheses has been determined according to option A previously described, or
- the data retrieved in block 404 or the further object specific luminescence and/or reflectance properties determined in block 414 and the digital representations of pre-defined objects retrieved in block 416 in case the set of object identification hypotheses has been determined according to option B previously described.
In block 422, routine 401 provides the determined object(s) to a display device as described in relation to block 134 of FIG. 1. Routine 401 may then proceed to block 404 and repeat the object recognition process or the method may be ended.
In block 424, the display device displays the data received from routine 401 in block 422 on the screen, in particular within a GUI. In block 426, routine 401 determines actions associated with the determined objects and may display these determined actions to the user on the screen of the display device, this step being generally optional. The determined actions may be pre-defined actions as previously described. In one example, the determined actions may be performed automatically by the computer processor without user interaction. However, routine 401 may provide information about the status of the initiated action to the user on the screen of the display device. In another example, a user interaction is required after displaying the determined actions on the screen of the display device prior to initiating any action by the computer processor as previously described. Routine 401 may be programmed to control the initiated actions and to inform the user on the status of the initiated actions. After the end of block 426, routine 401 may return to block 402 or the method may be ended.
FIG. 5 illustrates a non-limiting embodiment of a system 500 comprising a luminescent object recognition system 501 and a visual AI recognition system 503. System 500 may be used to implement the training method 300 described in relation to FIG. 3 or the object recognition method 400 described in relation to FIG. 4. In this example, system 500 monitors scene 502 comprising a waste bin as well as waste 504 (i.e. the object to be recognized) being thrown into the waste bin by a person. The waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park.
System 500 comprises a luminescent object recognition system 501 similar to the one described in relation to FIG. 2b. In this example of system 501, database 516 contains digital representations of pre-defined objects and digital representations of the scene. In another example of system 501 (not shown), the digital representations of pre-defined objects and the digital representations of the scene may be stored in different databases.
The processing unit 516 of system 501 comprises a processor 512 which is configured to execute instructions, for example retrieved from memory 514, and to carry out operations associated with the system 500. In one example, processor 512 is programmed to perform blocks 302 to 324 described in relation to method 300 of FIG. 3, namely to create labelled images based on the data acquired by the sensor unit 508 upon illumination of scene 502 with the light source 506. These labelled images are then used to train neural network 538 which is implemented on processor 530.
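As an illustration of how the pixel-level output of the luminescent object recognition system can be turned into labelled training images, the following sketch converts a per-pixel classification into bounding boxes using connected-region analysis. It assumes NumPy arrays and SciPy's ndimage module; the function and field names are illustrative and not taken from the present disclosure.

```python
import numpy as np
from scipy import ndimage


def labelled_image_from_pixel_classes(pixel_classes: np.ndarray):
    """Turn a per-pixel classification (0 = background, k > 0 = object class k,
    e.g. derived from detected luminescence regions) into bounding boxes that
    can be used directly as training labels."""
    boxes = []
    for class_id in np.unique(pixel_classes):
        if class_id == 0:
            continue  # skip background pixels
        mask = pixel_classes == class_id
        components, _ = ndimage.label(mask)           # connected luminescent regions
        for sl in ndimage.find_objects(components):   # one bounding box per region
            y, x = sl
            boxes.append({"class_id": int(class_id),
                          "bbox": (x.start, y.start, x.stop, y.stop)})  # (x0, y0, x1, y1)
    return boxes
```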
In another example, processor 512 is programmed to perform blocks 404, 406 and 412 to 418 or blocks 404, 406 and 412 to 416 and 420 described in relation to method 400 of FIG. 4, namely to determine a set of object identification hypotheses or to refine the set of object identification hypotheses determined by the processing unit 534 of the visual AI system 503. The set of object identification hypotheses may be determined as described previously in relation to block 418 of FIG. 4 by processor 512 based on
■ the acquired data of the scene or determined further object specific luminescence and/or reflectance properties and the provided digital representations of pre-defined objects or
■ the provided digital representation of the scene.
The determined set of object identification hypotheses is then provided via communication interface 528 to processor 530 of the visual AI system 503 for refinement.
Refinement of the set of object identification hypotheses provided by visual AI system 503 via communication interface 528 may be performed by processor 512 as described previously in relation to block 420 of FIG. 4 based on
■ the acquired data of the scene or determined further object specific luminescence and/or reflectance properties and the provided digital representations of pre-defined objects or
■ the provided digital representation of the scene.
The object(s) resulting from the refinement process may then be provided to a display device to display the determined object(s) on the screen of said display device.
The visual AI recognition system 503 comprises a processing unit 534 which is connected to the processing unit 516 of the luminescent object recognition system 501 via communication interface 528 to allow data exchange between systems 501 and 503. Within this example, exactly one luminescence object recognition system 501 is connected with the visual AI recognition system 503. However, it is also possible to couple more than one luminescence object recognition system to the visual AI recognition system 503 (not shown). In one example, the communication interface 528 represents a gateway. In another example, the luminescence object recognition system 501 is coupled directly to visual AI recognition system 503. In this case, system 501 or 503 may be configured with any of the gateway functionality and components described herein and treated like a gateway by system 503 or 501, at least in some respects. Each gateway may be configured to implement any of the network communication technologies described herein so the gateway may remotely communicate with, monitor, and manage system 501 or 503. Each gateway may be configured with one or more capabilities of a gateway and/or controller as known in the state of the art and may be any of a plurality of types of devices configured to perform the gateway functions defined herein.

To ensure security of the transmitted data, each gateway may include a Trusted Platform Module (TPM) (for example in a hardware layer of a controller). The TPM may be used, for example, to encrypt portions of communications from/to system 501/503 to/from gateways, to encrypt portions of such information received at a gateway unencrypted, or to provide secure communications between system 501, gateway 528 and system 503. For example, TPMs or other components of the system 500 may be configured to implement Transport Layer Security (TLS) for HTTPS communications and/or Datagram Transport Layer Security (DTLS) for datagram-based applications. Furthermore, one or more security credentials associated with any of the foregoing data security operations may be stored on a TPM. A TPM may be implemented within any of the gateways, system 501 or system 503, for example, during production, and may be used to personalize the gateway or system 501/503. Such gateways, system 501 and/or system 503 may be configured (e.g., during manufacture or later) to implement cryptographic technologies known in the state of the art, such as a Public Key Infrastructure (PKI) for the management of keys and credentials.
In one example, gateway 528 connecting system 501 to system 503 or each gateway present within system 501 may be configured to process data received from system 501 or 503, including analyzing data that may have been generated or received by system 501 or 503, and providing instructions to system 501, for example concerning acquisition of data of the scene. In addition, each gateway may be configured to provide one or more functions pertaining to triggering pre-defined actions as described in more detail in relation to FIG. 1. For this purpose, each gateway may be configured with software encapsulating such capability. In another example, system 501 connected via a communication interface directly to system 503 may be configured to process data and perform further functions described above. For this purpose, system 501 may be configured with software encapsulating such capability. By performing such processing at one or more gateways, and/or at system 501 itself, as opposed to in a more centralized fashion on system 503, the system 500 may implement and enjoy the benefits of more distributed edge-computing techniques.
In this example of system 503, processing unit 534 is connected via communication interface 540 to database 536 comprising digital representations of the scene, and neural network 538 is implemented and running on processor 530 of processing unit 534. In one example, neural network 538 is a neural network which has been trained according to the method described in relation to FIG. 3. In another example, neural network 538 is an untrained neural network which is trained with data received via communication interface 528 from processing unit 516 of the luminescence object recognition system 501. In yet another example, neural network 538 is a commercially available trained image recognition neural network, such as Inception/GoogLeNet, ResNet-50, ResNet-34, MobileNet V2 or VGG-16.
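For illustration, such a publicly available network could be instantiated and adapted to the scene-specific class set as sketched below. The sketch assumes PyTorch and torchvision (version 0.13 or later for the weights argument); the present disclosure does not prescribe any particular framework, and the number of classes shown is a hypothetical example.

```python
import torch
from torchvision import models


def scene_specific_network(num_scene_classes: int) -> torch.nn.Module:
    """Start from a publicly available ResNet-50 (one of the architectures named
    above) and replace its classification head so it only has to distinguish
    the objects that actually occur at the scene."""
    net = models.resnet50(weights="DEFAULT")  # ImageNet-pretrained backbone
    net.fc = torch.nn.Linear(net.fc.in_features, num_scene_classes)
    return net


network_538 = scene_specific_network(num_scene_classes=5)  # e.g. five waste categories (hypothetical)
```

Replacing the final fully connected layer restricts the classifier to the scene-specific classes while keeping the pretrained feature extractor unchanged.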
Processor 530 of processing unit 534 is configured to execute instructions, for example retrieved from memory 532, and to carry out operations associated with system 500. In one example, processor 530 is configured to train neural network 538 with labelled images received via communication interface 528 from the luminescence object recognition system 501 and optionally a digital representation of the scene retrieved from database 536 via communication interface 540 based on the data provided from system 501 as described in relation to blocks 330 and 332 of FIG. 3.
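A minimal training loop for this step, assuming PyTorch and treating each labelled image as an (image tensor, class label) pair, might look as follows. Where bounding boxes or segmentation masks are used as labels, a detection or segmentation loss would replace the cross-entropy loss shown here; all names and hyperparameters are illustrative.

```python
import torch
from torch.utils.data import DataLoader


def train(network: torch.nn.Module, labelled_images: DataLoader,
          epochs: int = 10, lr: float = 1e-4) -> None:
    """Fine-tune the network on the automatically labelled images received from
    the luminescent object recognition system."""
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    network.train()
    for _ in range(epochs):
        for images, labels in labelled_images:
            optimizer.zero_grad()
            loss = loss_fn(network(images), labels)  # compare predictions with labels
            loss.backward()
            optimizer.step()
```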
In another example, processor 530 is configured to determine a set of object identification hypotheses or to refine the set of object identification hypotheses determined by the processing unit 516 of luminescence object recognition system 501 using trained neural network 538 and optionally the digital representations of the scene stored in database 536. Processor 530 may be further programmed to optionally provide the progress of the training or the result of the refinement to a display device.
The set of object identification hypotheses may be determined by processor 530 using trained neural network 538 based on the data acquired by the luminescence object recognition system 501, in particular images of scene 502, which is provided to processor 530 via communication interface 528, and optionally the digital representation of the scene which is stored in database 536 and retrieved by processor 530 via communication interface 540.
The set of object identification hypotheses received via communication interface 528 from the luminescence object recognition system may be refined by processor 530 using trained neural network 538 based on the data acquired by the luminescence object recognition system 501 , in particular images of scene 502, which is provided to the processor 530 via communication interface 528 and optionally the digital representation of the scene which is stored in database 536 and retrieved by processor 530 via communication interface 540.
In one example, system 500 further comprises a database (not shown) containing actions associated with pre-defined objects which is connected to the processing unit 516 or 534 via a communication interface as described in relation to FIG. 2a.
In one example, system 500 further comprises a display device having a screen (not shown) and being connected to processing unit 516 or 534 via a communication interface as described in relation to FIG. 2a.
FIG. 6 illustrates a non-limiting embodiment of a system 600 for recognizing at least one object in a scene and for remotely managing object recognition system(s). In this example, system 600 may be used to implement blocks 402 to 410 and 422 to 426 of method 400 described in relation to FIG. 4. In this example, system 600 comprises a visual AI recognition system 601 comprising a trained object recognition neural network 614, a cloud 616 and a display device 626. In another example, several visual AI recognition systems 601.1 to 601.n are connected via communication interfaces with cloud 616. In yet another example, luminescence object recognition system(s) (such as described in FIGs. 2a, 2b) are connected to cloud 616 instead of or in addition to visual AI recognition system 601 (not shown).
The visual AI system 601 comprises a processing unit 612 housing computer processor 608 and internal memory 610 which is connected via communication interface 626 with a sensor unit comprising sensor 606 and via communication interface 630 with cloud 616. Communication interfaces 626, 630, 632 may represent gateways and/or the visual AI system 601 may comprise gateway functionalities as described in relation to FIG. 5. In this example, the sensor unit comprises exactly one sensor 606. In another example, the sensor unit comprises more than one sensor. Suitable sensors may include, for example, cameras, such as commercially available video cameras. Sensor 606 of system 601 is used to monitor scene 602 comprising a waste bin as well as waste 604 (i.e. the object to be recognized) being thrown into the waste bin by a person. The waste bin may be located indoors, for example in a kitchen, bathroom etc., or may be located outdoors, for example in a public place, like a park. The acquired sensor data is provided via communication interface 626 to processing unit 612. Object recognition neural network 614 which has been trained according to the method described in relation to FIG. 3 is implemented and running on processor 608 of processing unit 612. The processor 608 is configured to execute instructions, for example retrieved from memory 610, and to carry out operations associated with the system 600, namely to o determine the at least one object in the scene based on
■ the implemented trained neural network 614
■ the provided data of the scene, and
■ optionally the provided digital representation of the scene, o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s). A purely illustrative sketch of this inference step is given below.
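The sketch assumes PyTorch, a preprocessed camera frame of shape [1, 3, H, W] and illustrative names; it performs a single forward pass through the trained network and returns the most likely object together with its score.

```python
from typing import List, Tuple

import torch


@torch.no_grad()
def recognize(network: torch.nn.Module, frame: torch.Tensor,
              class_names: List[str]) -> Tuple[str, float]:
    """Run the trained object recognition network on one preprocessed camera
    frame and return the most likely object with its confidence score."""
    network.eval()
    probabilities = torch.softmax(network(frame), dim=1)[0]
    score, index = torch.max(probabilities, dim=0)
    return class_names[int(index)], float(score)
```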
Suitable processors and internal memories are the ones previously described in relation to FIG. 2a. In this example, the provided digital representation of the scene is stored in the service layer 624 of cloud 616 described below and retrieved by processor 608 via communication interface 630 prior to determining the object(s) being present in the scene. In another example (not shown), the provided digital representation of the scene is stored in a database which is connected via a communication interface with processor 608 of processing unit 612 of system 601.
System 600 further comprises cloud 616 having two layers, namely an application layer 620 containing one or more applications 616 and a service layer 624 containing one or more databases 622. The application layer 620 as well as the service layer 624 may each be implemented using one or more servers in the cloud 616. In another example, the cloud 616 comprises more or fewer layers. The service layer 624 may include, for example, at least one of the following databases 622: a geographic database, an object recognition system database, an environment database, a legal database, a history database.
The geographic database may include geographic information involving visual Al object recognition systems managed by the system 600. For example, geographic information may include the GPS location of the environment, address information, etc. of system(s) 601.
The object recognition system database may include information about system(s) 601 managed by system 600 such as, for example, hardware configuration, date of creation, maintenance intervals, last inspection, and other information.
The environment database may include information about the environment in which managed system(s) 601 are installed in such as, for example, location of system(s) 601 in the scene, store identifier, store information, number of shelves in store, store size, preferences of store clients, number of people in the household, age of people in the household, preferences of people in the household, purchase orders, stock information and other information.
The legal database may store information about legal restrictions concerning the distribution of items, the commercial availability of items in a specific region or other information.
The history database may store information about the purchase history of articles, history of sold articles, etc.
Information stored in the service layer 624 may be retrieved by system(s) 601 via communication interface 630 prior to determining the objects present in the scene from the acquired sensor data. The information may be retrieved using a scene identifier which is unique for every household in which system 601 is installed or for every system 601.
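As an illustration of such a scene-identifier based lookup, the following sketch models a small subset of the service-layer information as an in-memory mapping. The field names, the example scene identifier "household-0001" and the example contents are hypothetical and serve only to show the retrieval pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SceneContext:
    """Subset of service-layer information a recognition system might retrieve
    before classifying sensor data (fields are illustrative)."""
    location: str
    expected_objects: List[str] = field(default_factory=list)
    stock_on_hand: Dict[str, int] = field(default_factory=dict)


SERVICE_LAYER: Dict[str, SceneContext] = {
    "household-0001": SceneContext(
        location="kitchen",
        expected_objects=["plastic bottle", "food wrapper"],
        stock_on_hand={"bin liner": 12},
    ),
}


def retrieve_scene_context(scene_id: str) -> SceneContext:
    """Look up the service-layer databases by the unique scene identifier."""
    return SERVICE_LAYER[scene_id]
```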
The application layer 620 may include any of a variety of applications that utilize information and services related to item management, including any of the information and services made available from the service layer 624. The application layer 620 may include: a system inventory application, an item inventory management application, an order management application, further applications, or any suitable combination of the foregoing.
The system inventory application may provide an inventory of system(s) 601 managed within system 600, including properties (e.g., characteristics) about each system 601. The inventory of systems may be a group (e.g., "fleet") of systems owned, leased, controlled, managed, and/or used by an entity, such as a vendor of visual AI recognition systems, a store owner, etc. The item inventory application may provide an inventory of items being present in the store or household based on the information stored in the environment database or the history database.
The order management application may manage item orders of the shop/household. The order management application may maintain information about all past and current item orders and process such orders. The order management application may be configured to automatically order items based on the objects detected by system(s) 601 and information contained in the environment database. For example, the application may have one or more predefined thresholds, e.g., a number of remaining items, which, once reached or surpassed (e.g., the number of remaining items falling below the threshold), cause additional items to be ordered. The applications may be configured via interfaces to interact with other applications within the application layer 620, including each other. These applications or portions thereof may be programmed into gateways and/or system 601 of the system 600 as well.
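The threshold logic of the order management application could, for illustration, be expressed as follows; the function name, the example threshold values and the reorder quantity are assumptions of this sketch.

```python
from typing import Dict


def items_to_order(stock_on_hand: Dict[str, int],
                   thresholds: Dict[str, int],
                   reorder_quantity: int = 10) -> Dict[str, int]:
    """Return the items whose remaining stock has fallen below the predefined
    threshold, together with the quantity to order."""
    return {item: reorder_quantity
            for item, threshold in thresholds.items()
            if stock_on_hand.get(item, 0) < threshold}


# Example: only 2 bin liners remain, the threshold is 5 -> order 10 more.
orders = items_to_order({"bin liner": 2}, {"bin liner": 5})
```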
System 600 further comprises a display device 626 connected via communication interface 632 with cloud 616. In this example, the display device is a smartphone. In another example, the display device may be a stationary display device or may include further mobile display devices. The display device can be used to display the objects recognized with system 601 or further information, such as determined actions, on the screen of the display device. The display device can also be used to access the application layer 620 in cloud 616 and to manage system(s) 601 connected to cloud 616.
Information may be communicated between components of the system 600, including system(s) 601, gateways, and components of the cloud 616, in any of a variety of ways. Such techniques may involve the transmission of information in transaction records, for example using blockchain technology. Such transaction records may include public information and private information, where public information can be made more generally available to parties, and more sensitive information can be treated as private information made available more selectively, for example only to certain users, such as the owners of system(s) 601. For example, the information in the transaction record may include private data that may be encrypted using a private key specific to a system 601 and may include public data that is not encrypted. The public data may also be encrypted to protect the value of this data and to enable the trading of the data, for example, as part of a smart contract. The distinction between public data and private data may be made depending on the data and the use of the data.
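Purely as an illustration of the public/private split within a transaction record, the sketch below encrypts the private portion with a symmetric key before transmission, using the Python cryptography package (Fernet). This simplification stands in for the TPM-backed, TLS/DTLS- or PKI-based mechanisms described above; the record fields and values are hypothetical.

```python
import json

from cryptography.fernet import Fernet


def build_transaction_record(public_data: dict, private_data: dict, key: bytes) -> dict:
    """Combine unencrypted public data with private data that is encrypted
    using a key specific to the transmitting system before the record is sent."""
    cipher = Fernet(key)
    return {
        "public": public_data,
        "private": cipher.encrypt(json.dumps(private_data).encode()).decode(),
    }


key = Fernet.generate_key()  # in practice derived from credentials held, e.g., in a TPM
record = build_transaction_record(
    public_data={"scene_id": "household-0001", "timestamp": "2023-03-16T12:00:00Z"},
    private_data={"detected_object": "plastic bottle", "confidence": 0.97},
    key=key,
)
```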
The number of communications between components of the system 600 may be minimized, which in some embodiments may include communicating transactions (e.g., detected objects) to servers within the cloud 616 according to a predefined schedule, in which gateways are allotted slots within a temporal cycle during which to transmit transactions (e.g., transmit data from system 601 to cloud 616 or instructions from cloud 616 to system(s) 601) to/from one or more servers. Data may be collected over a predetermined period of time and grouped into a single transaction record prior to transmittal.
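The batching of transactions into a single record per transmission slot could be sketched as follows; the window length and the record layout are assumptions made for illustration.

```python
import time
from typing import Dict, List, Optional


class TransactionBatcher:
    """Collect transactions (e.g. detected objects) over a predetermined period
    and group them into a single record before the allotted transmission slot."""

    def __init__(self, period_seconds: float = 60.0) -> None:
        self.period_seconds = period_seconds
        self._buffer: List[Dict] = []
        self._window_start = time.monotonic()

    def add(self, transaction: Dict) -> None:
        self._buffer.append(transaction)

    def flush_if_due(self) -> Optional[Dict]:
        """Return one grouped record once the time window has elapsed, else None."""
        if time.monotonic() - self._window_start < self.period_seconds:
            return None
        record = {"transactions": self._buffer, "count": len(self._buffer)}
        self._buffer = []
        self._window_start = time.monotonic()
        return record
```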

Claims

1. A computer-implemented method for recognizing at least one object having object specific luminescence properties in a scene, the method comprising:
(i) providing to a computer processor via a communication interface data of the scene, said data of the scene including data on object specific reflectance and/or luminescence properties of at least one object being present in the scene;
(ii) providing to the computer processor via a communication interface digital representations of pre-defined objects and a digital representation of the scene;
(iii) determining - with the computer processor - the at least one object in the scene based on
• the provided data of the scene,
• the provided digital representations of pre-defined objects, and
• the provided digital representation of the scene;
(iv) optionally providing via the communication interface the at least one identified object and/or triggering with the computer processor an action associated with the identified object(s).

2. The method according to claim 1, wherein the digital representation of the scene comprises data being indicative of the geographic location of the scene, data being indicative of the household, data on stock on hand, data on preferences, historical data of the scene, data being indicative of legal regulations and/or commercial availability valid for the scene or geographic location, dimensions of the scene or a combination thereof, said data being optionally interrelated with a scene identifier.

3. The method according to claim 1 or 2, wherein determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene.

4. The method of claim 3, wherein determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided data of the scene and/or the determined further object specific reflectance and/or luminescence properties and the provided digital representations of pre-defined objects includes calculating the best matching reflectance and/or luminescence properties and obtaining the object(s) assigned to the best matching reflectance and/or luminescence properties.

5. The method according to claim 4, wherein refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided digital representation of the scene includes determining - based on the digital representation of the environment of the scene - confidence score(s) for the determined set of object identification hypotheses and using the determined confidence scores to refine the confidence scores associated with the determined set of object identification hypotheses to identify the at least one object.

6. The method according to any one of claims 1 or 2, wherein determining the at least one object in the scene includes determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene, each of said object identification hypotheses having an associated confidence score that respectively indicates certainty about said hypothesis, and refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representations of pre-defined objects.
7. The method according to claim 6, wherein determining a set of object identification hypotheses about the object(s) to be recognized in the scene based on the provided digital representation of the scene includes determining the likelihood of the presence of object(s) in the scene based on the provided digital representation of the scene and generating a set of object identification hypotheses and associated confidence scores based on the determined likelihood.

8. The method according to claim 6 or 7, wherein refining the set of determined object identification hypotheses about the object(s) to be recognized by revising at least certain of said associated confidence scores based on the provided data of the scene and/or the determined further reflectance and/or luminescence properties and the provided digital representation of pre-defined objects includes determining confidence score(s) associated with object(s) present in the scene by calculating the best matching reflectance and/or luminescence properties and using the determined confidence score(s) to refine the confidence score(s) associated with the determined object identification hypotheses to identify the at least one object.

9. A system for recognizing at least one object having object specific luminescence properties in a scene, said system comprising: a light source comprising at least one illuminant for illuminating the scene; a sensor unit for acquiring data of the scene including object specific reflectance and/or luminescence properties of at least one object being present in the scene upon illumination of the scene with the light source; a data storage medium comprising digital representations of pre-defined objects and digital representations of the scene; at least one communication interface for providing the acquired data of the scene, the digital representations of pre-defined objects and the digital representations of the scene; a processing unit in communication with the sensor unit and the communication interface, the processing unit programmed to o determine the at least one object in the scene based on
■ the provided data of the scene,
■ the provided digital representations of pre-defined objects, and
■ the provided digital representation of the scene, o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).

10. A method for training an object recognition neural network, the method comprising: a) providing via a communication interface to a computer processor data of a scene, said data of the scene including image(s) of the scene and data on object specific luminescence and optionally reflectance properties of at least one object being present in the scene, b) calculating with the computer processor for each provided image of the scene a labelled image of the scene by b1) annotating each classified pixel of each image with an object specific label based on the data on object specific luminescence and optionally reflectance properties, and b2) optionally creating bounding boxes around the objects determined in the images in step b1) based on the annotated pixels or segmenting the images obtained after step b1) based on the annotated pixels; c) providing via a communication interface the calculated labelled images of the scene and optionally a digital representation of the scene to the object recognition neural network; and d) training the object recognition neural network with the provided calculated labelled images of the scene and optionally with the provided digital representation of the scene as input, wherein the neural network is trained to recognize each labelled object in the calculated labelled images.
11. The method of claim 10, wherein the object recognition neural network is a deep convolutional neural network comprising a plurality of convolutional neural network layers followed by one or more fully connected neural network layers.
12. The method according to claim 10 or 11, wherein step b1) includes providing via a communication interface to the computer processor digital representations of pre-defined objects and optionally a digital representation of the scene, detecting, using the computer processor, for each image of the scene regions of luminescence by classifying the pixels associated with the detected regions of luminescence, determining the object(s) associated with the detected regions of luminescence and being present in each image by determining the best matching luminescence and optionally reflectance properties for each detected region of luminescence, obtaining the object(s) assigned to the best matching luminescence and optionally reflectance properties and optionally refining the obtained object(s) based on the provided digital representation of the scene, and annotating each classified pixel of each image with an object specific label based on the determined object(s) and the associated detected regions of luminescence.
13. A computer-implemented method for recognizing at least one object in a scene, said method comprising: (A) providing via a communication interface to a computer processor a trained object recognition neural network, in particular an object recognition neural network which has been trained according to the method of any one of claims 10 to 12,
(B) providing via a communication interface to the computer processor data of the scene, said data of the scene including image(s) of the scene and/or data on object specific reflectance and/or luminescence properties of at least one object being present in the scene;
(C) optionally providing via a communication interface to the computer processor digital representations of pre-defined objects and/or a digital representation of the scene,
(D) determining - with the computer processor - at least one object in the scene based on the provided trained object recognition neural network, the provided data of the scene and optionally the provided digital representation of the scene; and
(E) optionally providing via a communication interface the at least one identified object and/or triggering at least one action associated with the identified object(s).
14. A system for recognizing at least one object in a scene, said system comprising: a sensor unit for acquiring data of the scene; a data storage medium comprising a trained object recognition neural network, in particular an object recognition neural network which has been trained according to the method of any one of claims 10 to 12, and optionally digital representations of pre-defined objects and/or a digital representation of the scene, at least one communication interface for providing the acquired data, the object recognition neural network, and optionally digital representations of pre-defined objects and/or the digital representation of the scene, a processing unit in communication with the sensor unit and data storage medium, the processing unit programmed to o determine the at least one object in the scene based on
■ the provided data of the scene,
■ the provided object recognition neural network, and
■ optionally the provided digital representations of pre-defined objects and/or the digital representation of the scene, o optionally provide via the communication interface the at least one identified object and/or trigger an action associated with the identified object(s).
15. A non-transitory computer-readable storage medium, the computer-readable storage medium including instructions that when executed by a computer, cause the computer to perform the steps according to the method of any one of claims 1 to 8 or according to the methods of any one of claims 10 to 13.
PCT/EP2023/056782 2022-03-23 2023-03-16 System and method for object recognition utilizing color identification and/or machine learning WO2023180178A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP22163719.2 2022-03-23
EP22163719 2022-03-23

Publications (1)

Publication Number Publication Date
WO2023180178A1 true WO2023180178A1 (en) 2023-09-28

Family

ID=80930216

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/056782 WO2023180178A1 (en) 2022-03-23 2023-03-16 System and method for object recognition utilizing color identification and/or machine learning

Country Status (1)

Country Link
WO (1) WO2023180178A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020178052A1 (en) 2019-03-01 2020-09-10 Basf Coatings Gmbh Method and system for object recognition via a computer vision application
WO2020245443A2 (en) 2019-06-07 2020-12-10 Basf Coatings Gmbh System and method for object recognition using fluorescent and antireflective surface constructs
WO2020245444A1 (en) 2019-06-07 2020-12-10 Basf Coatings Gmbh System and method for object recognition using 3d mapping and modeling of light
CA3140446A1 (en) * 2019-06-07 2020-12-10 Yunus Emre Kurtoglu Device and method for forming at least one ground truth database for an object recognition system
WO2020245441A1 (en) 2019-06-07 2020-12-10 Basf Coatings Gmbh System and method for object recognition using three dimensional mapping tools in a computer vision application
WO2020245442A1 (en) 2019-06-07 2020-12-10 Basf Coatings Gmbh System and method for object recognition under natural and/or artificial light
WO2020245439A1 (en) 2019-06-07 2020-12-10 Basf Coatings Gmbh Method and device for detecting a fluid by a computer vision application

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
IAN G.: "Back-Propagation and Other Differentiation Algorithms", 2016, MIT PRESS, article "Deep Learning", pages: 200 - 220
MICHAEL A. NIELSEN: "Neural Networks and Deep Learning", 2015, DETERMINATION PRESS
OSADCHY M ET AL: "Using specularities for recognition", PROCEEDINGS NINTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION . (ICCV). NICE, FRANCE, OCT. 13 - 16, 2003; [INTERNATIONAL CONFERENCE ON COMPUTER VISION], LOS ALAMITOS, CA : IEEE COMP. SOC, US, vol. CONF. 9, 13 October 2003 (2003-10-13), pages 1512 - 1519, XP010662570, ISBN: 978-0-7695-1950-0, DOI: 10.1109/ICCV.2003.1238669 *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23711095

Country of ref document: EP

Kind code of ref document: A1