WO2012140315A1 - Method, apparatus and computer program product for providing incremental clustering of faces in digital images - Google Patents

Method, apparatus and computer program product for providing incremental clustering of faces in digital images

Info

Publication number
WO2012140315A1
Authority
WO
WIPO (PCT)
Prior art keywords
clusters
singletons
images
causing
clustering
Prior art date
Application number
PCT/FI2012/050133
Other languages
French (fr)
Inventor
Biswadeep Sengupta
Soumik Ukil
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Publication of WO2012140315A1 publication Critical patent/WO2012140315A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/30Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video

Definitions

  • Embodiments of the present invention relate generally to image processing technology and, more particularly, relate to a method, apparatus and computer program product for providing incremental clustering of faces in digital images.
  • a method, apparatus and computer program product are therefore provided to enable clustering of faces in digital images.
  • a mechanism is provided for incrementally clustering of faces in digital images.
  • new images including faces can be added to existing albums or collections that have already been clustered, and the new images may be clustered in consideration of the existing clusters.
  • embodiments of the present invention may provide a relatively robust ability for managing a collection of images.
  • a method of providing incremental clustering of faces in digital images may include, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons.
  • the method may further include causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters.
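The "without recalculating distances" aspect above can be illustrated with a short sketch (not part of the patent disclosure): a cached pairwise distance matrix for the first set of singletons is grown in place, computing only the rows and columns that involve the new faces. The function name, the use of Euclidean distance over feature vectors, and NumPy are all assumptions for illustration.

```python
import numpy as np

def extend_distance_matrix(cached, old_feats, new_feats):
    """Grow a cached pairwise distance matrix with new face features.

    Only distances involving the new features are computed; the cached
    block for the first set of singletons is reused unchanged.
    """
    old = np.asarray(old_feats, dtype=float)
    new = np.asarray(new_feats, dtype=float)
    # Distances between every old and every new feature vector.
    cross = np.linalg.norm(old[:, None, :] - new[None, :, :], axis=2)
    # Distances among the new feature vectors themselves.
    inner = np.linalg.norm(new[:, None, :] - new[None, :, :], axis=2)
    top = np.hstack([np.asarray(cached, dtype=float), cross])
    bottom = np.hstack([cross.T, inner])
    return np.vstack([top, bottom])
```

Only the cross and inner blocks cost new computation, so the incremental step stays linear in the number of previously processed singletons rather than quadratic in the whole collection.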
  • a computer program product for providing incremental clustering of faces in digital images.
  • the computer program product includes at least one computer-readable storage medium having computer-executable program code instructions stored therein.
  • the computer-executable program code instructions may include program code instructions for, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons.
  • the computer-executable program code instructions may further include program code instructions for causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters.
  • an apparatus for providing incremental clustering of faces in digital images may include at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code may be configured, with the at least one processor, to cause the apparatus to perform at least, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons.
  • the at least one memory and the computer program code may be further configured, with the at least one processor, to cause the apparatus to perform causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters.
  • an apparatus for providing incremental clustering of faces in digital images may include means for causing merging, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons.
  • the apparatus may further include means for causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and means for causing merging of the first set of clusters with the second set of clusters.
  • Embodiments of the invention may provide a method, apparatus and computer program product for employment, for example, in mobile or fixed environments.
  • computing device users may enjoy an improved capability for clustering of faces in digital images.
  • FIG. 1 illustrates a block diagram of a mobile terminal that may benefit from an example embodiment of the present invention.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention.
  • FIG. 3 illustrates an apparatus for enabling the provision of incremental clustering of faces in digital images according to an example embodiment of the present invention.
  • FIG. 4 shows a flow diagram illustrating one example of processing in order to provide incremental clustering of faces in digital images according to an example embodiment of the present invention.
  • FIG. 5 illustrates a distance matrix used for clustering singletons in accordance with an example embodiment of the present invention.
  • FIG. 6 illustrates an example flowchart for clustering images according to an example embodiment.
  • FIG. 7 illustrates an example flowchart for clustering features through splitting according to an example embodiment.
  • FIG. 8 illustrates an example splitting scenario according to various example embodiments.
  • FIG. 9 illustrates an example flowchart for merging clusters according to an example embodiment.
  • FIG. 10 is a flowchart according to an example method for providing incremental clustering of faces in digital images according to an example embodiment of the present invention.
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • Some example embodiments may provide for more efficient clustering involving new images and previously clustered images.
  • some embodiments may enable new images to be merged with existing clusters. The new images may then be merged within themselves and also be clustered with previous singleton images to create new clusters. A singleton image may be defined as a cluster with only one member. The new clusters may then be merged with the existing (and now perhaps also augmented) clusters to form a new cluster set. Accordingly, there is no need to re-process faces that have been previously processed into the existing clusters and incremental clustering may be performed in an efficient manner.
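For illustration only (this sketch is an editor's reading of the flow above, not the patented implementation), the three stages — attach new faces to existing clusters, cluster the leftovers together with the previous singletons, and merge the resulting cluster sets — might look as follows. The threshold value, the Euclidean distance over face-feature vectors, and the greedy centroid-based grouping are all assumptions.

```python
import numpy as np

THRESHOLD = 0.5  # assumed merge distance; would be tuned for the feature space

def dist(a, b):
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

def centroid(cluster):
    return np.mean(np.asarray(cluster, dtype=float), axis=0)

def incremental_cluster(old_clusters, old_singletons, new_faces):
    """Sketch of the incremental flow: (1) merge new faces into existing
    clusters where possible, (2) cluster the leftover new faces together
    with the previous singletons, (3) merge the two cluster sets."""
    clusters = [list(c) for c in old_clusters]
    leftovers = []
    # Step 1: attach each new face to the nearest existing cluster.
    for face in new_faces:
        if clusters:
            d, idx = min((dist(face, centroid(c)), i) for i, c in enumerate(clusters))
            if d <= THRESHOLD:
                clusters[idx].append(face)
                continue
        leftovers.append(face)
    # Step 2: greedily group the old singletons with the new leftovers.
    pool = list(old_singletons) + leftovers
    new_clusters = []
    for face in pool:
        for c in new_clusters:
            if dist(face, centroid(c)) <= THRESHOLD:
                c.append(face)
                break
        else:
            new_clusters.append([face])
    singletons = [c[0] for c in new_clusters if len(c) == 1]
    new_clusters = [c for c in new_clusters if len(c) > 1]
    # Step 3: merge the augmented old clusters with the new clusters.
    return clusters + new_clusters, singletons
```

Note that faces already inside `old_clusters` are never re-compared against each other; only distances involving the new faces and the surviving singletons are computed, which is the efficiency the passage above describes.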
  • FIG. 1 illustrates a block diagram of a mobile terminal 10, one example of a device that may benefit from embodiments of the present invention.
  • a mobile terminal as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention. While several embodiments of the mobile terminal 10 may be illustrated and hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, all types of computers (e.g., laptops or mobile computers), cameras, audio/video players, radio, global positioning system (GPS) devices, or any combination of the aforementioned, and other types of communications systems, may readily employ embodiments of the present invention.
  • the mobile terminal 10 may include an antenna 12 (or multiple antennas) in operable communication with a transmitter 14 and a receiver 16.
  • the mobile terminal 10 may further include an apparatus, such as a controller 20 or other processor, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively.
  • the signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data.
  • the mobile terminal 10 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types.
  • the mobile terminal 10 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like.
  • the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocol such as E-UTRAN (evolved-universal terrestrial radio access network), with fourth-generation (4G) wireless communication protocols or the like.
  • the apparatus may include circuitry implementing, among others, audio and logic functions of the mobile terminal 10.
  • the controller 20 may comprise a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities.
  • the controller 20 thus may also include the functionality to convolutionally encode and interleave message and data prior to modulation and transmission.
  • the controller 20 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory.
  • the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser.
  • the connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
  • the mobile terminal 10 may also comprise a user interface including an output device such as an earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, which may be coupled to the controller 20.
  • the user input interface which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown), a microphone or other input device.
  • the keypad 30 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10.
  • the keypad 30 may include a conventional QWERTY keypad arrangement.
  • the keypad 30 may also include various soft keys with associated functions.
  • the mobile terminal 10 may include an interface device such as a joystick or other user input interface.
  • the mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are used to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
  • the mobile terminal 10 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 20.
  • the media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission.
  • the camera module 36 may include a digital camera capable of forming a digital image file from a captured image.
  • the camera module 36 includes all hardware, such as a lens or other optical component(s), and software necessary for creating a digital image file from a captured image.
  • the camera module 36 may include only the hardware needed to view an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image.
  • the camera module 36 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format.
  • the camera module 36 may provide live image data to the display 28.
  • the display 28 may be located on one side of the mobile terminal 10 and the camera module 36 may include a lens positioned on the opposite side of the mobile terminal 10 with respect to the display 28 to enable the camera module 36 to capture images on one side of the mobile terminal 10 and present a view of such images to the user positioned on the other side of the mobile terminal 10.
  • the mobile terminal 10 may further include a user identity module (UIM) 38, which may generically be referred to as a smart card.
  • the UIM 38 is typically a memory device having a processor built in.
  • the UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card.
  • the UIM 38 typically stores information elements related to a mobile subscriber.
  • the mobile terminal 10 may be equipped with memory.
  • the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data.
  • the mobile terminal 10 may also include other non-volatile memory 42, which may be embedded and/or may be removable.
  • the non-volatile memory 42 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like.
  • the memories may store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10.
  • FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention.
  • a system in accordance with an example embodiment of the present invention includes a first communication device (e.g., mobile terminal 10) and in some cases also a second communication device 48 that may each be capable of communication with a network 50.
  • the second communication device 48 may be another mobile terminal, or a fixed computer or computer terminal (e.g., a personal computer (PC)).
  • the second communication device 48 is provided to illustrate that example embodiments may be practiced on multiple devices or in connection with multiple devices.
  • the communications devices of the system may be able to communicate with network devices or with each other via the network 50.
  • the network devices with which the communication devices of the system communicate may include a service platform 60.
  • the mobile terminal 10 (and/or the second communication device 48) is enabled to communicate with the service platform 60 to provide, request and/or receive information.
  • not all systems that employ embodiments of the present invention may comprise all the devices illustrated and/or described herein.
  • the network 50 includes a collection of various different nodes, devices or functions that are capable of communication with each other via corresponding wired and/or wireless interfaces.
  • the illustration of FIG. 2 should be understood to be an example of a broad view of certain elements of the system and not an all-inclusive or detailed view of the system or the network 50.
  • the network 50 may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.5G, 3.9G and/or fourth-generation (4G) mobile communication protocols, such as Long Term Evolution (LTE), LTE-Advanced (LTE-A), and/or the like.
  • One or more communication terminals such as the mobile terminal 10 and the second communication device 48 may be capable of communication with each other via the network 50 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN), such as the Internet.
  • other devices such as processing devices or elements (e.g., personal computers, server computers or the like) may be coupled to the mobile terminal 10 and the second communication device 48 via the network 50.
  • the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the other devices (or each other), for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second communication device 48, respectively.
  • the mobile terminal 10 and the second communication device 48 may communicate in accordance with, for example, radio frequency (RF), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including LAN, wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), WiFi, ultra-wide band (UWB), Wibree techniques and/or the like.
  • the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the network 50 and each other by any of numerous different access mechanisms.
  • For example, the network 50 may support mobile access mechanisms such as wideband code division multiple access (W-CDMA), CDMA2000, global system for mobile communications (GSM), general packet radio service (GPRS) and/or the like, wireless access mechanisms such as WLAN, WiMAX and/or the like, and fixed access mechanisms such as digital subscriber line (DSL), Ethernet and/or the like.
  • the service platform 60 may be a device or node such as a server or other processing device.
  • the service platform 60 may have any number of functions or associations with various services.
  • the service platform 60 may be a platform such as a dedicated server (or server bank) associated with a particular information source or service (e.g., face recognition, image tagging, clustering based on face recognition and/or the like), or the service platform 60 may be a backend server associated with one or more other functions or services.
  • the service platform 60 represents a potential host for a plurality of different services or information sources.
  • the functionality of the service platform 60 is provided by hardware and/or software components configured to operate in accordance with known techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the service platform 60 is information provided in accordance with example embodiments of the present invention.
  • the service platform 60 may host an apparatus for providing services related to clustering images based on faces in the images to a device practicing an embodiment of the present invention. As such, in some embodiments, the service platform 60 may itself perform example embodiments, while in other example embodiments, the service platform 60 may facilitate (e.g., by the provision of image data or processing of image data) operation of an example embodiment at another device (e.g., the mobile terminal 10 and/or the second communication device 48). In still other example embodiments, the service platform 60 may not be included at all. In other words, in some embodiments, operations in accordance with an example embodiment may be performed at the mobile terminal 10 and/or the second communication device 48 without any interaction with the network 50 and/or the service platform 60.
  • FIG. 3 An example embodiment will now be described with reference to FIG. 3, in which certain elements of an apparatus for enabling the provision of incremental clustering of faces in digital images are displayed.
  • the apparatus of FIG. 3 may be employed, for example, on the service platform 60, the mobile terminal 10 or second communication device 48 of FIG. 2.
  • the apparatus of FIG. 3 may also be employed on a variety of other devices. Therefore, example embodiments should not be limited to application on devices such as the service platform 60, the mobile terminal 10 or second communication device 48 of FIG. 2.
  • embodiments may be employed on a combination of devices including, for example, those listed above.
  • some example embodiments may be embodied wholly at a single device (e.g., the service platform 60, the mobile terminal 10 or the second communication device 48) or by devices in a client/server relationship (e.g., the service platform 60 serving information to the mobile terminal 10 and/or the second communication device 48).
  • the apparatus 65 may include or otherwise be in communication with a processor 70, a user interface 72, a communication interface 74 and a memory device 76.
  • the memory device 76 may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor 70).
  • the memory device 76 may be configured to store information, data, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with example embodiments of the present invention.
  • the memory device 76 could be configured to buffer input data for processing by the processor 70.
  • the memory device 76 could be configured to store instructions for execution by the processor 70.
  • the apparatus 65 may, in some embodiments, be a network device (e.g., service platform 60) or other devices (e.g., the mobile terminal 10 or the second communication device 48) that may operate independent of or in connection with a network. However, in some embodiments, the apparatus 65 may be instantiated at one or more of the service platform 60, the mobile terminal 10 and the second communication device 48. Thus, the apparatus 65 may be any computing device configured to employ an example embodiment of the present invention. However, in some embodiments, the apparatus 65 may be embodied as a chip or chip set (which may in turn be employed at one of the devices mentioned above).
  • the apparatus 65 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard).
  • the structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • the apparatus 65 may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip."
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processor 70 may be embodied in a number of different ways.
  • the processor 70 may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processor 70 may include one or more processing cores configured to perform independently.
  • a multi-core processor may enable multiprocessing within a single physical package.
  • the processor 70 may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70. Alternatively or additionally, the processor 70 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein.
  • the instructions may specifically configure the processor 70 to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the present invention by further configuration of the processor 70 by instructions for performing the algorithms and/or operations described herein.
  • the processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
  • ALU arithmetic logic unit
  • the communication interface 74 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 65.
  • the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network.
  • the communication interface 74 may alternatively or also support wired communication.
  • the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the user interface 72 may be in communication with the processor 70 to receive an indication of a user input at the user interface 72 and/or to provide an audible, visual, mechanical or other output to the user.
  • the user interface 72 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen(s), touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • in an instance in which the apparatus 65 is embodied as a server or some other network device, the user interface 72 may be limited or eliminated.
  • the user interface 72 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard or the like.
  • the processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like.
  • the processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and/or the like).
  • the processor 70 may be embodied as, include or otherwise control a clustering manager 80.
  • the processor 70 may be said to cause, direct or control the execution or occurrence of the various functions attributed to the clustering manager 80 as described herein.
  • the clustering manager 80 may be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the clustering manager 80 as described herein.
  • a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
  • the clustering manager 80 may be configured to initially perform clustering with respect to faces in a set of images. After the initial clustering, a set of clusters (e.g., an initial set of clusters) in which each respective cluster is defined by multiple corresponding images including the same face is formed.
  • an initial set of singletons may also be defined where each singleton is essentially a cluster comprising one member image. In other words, each singleton is an image for which the face therein does not have a matching face in another image.
  • the additional images may be incrementally added to the photo album, gallery or image collection that was initially clustered.
  • the clustering manager 80 may be configured to merge the additional images into existing clusters (e.g., the initial clusters), without processing (or clustering) the set of images over again.
  • the clustering manager 80 may then be configured to cluster the initial set of singletons with the additional images to see if any additional clusters are formed.
  • the additional clusters (if any are formed) may then be merged with the initial clusters as modified by the merging of the initial clusters with the additional images.
  • the process described above may then be repeated such that the term "initial" may represent any previously existing set or previously performed operation, rather than a first instance of any particular set or operation.
  • FIG. 4 illustrates an example flow chart showing incremental clustering by the clustering manager 80 as described above.
  • a set of images N1 including faces may be provided to the system and clustered at operation 90 to form k1 clusters and s1 singletons.
  • the clustering performed at operation 90 may be accomplished by a clustering algorithm based on an input distance matrix between elements D in the images, and a threshold for the distances T.
  • the input distance may be a mutual subspace distance.
  • a second set of images N2 including faces may be provided to the system.
  • the clustering manager 80 may be configured to merge the second set of images N2 with the existing k1 clusters and also with the s1 singletons.
  • the second set of images N2 may be merged with the existing k1 clusters at operation 92 to generate modified k1 clusters with leftover s2 singletons.
  • the modified k1 clusters may be identical to the existing k1 clusters if none of the faces in the second set of images N2 are within a threshold distance from the faces in the existing k1 clusters.
  • a simple way to combine the N1+N2 faces together may be to recluster all of the images based on the faces therein.
  • Example embodiments of the present invention avoid this computational waste by avoiding processing of previously processed faces in the merging performed at operation 92.
  • clustering information that is already present in the system can be reused by the clustering manager 80 for subsequent merging and clustering operations.
  • the distances between each of the new faces in the second set of images N2 and the corresponding existing k1 clusters are computed so that faces in the second set of images N2 are associated with a closest one of the existing k1 clusters in response to the distance being less than a threshold T1. Faces in images of the second set of images N2 that are greater than the distance defined by the threshold T1 are left as the s2 singletons.
  • the s2 singletons may then be clustered at operation 94 with the s1 singletons to define k2 clusters for any s2 singletons that happen to cluster with the s1 singletons.
  • the s1 singletons have already been processed by the clustering algorithm at least once at this point, and thus the distances between the s1 singletons and the faces of the existing k1 clusters have already been determined to be greater than the threshold T and do not need to be recomputed.
  • using a distance matrix of size (s1 + s2) x (s1 + s2), the s1 singletons may be clustered with the s2 singletons to determine the k2 clusters.
  • Remaining singletons s3 may exist, including those singletons from the s2 singletons and the s1 singletons that did not cluster with each other.
  • the k2 clusters and the k1 clusters (initial or modified) have not been compared to each other yet at this point.
  • the cluster manager 80 may therefore be configured to compare the k2 clusters with the modified kl clusters based on the distance between the clusters.
  • the k2 clusters and the modified k1 clusters may be merged at operation 96 in order to generate k3 clusters. Thereafter, the operations described above may be repeated, where N1 and N2 represent subsequent additional sets of images with faces therein that are to be incrementally clustered according to an example embodiment.
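The incremental flow of operations 90, 92 and 94 can be sketched as follows. This is an illustrative sketch only: plain Euclidean distance on toy feature tuples stands in for the mutual subspace distance, the threshold value is arbitrary, and all function names are invented for illustration.

```python
import math

T = 1.0  # placeholder threshold; the embodiments use a tuned threshold T

def distance(a, b):
    # Euclidean stand-in for the mutual subspace distance.
    return math.dist(a, b)

def merge_into_clusters(clusters, new_faces, threshold=T):
    """Operation 92: attach each new face to its closest existing
    cluster when within the threshold; otherwise leave it a singleton."""
    singletons = []
    for face in new_faces:
        best, best_d = None, float("inf")
        for cluster in clusters:
            d = min(distance(face, member) for member in cluster)
            if d < best_d:
                best, best_d = cluster, d
        if best is not None and best_d < threshold:
            best.append(face)
        else:
            singletons.append(face)
    return clusters, singletons

def cluster_singletons(s1, s2, threshold=T):
    """Operation 94: cluster old singletons s1 with new singletons s2.
    Distances from s1 to the existing clusters were already checked,
    so only singleton-to-singleton pairs are computed here (simple
    single-link grouping for illustration)."""
    pool = [[face] for face in s1 + s2]
    merged = True
    while merged:
        merged = False
        for i in range(len(pool)):
            for j in range(i + 1, len(pool)):
                if any(distance(a, b) < threshold
                       for a in pool[i] for b in pool[j]):
                    pool[i] += pool.pop(j)
                    merged = True
                    break
            if merged:
                break
    k2 = [c for c in pool if len(c) > 1]      # new clusters
    s3 = [c[0] for c in pool if len(c) == 1]  # remaining singletons
    return k2, s3

# Toy run mirroring FIG. 4: one existing cluster k1, one old singleton,
# and a new batch N2 of three faces.
k1 = [[(0.0, 0.0), (0.1, 0.0)]]
old_singletons = [(5.0, 5.0)]
n2 = [(0.2, 0.1), (5.1, 5.0), (9.0, 9.0)]
k1, s2 = merge_into_clusters(k1, n2)
k2, s3 = cluster_singletons(old_singletons, s2)
```

The merging at operation 96, comparing the k2 clusters with the modified k1 clusters, would follow the same pattern using a cluster-to-cluster distance.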
  • the clustering to be performed within the above described example may be any of a number of algorithms such as k-means, normalized cuts, spectral clustering, and/or the like. Similarly, different distance metrics may be used for clustering, singleton merging, cluster merging and/or the like, in various example embodiments.
  • the cluster manager 80 may operate as described below in connection with the descriptions of FIGS. 6-9.
  • FIG. 6 illustrates a generalized flowchart of some example embodiments.
  • facial features may be extracted and normalized at 100.
  • Inputs used to generate the extracted and normalized facial features may include the outputs of a face detector, which may include face coordinates, and coordinates of the left and right eyes. Using these inputs, face normalization and feature extraction may be performed.
  • locations of the left and right eyes may be used to normalize the faces, such that the faces are upright and cropped based on the distance between the eyes.
  • histogram features based on local binary patterns of different gradient images may be extracted.
  • each feature may be a concatenation of the image feature and a feature from the image mirrored about an axis through the feature's center (e.g., mirrored about the Y axis through the feature's center).
  • the extracted features may be normalized such that each feature is a unit vector.
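The mirrored-feature construction and unit normalization described above can be sketched as follows. A plain intensity histogram stands in for the local-binary-pattern histograms of gradient images, and the function name and bin count are invented for illustration.

```python
import numpy as np

def build_feature(patch, bins=8):
    """Concatenate a histogram of the patch with a histogram of its
    mirror image (flipped about the vertical axis through the patch
    center), then normalize the result to a unit vector. A plain
    intensity histogram stands in for the LBP histograms of gradient
    images described in the text."""
    def hist(p):
        h, _ = np.histogram(p, bins=bins, range=(0, 256))
        return h.astype(float)
    feature = np.concatenate([hist(patch), hist(np.fliplr(patch))])
    norm = np.linalg.norm(feature)
    return feature / norm if norm > 0 else feature

patch = np.arange(64, dtype=np.uint8).reshape(8, 8)  # toy 8x8 face crop
f = build_feature(patch)
```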
  • the features may then be split into groups or clusters.
  • the features may be split into clusters based on relative distances between the features.
  • the splitting may continue in a recursive fashion (as indicated by the arrow from operation 110 back to the input of 110) until convergence is achieved for all clusters of features relative to a distance threshold.
  • the distances between pairs of features that have been generated may be calculated.
  • a distance metric D(A,B) may be computed.
  • the distance metric may be based on subspaces (A, A') and (B, B'), where A' and B' are features corresponding to mirror images of A and B, respectively.
  • the set may be split into two clusters in response to a determination with respect to the distance threshold.
  • the splitting operation may be continued with each new cluster in a recursive fashion.
  • any clustering method may be used, including, but not limited to, k-means, k-medoids, graph- cuts, and Gaussian mixture models.
  • the splitting may recursively continue until an analysis of each cluster indicates that the cluster satisfies the stopping criterion.
  • a central element of the cluster may be identified.
  • the central element may be called the medoid of the cluster.
  • the medoid may be the feature for which the distance to its farthest neighbor is minimum amongst all elements of the cluster.
  • the medoid may also be the feature of a cluster that has the lowest average distance to the other features within the cluster.
  • the distance from the medoid to the feature within the cluster that is farthest from the medoid (also referred to as the min-max distance) may be compared to a threshold distance. If the min-max distance is greater than the threshold distance, then a split may be performed. If the min-max distance is less than the threshold distance, then no further splitting is needed for that cluster because the stopping criterion for the cluster has been met.
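The medoid selection and the min-max stopping criterion can be sketched as follows; plain Euclidean distance stands in for the subspace distance used by the embodiments, and the names are illustrative only.

```python
import math

def medoid(cluster, dist=math.dist):
    """The element whose distance to its farthest neighbor is minimal
    among all elements of the cluster."""
    return min(cluster, key=lambda a: max(dist(a, b) for b in cluster))

def min_max_distance(cluster, dist=math.dist):
    """Distance from the medoid to the member farthest from it."""
    m = medoid(cluster, dist)
    return max(dist(m, b) for b in cluster)

def needs_split(cluster, threshold, dist=math.dist):
    """Stopping criterion: split while the min-max distance exceeds
    the threshold distance."""
    return min_max_distance(cluster, dist) > threshold

tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]  # already satisfies the criterion
loose = tight + [(3.0, 3.0)]                   # still needs splitting
```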
  • the threshold distance may be an attribute that is set in any of a number of ways.
  • the threshold distance may be input as a user preference by the user via a user interface.
  • the threshold distance may be set during a manufacturing process.
  • the threshold distance may be calculated based on a user's preferences as indicated by the user's actions captured by the device, without requiring the express inputting of a value for the threshold distance by the user.
  • by modifying the threshold distance, different degrees of clustering may be performed. In this regard, a shorter threshold distance may result in an increased granularity of clustering, at a cost of increased processing. On the other hand, a longer threshold distance may tend to reduce granularity and avoid the creation of unnecessary clusters that can result from too much splitting.
  • clusters may be analyzed again to determine if merging of clusters is needed. As such, at 120, clusters may be merged based on relative distance between the clusters. Similar to splitting, the process of merging clusters may be performed in a repetitive fashion (as indicated by the arrow from operation 120 back to the input of 120) until a stopping criterion is satisfied for each cluster.
  • the medoid of each cluster may again be utilized.
  • the distances between the medoids of each cluster may then be determined, where the distance may be calculated in the same manner as indicated above using the distances between subspaces.
  • the distances between the cluster pairs may be determined, and using, for example, the same clustering algorithm as used for splitting, merging of the clusters may be performed.
  • the merging of clusters may be continued until all clusters satisfy the stopping criterion.
  • the distances between the clusters based on the medoids of the clusters may be compared to the threshold distance.
  • This threshold distance may be the same distance as described above with respect to the splitting. If the distance between clusters (or cluster medoids) is less than the distance threshold, the clusters are said to be "connected" and the clusters may be merged. A collection of connected clusters may be referred to as a clique. On the other hand, if the distance between clusters (or cluster medoids) is greater than the distance threshold, then the clusters are sufficiently distinct and need not be merged.
  • the analysis of the medoid-to-medoid distances may be performed in a repetitive fashion, until an iteration of the process where the number of remaining clusters is left unchanged.
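The repeated medoid-to-medoid merging can be sketched as shown below. This sketch merges any connected clusters and does not enforce the clique condition of some embodiments; the distance is again a Euclidean stand-in, and the names are illustrative.

```python
import math

def merge_clusters(clusters, threshold, dist=math.dist):
    """Merge clusters whose medoid-to-medoid distance is below the
    threshold, repeating until an iteration leaves the number of
    clusters unchanged (the stopping condition described above)."""
    def medoid(cluster):
        return min(cluster, key=lambda a: max(dist(a, b) for b in cluster))

    while True:
        n_before = len(clusters)
        merged, used = [], set()
        for i, ci in enumerate(clusters):
            if i in used:
                continue
            group = list(ci)
            for j in range(i + 1, len(clusters)):
                if j not in used and \
                        dist(medoid(ci), medoid(clusters[j])) < threshold:
                    group += clusters[j]
                    used.add(j)
            merged.append(group)
        clusters = merged
        if len(clusters) == n_before:
            return clusters

# Two nearby single-member clusters merge; the distant one survives.
result = merge_clusters([[(0.0, 0.0)], [(0.2, 0.0)], [(5.0, 5.0)]], 1.0)
```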
  • the output clusters may be identified at 130.
  • the output clusters may be associated with a particular individual, and, for example, the associated images may be tagged based on the clusters.
  • a distance metric may be calculated from that face to the existing clusters. If the calculated distance is less than a merge threshold, which may be the same as the distance threshold, the singular face may be merged into the existing cluster. Again, this distance measure may be based on a projection of the face onto a subspace formed by taking a subset of the vectors in a cluster, similar to distance measurement described above and otherwise provided herein.
  • the distance between the subspaces defined by pairs of clusters may be determined, and if this distance is less than a threshold (e.g., the threshold distance), the clusters may be merged. Upon completion of these additional merging operations, the output clusters may be identified and associated with an individual at 130. Accordingly, various example embodiments may perform the multi-pass processing described in FIG. 6, and otherwise herein, to cluster faces, based on hierarchical splitting into two clusters at each recursive operation. Further, according to some example embodiments, a particular distance metric calculation may be leveraged for use between image features and between clusters for splitting and merging, respectively. Additionally, a single distance threshold may be utilized for some or all operations of the multi-pass algorithm.
  • Some example embodiments also avoid the need to explicitly determine cluster centers prior to the process and the process can automatically estimate the number of clusters and cluster centers as needed during the execution. Further, according to some example embodiments, a flexible clustering framework may be provided that may utilize any clustering algorithm to perform recursive splitting into two clusters.
  • FIG. 6 illustrates an example splitting process as provided herein.
  • FIG. 9 provides additional detail and alternatives to the operation of merging the clusters at 120.
  • the process may begin with receiving a plurality of features derived from a plurality of facial images at 200.
  • the received features may be extracted by a number of different techniques that, for example, rely on face detection and the outputs often provided by face detection.
  • a given face detection technique may provide face coordinates, eye coordinates, and a face normalization and feature extraction method which may be used as inputs.
  • a face detector may be utilized, according to various example embodiments, to detect faces within the images. For each detected face, the right and left eyes may be detected, and the faces may be normalized according to, for example, the distance between the eyes. Features may then be extracted from the normalized faces, and passed to the clustering algorithm for splitting the collection of features into clusters.
  • the plurality of features may be split into two clusters using a clustering algorithm. The splitting may be undertaken in an instance in which the medoid of the plurality of features is farther than a threshold distance from any feature within the plurality of features.
  • To calculate the distances used to determine whether or not to split the plurality of features or a cluster, the following technique may be used. It is noteworthy that, according to various example embodiments, all distances calculated throughout a process of clustering facial images may be calculated as follows.
  • a normalization technique may be used to orthogonalize vectors generated based on the pair of elements.
  • a Gram-Schmidt normalization may be used to orthogonalize vectors (A, A') and (B, B'), where A' and B' are the features corresponding to the mirror images of A and B, respectively.
  • subspaces (a, a') and (b, b') may be generated.
  • the projection P2 of B onto the subspace (a, a') may be defined as P2 = (B · a) a + (B · a') a'.
  • a distance D may be defined as,
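The Gram-Schmidt orthogonalization and projection steps can be sketched as follows. Because the closed form of D is not reproduced in the text above, the final line uses one common mutual-subspace form, D = sqrt(2 - ||P1||^2 - ||P2||^2); that closed form is an assumption for illustration, not necessarily the embodiments' metric.

```python
import numpy as np

def orthonormalize(u, v):
    """Gram-Schmidt: turn a linearly independent pair (u, v) into an
    orthonormal pair (a, a')."""
    a = u / np.linalg.norm(u)
    r = v - np.dot(v, a) * a
    return a, r / np.linalg.norm(r)

def subspace_distance(A, A_mirror, B, B_mirror):
    """A_mirror/B_mirror are features of the mirrored faces. P2 is the
    projection of B onto the subspace (a, a'), per the description; P1
    is the symmetric projection of A onto (b, b'). The closed form of
    D in the final line is an assumed mutual-subspace variant."""
    a, a2 = orthonormalize(A, A_mirror)
    b, b2 = orthonormalize(B, B_mirror)
    P2 = np.dot(B, a) * a + np.dot(B, a2) * a2
    P1 = np.dot(A, b) * b + np.dot(A, b2) * b2
    return float(np.sqrt(max(0.0, 2.0 - np.dot(P1, P1) - np.dot(P2, P2))))

A, A_m = np.array([1.0, 0, 0, 0]), np.array([0.0, 1, 0, 0])
d_same = subspace_distance(A, A_m, A.copy(), A_m.copy())  # identical subspaces
d_orth = subspace_distance(A, A_m,
                           np.array([0.0, 0, 1, 0]),
                           np.array([0.0, 0, 0, 1]))      # orthogonal subspaces
```

With this form, identical subspaces give D = 0 and fully orthogonal subspaces give D = sqrt(2), which is consistent with using D against a single threshold for both splitting and merging.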
  • the distances between features may be considered to determine whether splitting of a given cluster is necessary.
  • the distance, based on the above, may be determined between a medoid for the cluster and the feature farthest from the medoid (the min-max distance), and this distance may be compared to the threshold distance. If the min-max distance is greater than the threshold distance, then the features may be split into two clusters using any clustering algorithm as indicated above. If the min-max distance is less than the threshold distance, then the features do not need to be split, and the process ends.
  • each cluster is recursively split into two clusters using a clustering algorithm in an instance in which a medoid of the respective cluster is farther than the threshold distance from any feature within the respective cluster.
  • the clusters may be continuously split until the min-max distances of all clusters are less than the threshold distance.
  • the final clusters from the recursive splitting may be included as input clusters at 230 for the merging process as described with respect to FIG. 9.
  • FIG. 8 illustrates a splitting process pictorially.
  • the collection of features represented at 300 may be analyzed as described above and split into clusters 302 and 304.
  • the clusters 302 and 304 are analyzed and, if the min-max distance for a cluster is greater than the threshold distance, the clusters 302 and 304 are further split into two additional clusters each, thereby forming clusters 306, 308, 310, and 312.
  • these clusters may be considered for further splitting, but, as indicated in FIG. 8, the min-max distance of each is less than the threshold distance, and therefore no further splitting occurs with respect to these clusters. Accordingly, the operations of FIG.
  • the threshold distance used for the analysis may determine the nature of the clusters formed.
  • any clustering algorithm may be used to perform the splitting, such as, for example, the k-means algorithm, which can provide low computational complexity and a small memory footprint.
  • a plurality of input clusters may be received, as provided, for example, from the example process described with respect to FIG. 7.
  • the clusters may include a first cluster and at least a second cluster.
  • medoids for each cluster are determined.
  • inter-cluster distances may be determined at 420.
  • the feature that is most distant from the medoid may be identified.
  • the Euclidean distance between features may be used as the distance metric.
  • the distance D(A,B) may be calculated using the subspaces corresponding to (Amedoid, Amaximum) and (Bmedoid, Bmaximum), and the distance between the subspaces may be calculated as described above.
  • the distance between the clusters may be the distance between the medoids of the pair of clusters being considered, which may be compared to the threshold distance to facilitate determining whether the cluster should be merged or not.
  • the clusters may be merged to generate a merged cluster in an instance in which the inter-cluster distance is less than the distance threshold.
  • the distances between clusters may be calculated as described above, and all pairs of clusters that have a distance less than the threshold distance may be merged.
  • the clusters are merged only if the clusters to be merged form a clique, where a clique may be a set where all pairs of members have a distance less than the distance threshold.
  • the merging may be performed repeatedly until the number of clusters between successive iterations does not change and then the process is exited.
  • a distance metric between the singleton and the existing clusters may be determined.
  • in an instance in which A is a singleton and B is a cluster with more than one member, n features out of B may be selected.
  • the features (B1, B2, ..., Bn) may then be orthogonalized, for example, using Gram-Schmidt normalization to form the subspace (b1, b2, ..., bn).
  • clusters with more than one element may be merged using a distance metric between non- singleton clusters (clusters with more than one element).
  • a maximum of m features out of A and B may be selected, where the numbers of elements selected may be n1 and n2.
  • the features (A1, A2, ..., An1) may be orthogonalized using, for example, Gram-Schmidt normalization to form the subspace (a1, a2, ..., an1).
  • the subspace corresponding to B, (b1, b2, ..., bn2), may be formed in the same manner, and the distance between the subspaces may be compared to a predefined merging threshold (e.g., the threshold distance).
  • the description provided above and generally herein illustrates example methods, example apparatuses, and example computer program products for clustering facial images.
  • some of the example embodiments may be utilized to group any type of image into various different categories.
  • Other types of images that may be used include images of buildings or landmarks, images of animals, medical images, landscape images, or the like.
  • the threshold distance may be set based upon what type of images are being grouped. In some cases, the threshold may be set once for each set of images.
  • FIG. 10 is a flowchart of a system, method and program product according to example embodiments of the invention. It will be understood that each block of the flowchart, and combinations of blocks in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of an apparatus employing an embodiment of the present invention and executed by a processor in the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus embody a mechanism for implementing the functions specified in the flowchart block(s).
  • These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart block(s).
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart block(s).
  • the operations of FIG. 10 when executed, convert a computer or processing circuitry into a particular machine configured to perform an example embodiment of the present invention.
  • the operations of FIG. 10 define an algorithm for configuring a computer or processing circuitry (e.g., processor 70) to perform an example embodiment.
  • a general purpose computer may be provided with an instance of the clustering manager 80, which performs the algorithm shown in FIG. 10 (e.g., via configuration of the processor 70), to transform the general purpose computer into a particular machine configured to perform an example embodiment.
  • blocks of the flowchart support combinations of means for performing the specified functions, combinations of operations for performing the specified functions and program instructions for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
  • one embodiment of a method according to an example embodiment as shown in FIG. 10 may include, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons at operation 510.
  • the method may further include causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons at operation 520 and causing merging of the first set of clusters with the second set of clusters at operation 530.
  • certain ones of the operations above may be modified or further amplified as described below.
  • additional optional operations may also be included (an example of which is shown in dashed lines in FIG. 10). It should be appreciated that each of the modifications, optional additions or amplifications below may be included with the operations above either alone or in combination with any others among the features described herein.
  • all of the operations 510 to 530 may be repeated each time an additional set of images with faces is received so that incremental clustering of the faces in the digital images may be performed without requiring re-processing (e.g., distance calculations) for those faces that have already been previously processed.
  • the additional set of images may include any number from one to a plurality of images that are to be incrementally added to an existing library, gallery, collection or set of images that have already been clustered.
  • the method may further include an initial operation of causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and to define the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other at operation 500.
  • causing merging of the first set of clusters with the second set of clusters may further include utilizing a hierarchical algorithm to generate a third set of clusters.
  • causing clustering of the first set of singletons with the second set of singletons further generates a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons.
  • causing merging of the faces in the second set of images into corresponding clusters among the first set of clusters is performed in response to receipt of a second set of images.
  • an apparatus for performing the method of FIG. 10 above may comprise a processor (e.g., the processor 70) configured to perform some or each of the operations (500-530) described above.
  • the processor 70 may, for example, be configured to perform the operations (500-530) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations.
  • the apparatus may comprise means for performing each of the operations described above.
  • examples of means for performing operations 500-530 may comprise, for example, the clustering manager 80.
  • the processor 70 may be configured to control or even be embodied as the clustering manager 80, the processor 70 and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above may also form example means for performing operations 500-530.
  • An example of an apparatus may include at least one processor and at least one memory including computer program code.
  • the at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus to perform the operations 500-530 (with or without the modifications and amplifications described above in any combination).
  • An example of a computer program product may include at least one computer-readable storage medium having computer-executable program code portions stored therein.
  • the computer-executable program code portions may include program code instructions for performing operations 500-530 (with or without the modifications and amplifications described above in any combination).
  • the operations (500-530) described above, along with any of the modifications may be implemented in a method that involves facilitating access to at least one interface to allow access to at least one service via at least one network.
  • the at least one service may be said to perform at least operations 500-530.

Abstract

A method for providing incremental clustering of faces in digital images may include, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons. The method may further include causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters. An apparatus and computer program product corresponding to the method are also provided.

Description

METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING INCREMENTAL CLUSTERING OF FACES IN DIGITAL IMAGES
TECHNOLOGICAL FIELD
Embodiments of the present invention relate generally to image processing technology and, more particularly, relate to a method, apparatus and computer program product for providing incremental clustering of faces in digital images.
BACKGROUND
As digital photography technology continues to evolve and be incorporated into devices such as mobile phones, users inevitably are leveraging this technology to capture more images and video. Additionally, social networking websites have also contributed to this digital image revolution by providing increasingly simple ways to share images, in some instances, in near real time. As users continue to capture and collect increasing numbers of images, the ability to organize and manage these images can become cumbersome and inefficient. For example, attempting to find images of a particular individual within a collection of images may require a user to individually view each image in a collection to obtain a group of images that include the desired person. This process can be time consuming and tedious.
BRIEF SUMMARY OF SOME EXAMPLES
A method, apparatus and computer program product are therefore provided to enable clustering of faces in digital images. In this regard, in some example embodiments, a mechanism is provided for incremental clustering of faces in digital images. Thus, for example, new images including faces can be added to existing albums or collections that have already been clustered, and the new images may be clustered in consideration of the existing clusters. As such, embodiments of the present invention may provide a relatively robust capability for managing a collection of images.
In an example embodiment, a method of providing incremental clustering of faces in digital images is provided. The method may include, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons. The method may further include causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters.
In another example embodiment, a computer program product for providing incremental clustering of faces in digital images is provided. The computer program product includes at least one computer-readable storage medium having computer-executable program code instructions stored therein. The computer-executable program code instructions may include program code instructions for, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons. The program code instructions may further include program code instructions for causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and for causing merging of the first set of clusters with the second set of clusters.
In another example embodiment, an apparatus for providing incremental clustering of faces in digital images is provided. The apparatus may include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code may be configured, with the at least one processor, to cause the apparatus to perform at least, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons. The at least one memory and the computer program code may be further configured, with the at least one processor, to cause the apparatus to perform causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and causing merging of the first set of clusters with the second set of clusters.
In yet another example embodiment, an apparatus for providing incremental clustering of faces in digital images is provided. The apparatus may include means for causing merging, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons. The apparatus may further include means for causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons and means for causing merging of the first set of clusters with the second set of clusters.
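One aspect recited in the embodiments above is defining the second set of clusters "without recalculating distances associated with the first set of singletons." A minimal sketch of how a cached distance matrix might be grown in that spirit follows; the Euclidean `dist` helper, the list-of-lists matrix layout, and the function name `extend_distance_matrix` are illustrative assumptions for this sketch, not the claimed implementation.

```python
import math


def dist(a, b):
    # Hypothetical face-descriptor distance; an actual embodiment would use
    # a distance suited to its face features rather than plain Euclidean.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def extend_distance_matrix(old_matrix, old_feats, new_feats):
    """Grow a cached distance matrix with new singletons.

    Only the new-vs-old and new-vs-new entries are computed; the old-vs-old
    block is copied from the cache, so distances associated with the first
    set of singletons are never recalculated.
    """
    n, m = len(old_feats), len(new_feats)
    size = n + m
    combined = old_feats + new_feats
    full = [[0.0] * size for _ in range(size)]
    for i in range(n):              # reuse the cached old-vs-old block
        for j in range(n):
            full[i][j] = old_matrix[i][j]
    for i in range(size):           # compute only entries involving new faces
        for j in range(n, size):
            if i == j:
                continue
            d = dist(combined[i], new_feats[j - n])
            full[i][j] = d
            full[j][i] = d          # keep the matrix symmetric
    return full


# Example: two previously clustered singletons plus one new face.
old = [(0.0, 0.0), (3.0, 4.0)]
cached = [[0.0, 5.0], [5.0, 0.0]]   # distances computed in the first pass
grown = extend_distance_matrix(cached, old, [(0.0, 4.0)])
# grown[0][1] still comes from the cache; only row/column 2 was computed.
```

Clustering of the combined singleton set (e.g., by an agglomerative pass such as that suggested by FIG. 5) could then proceed over the grown matrix directly.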
Embodiments of the invention may provide a method, apparatus and computer program product for employment, for example, in mobile or fixed environments. As a result, for example, computing device users may enjoy an improved capability for clustering of faces in digital images.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
Having thus described some embodiments of the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
FIG. 1 illustrates a block diagram of a mobile terminal that may benefit from an example embodiment of the present invention;
FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention;
FIG. 3 illustrates an apparatus for enabling the provision of incremental clustering of faces in digital images according to an example embodiment of the present invention;
FIG. 4 shows a flow diagram illustrating one example of processing in order to provide incremental clustering of faces in digital images according to an example embodiment of the present invention;
FIG. 5 illustrates a distance matrix used for clustering singletons in accordance with an example embodiment of the present invention;
FIG. 6 illustrates an example flowchart for clustering images according to an example embodiment;
FIG. 7 illustrates an example flowchart for clustering features through splitting according to an example embodiment;
FIG. 8 illustrates an example splitting scenario according to various example embodiments;
FIG. 9 illustrates an example flowchart for merging clusters according to an example embodiment; and
FIG. 10 is a flowchart according to an example method for providing incremental clustering of faces in digital images according to an example embodiment of the present invention.
DETAILED DESCRIPTION OF SOME EMBODIMENTS
Some embodiments of the present invention will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, various embodiments of the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms "data," "content," "information" and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in accordance with embodiments of the present invention. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present invention.
Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
As defined herein a "computer-readable storage medium," which refers to a non-transitory, physical storage medium (e.g., volatile or non-volatile memory device), can be differentiated from a "computer-readable transmission medium," which refers to an electromagnetic signal.
As indicated above, management of large albums, galleries and other collections of images may become difficult given the ease with which large collections can be created. In many cases, users may desire to group images of the same person together into what is called a cluster. In large collections, manual clustering of images may be extremely tedious. To avoid manual clustering, face clustering algorithms have been developed to automatically group faces belonging to the same person together. Thus, some auto-tagging of photo albums may be accomplished. However, these auto-tagging techniques typically involve running an algorithm to cluster faces over the whole collection of images. Thus, all of the images to be clustered are typically available at the same time and the algorithm runs over the entire collection.
In reality, however, images tend to be added to collections incrementally. As various events unfold, new images are produced to expand already existing collections of images. Clustering an expanded collection would typically involve re-running the clustering algorithm over the entire new set of images. This approach would use more processing power than is needed, since even previously clustered images would be clustered again.
Some example embodiments may provide for more efficient clustering involving new images and previously clustered images. In this regard, for example, some embodiments may enable new images to be merged with existing clusters. The new images may then be clustered among themselves and with previous singleton images to create new clusters. A singleton image may be defined as a cluster with only one member. The new clusters may then be merged with the existing (and now perhaps also augmented) clusters to form a new cluster set. Accordingly, there is no need to re-process faces that have been previously processed into the existing clusters, and incremental clustering may be performed in an efficient manner.

FIG. 1 illustrates a block diagram of a mobile terminal 10 that may benefit from embodiments of the present invention. It should be understood, however, that a mobile terminal as illustrated and hereinafter described is merely illustrative of one type of device that may benefit from embodiments of the present invention and, therefore, should not be taken to limit the scope of embodiments of the present invention. While several embodiments of the mobile terminal 10 may be illustrated and hereinafter described for purposes of example, other types of mobile terminals, such as portable digital assistants (PDAs), pagers, mobile televisions, gaming devices, all types of computers (e.g., laptops or mobile computers), cameras, audio/video players, radios, global positioning system (GPS) devices, or any combination of the aforementioned, and other types of communications systems, may readily employ embodiments of the present invention.
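The merge-then-cluster-then-merge sequence described above for incremental clustering can be sketched roughly as follows. This is an illustrative sketch only, not the claimed implementation: the `dist` callable, the fixed `threshold`, and the use of a cluster's first member as its representative are assumptions, and for brevity step 2 recomputes distances greedily, whereas the described embodiments avoid recalculating distances among the first set of singletons (e.g., by caching them).

```python
def incremental_cluster(clusters, old_singletons, new_faces,
                        dist, threshold=0.5):
    """Sketch of the incremental flow: (1) fold new faces into existing
    clusters where possible, (2) cluster the remaining new faces together
    with the previous singletons, (3) merge the two cluster sets."""
    new_singletons = []
    for face in new_faces:                  # step 1: merge into old clusters
        best = min(clusters, key=lambda c: dist(c[0], face), default=None)
        if best is not None and dist(best[0], face) <= threshold:
            best.append(face)               # joins an existing cluster
        else:
            new_singletons.append(face)     # becomes a second-set singleton
    second_clusters = []                    # step 2: cluster all singletons
    for face in old_singletons + new_singletons:
        for c in second_clusters:
            if dist(c[0], face) <= threshold:
                c.append(face)
                break
        else:
            second_clusters.append([face])
    # step 3: merge the (possibly augmented) first set with the second set;
    # anything still alone remains a singleton for the next increment.
    merged = clusters + [c for c in second_clusters if len(c) > 1]
    singletons = [c[0] for c in second_clusters if len(c) == 1]
    return merged, singletons
```

With one-dimensional stand-ins for face features and `dist=lambda a, b: abs(a - b)`, a new face near an existing cluster joins it in step 1, while new faces near old singletons form new clusters in step 2.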
The mobile terminal 10 may include an antenna 12 (or multiple antennas) in operable communication with a transmitter 14 and a receiver 16. The mobile terminal 10 may further include an apparatus, such as a controller 20 or other processor, that provides signals to and receives signals from the transmitter 14 and receiver 16, respectively. The signals may include signaling information in accordance with the air interface standard of the applicable cellular system, and/or may also include data corresponding to user speech, received data and/or user generated data. In this regard, the mobile terminal 10 may be capable of operating with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the mobile terminal 10 may be capable of operating in accordance with any of a number of first, second, third and/or fourth-generation communication protocols or the like. For example, the mobile terminal 10 may be capable of operating in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA)), GSM (global system for mobile communication), and IS-95 (code division multiple access (CDMA)), or with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS), CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA), with 3.9G wireless communication protocols such as E-UTRAN (evolved-universal terrestrial radio access network), with fourth-generation (4G) wireless communication protocols or the like. As an alternative (or additionally), the mobile terminal 10 may be capable of operating in accordance with non-cellular communication mechanisms. For example, the mobile terminal 10 may be capable of communication in a wireless local area network (WLAN) or other communication networks.
It is understood that the apparatus, such as the controller 20, may include circuitry implementing, among others, audio and logic functions of the mobile terminal 10. For example, the controller 20 may comprise a digital signal processor device, a microprocessor device, and various analog to digital converters, digital to analog converters, and/or other support circuits. Control and signal processing functions of the mobile terminal 10 are allocated between these devices according to their respective capabilities. The controller 20 thus may also include the functionality to convolutionally encode and interleave messages and data prior to modulation and transmission. The controller 20 may additionally include an internal voice coder, and may include an internal data modem. Further, the controller 20 may include functionality to operate one or more software programs, which may be stored in memory. For example, the controller 20 may be capable of operating a connectivity program, such as a conventional Web browser. The connectivity program may then allow the mobile terminal 10 to transmit and receive Web content, such as location-based content and/or other web page content, according to a Wireless Application Protocol (WAP), Hypertext Transfer Protocol (HTTP) and/or the like, for example.
The mobile terminal 10 may also comprise a user interface including an output device such as an earphone or speaker 24, a ringer 22, a microphone 26, a display 28, and a user input interface, which may be coupled to the controller 20. The user input interface, which allows the mobile terminal 10 to receive data, may include any of a number of devices allowing the mobile terminal 10 to receive data, such as a keypad 30, a touch display (not shown), a microphone or other input device. In embodiments including the keypad 30, the keypad 30 may include numeric (0-9) and related keys (#, *), and other hard and soft keys used for operating the mobile terminal 10. Alternatively, the keypad 30 may include a conventional QWERTY keypad arrangement. The keypad 30 may also include various soft keys with associated functions. In addition, or alternatively, the mobile terminal 10 may include an interface device such as a joystick or other user input interface. The mobile terminal 10 further includes a battery 34, such as a vibrating battery pack, for powering various circuits that are used to operate the mobile terminal 10, as well as optionally providing mechanical vibration as a detectable output.
In some embodiments, the mobile terminal 10 includes a media capturing element, such as a camera, video and/or audio module, in communication with the controller 20. The media capturing element may be any means for capturing an image, video and/or audio for storage, display or transmission. For example, in an example embodiment in which the media capturing element is a camera module 36, the camera module 36 may include a digital camera capable of forming a digital image file from a captured image. As such, the camera module 36 includes all hardware, such as a lens or other optical component(s), and software necessary for creating a digital image file from a captured image. Alternatively, the camera module 36 may include only the hardware needed to view an image, while a memory device of the mobile terminal 10 stores instructions for execution by the controller 20 in the form of software necessary to create a digital image file from a captured image. In an example embodiment, the camera module 36 may further include a processing element such as a co-processor which assists the controller 20 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a JPEG standard format or another like format. In some cases, the camera module 36 may provide live image data to the display 28. Moreover, in an example embodiment, the display 28 may be located on one side of the mobile terminal 10 and the camera module 36 may include a lens positioned on the opposite side of the mobile terminal 10 with respect to the display 28 to enable the camera module 36 to capture images on one side of the mobile terminal 10 and present a view of such images to the user positioned on the other side of the mobile terminal 10.
The mobile terminal 10 may further include a user identity module (UIM) 38, which may generically be referred to as a smart card. The UIM 38 is typically a memory device having a processor built in. The UIM 38 may include, for example, a subscriber identity module (SIM), a universal integrated circuit card (UICC), a universal subscriber identity module (USIM), a removable user identity module (R-UIM), or any other smart card. The UIM 38 typically stores information elements related to a mobile subscriber. In addition to the UIM 38, the mobile terminal 10 may be equipped with memory. For example, the mobile terminal 10 may include volatile memory 40, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data. The mobile terminal 10 may also include other non-volatile memory 42, which may be embedded and/or may be removable. The non-volatile memory 42 may additionally or alternatively comprise an electrically erasable programmable read only memory (EEPROM), flash memory or the like. The memories may store any of a number of pieces of information, and data, used by the mobile terminal 10 to implement the functions of the mobile terminal 10.
FIG. 2 is a schematic block diagram of a wireless communications system according to an example embodiment of the present invention. Referring now to FIG. 2, an illustration of one type of system that would benefit from embodiments of the present invention is provided. As shown in FIG. 2, a system in accordance with an example embodiment of the present invention includes a first communication device (e.g., mobile terminal 10) and in some cases also a second communication device 48 that may each be capable of communication with a network 50. The second communication device 48 may be another mobile terminal, or a fixed computer or computer terminal (e.g., a personal computer (PC)). The second communication device 48 is provided to illustrate that example embodiments may be practiced on multiple devices or in connection with multiple devices. Thus, there may be multiplicity with respect to instances of other devices that may be included in the network 50 and that may practice example embodiments independent of, or in connection with, the network 50. The communications devices of the system may be able to communicate with network devices or with each other via the network 50. In some cases, the network devices with which the communication devices of the system communicate may include a service platform 60. In an example embodiment, the mobile terminal 10 (and/or the second communication device 48) is enabled to communicate with the service platform 60 to provide, request and/or receive information. However, in some embodiments, not all systems that employ embodiments of the present invention may comprise all the devices illustrated and/or described herein.
In an example embodiment, the network 50 includes a collection of various different nodes, devices or functions that are capable of communication with each other via corresponding wired and/or wireless interfaces. As such, the illustration of FIG. 2 should be understood to be an example of a broad view of certain elements of the system and not an all-inclusive or detailed view of the system or the network 50. Although not necessary, in some embodiments, the network 50 may be capable of supporting communication in accordance with any one or more of a number of first-generation (1G), second-generation (2G), 2.5G, third-generation (3G), 3.5G, 3.9G, fourth-generation (4G) mobile communication protocols, Long Term Evolution (LTE), LTE advanced (LTE-A), and/or the like.
One or more communication terminals such as the mobile terminal 10 and the second communication device 48 may be capable of communication with each other via the network 50 and each may include an antenna or antennas for transmitting signals to and for receiving signals from a base site, which could be, for example, a base station that is a part of one or more cellular or mobile networks or an access point that may be coupled to a data network, such as a local area network (LAN), a metropolitan area network (MAN), and/or a wide area network (WAN), such as the Internet. In turn, other devices such as processing devices or elements (e.g., personal computers, server computers or the like) may be coupled to the mobile terminal 10 and the second communication device 48 via the network 50. By directly or indirectly connecting the mobile terminal 10, the second communication device 48 and other devices to the network 50, the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the other devices (or each other), for example, according to numerous communication protocols including Hypertext Transfer Protocol (HTTP) and/or the like, to thereby carry out various communication or other functions of the mobile terminal 10 and the second communication device 48, respectively.
Furthermore, although not shown in FIG. 2, the mobile terminal 10 and the second communication device 48 may communicate in accordance with, for example, radio frequency (RF), Bluetooth (BT), Infrared (IR) or any of a number of different wireline or wireless communication techniques, including LAN, wireless LAN (WLAN), Worldwide Interoperability for Microwave Access (WiMAX), WiFi, ultra-wide band (UWB), Wibree techniques and/or the like. As such, the mobile terminal 10 and the second communication device 48 may be enabled to communicate with the network 50 and each other by any of numerous different access mechanisms. For example, mobile access mechanisms such as wideband code division multiple access (W-CDMA), CDMA2000, global system for mobile communications (GSM), general packet radio service (GPRS) and/or the like may be supported as well as wireless access mechanisms such as WLAN, WiMAX, and/or the like and fixed access mechanisms such as digital subscriber line (DSL), cable modems, Ethernet and/or the like.
In an example embodiment, the service platform 60 may be a device or node such as a server or other processing device. The service platform 60 may have any number of functions or associations with various services. As such, for example, the service platform 60 may be a platform such as a dedicated server (or server bank) associated with a particular information source or service (e.g., face recognition, image tagging, clustering based on face recognition and/or the like), or the service platform 60 may be a backend server associated with one or more other functions or services. As such, the service platform 60 represents a potential host for a plurality of different services or information sources. In some embodiments, the functionality of the service platform 60 is provided by hardware and/or software components configured to operate in accordance with known techniques for the provision of information to users of communication devices. However, at least some of the functionality provided by the service platform 60 may be provided in accordance with example embodiments of the present invention.
In an example embodiment, the service platform 60 may host an apparatus for providing services related to clustering images based on faces in the images to a device practicing an embodiment of the present invention. As such, in some embodiments, the service platform 60 may itself perform example embodiments, while in other embodiments, the service platform 60 may facilitate (e.g., by the provision of image data or processing of image data) operation of an example embodiment at another device (e.g., the mobile terminal 10 and/or the second communication device 48). In still other example embodiments, the service platform 60 may not be included at all. In other words, in some embodiments, operations in accordance with an example embodiment may be performed at the mobile terminal 10 and/or the second communication device 48 without any interaction with the network 50 and/or the service platform 60.
An example embodiment will now be described with reference to FIG. 3, in which certain elements of an apparatus for enabling the provision of incremental clustering of faces in digital images are displayed. The apparatus of FIG. 3 may be employed, for example, on the service platform 60, the mobile terminal 10 or second communication device 48 of FIG. 2. However, it should be noted that the apparatus of FIG. 3 may also be employed on a variety of other devices. Therefore, example embodiments should not be limited to application on devices such as the service platform 60, the mobile terminal 10 or second communication device 48 of FIG. 2. Alternatively, embodiments may be employed on a combination of devices including, for example, those listed above. Accordingly, some example embodiments may be embodied wholly at a single device (e.g., the service platform 60, the mobile terminal 10 or the second communication device 48) or by devices in a client/server relationship (e.g., the service platform 60 serving information to the mobile terminal 10 and/or the second communication device 48). Furthermore, it should be noted that the devices or elements described below may not be mandatory and thus some may be omitted in certain embodiments.
Referring now to FIG. 3, an apparatus 65 for enabling the provision of incremental clustering of faces in digital images is provided. The apparatus 65 may include or otherwise be in communication with a processor 70, a user interface 72, a communication interface 74 and a memory device 76. The memory device 76 may include, for example, one or more volatile and/or non-volatile memories. In other words, for example, the memory device 76 may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processor 70). The memory device 76 may be configured to store information, data, applications, instructions or the like for enabling the apparatus to carry out various functions in accordance with example embodiments of the present invention. For example, the memory device 76 could be configured to buffer input data for processing by the processor 70. Additionally or alternatively, the memory device 76 could be configured to store instructions for execution by the processor 70.
The apparatus 65 may, in some embodiments, be a network device (e.g., service platform 60) or other devices (e.g., the mobile terminal 10 or the second communication device 48) that may operate independent of or in connection with a network. However, in some embodiments, the apparatus 65 may be instantiated at one or more of the service platform 60, the mobile terminal 10 and the second communication device 48. Thus, the apparatus 65 may be any computing device configured to employ an example embodiment of the present invention. However, in some embodiments, the apparatus 65 may be embodied as a chip or chip set (which may in turn be employed at one of the devices mentioned above). In other words, the apparatus 65 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus 65 may therefore, in some cases, be configured to implement an embodiment of the present invention on a single chip or as a single "system on a chip." As such, in some cases, a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
The processor 70 may be embodied in a number of different ways. For example, the processor 70 may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other processing circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. As such, in some embodiments, the processor 70 may include one or more processing cores configured to perform independently. A multi-core processor may enable multiprocessing within a single physical package. Additionally or alternatively, the processor 70 may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
In an example embodiment, the processor 70 may be configured to execute instructions stored in the memory device 76 or otherwise accessible to the processor 70. Alternatively or additionally, the processor 70 may be configured to execute hard coded functionality. As such, whether configured by hardware or software methods, or by a combination thereof, the processor 70 may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present invention while configured accordingly. Thus, for example, when the processor 70 is embodied as an ASIC, FPGA or the like, the processor 70 may be specifically configured hardware for conducting the operations described herein.
Alternatively, as another example, when the processor 70 is embodied as an executor of software instructions, the instructions may specifically configure the processor 70 to perform the algorithms and/or operations described herein when the instructions are executed. However, in some cases, the processor 70 may be a processor of a specific device (e.g., a mobile terminal or network device) adapted for employing an embodiment of the present invention by further configuration of the processor 70 by instructions for performing the algorithms and/or operations described herein. The processor 70 may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processor 70.
Meanwhile, the communication interface 74 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data from/to a network and/or any other device or module in communication with the apparatus 65. In this regard, the communication interface 74 may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. In some environments, the communication interface 74 may alternatively or also support wired communication. As such, for example, the communication interface 74 may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
The user interface 72 may be in communication with the processor 70 to receive an indication of a user input at the user interface 72 and/or to provide an audible, visual, mechanical or other output to the user. As such, the user interface 72 may include, for example, a keyboard, a mouse, a joystick, a display, a touch screen(s), touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms. In an example embodiment in which the apparatus 65 is embodied as a server or some other network devices, the user interface 72 may be limited, or eliminated. However, in an embodiment in which the apparatus 65 is embodied as a communication device (e.g., the mobile terminal 10), the user interface 72 may include, among other devices or elements, any or all of a speaker, a microphone, a display, and a keyboard or the like. In this regard, for example, the processor 70 may comprise user interface circuitry configured to control at least some functions of one or more elements of the user interface, such as, for example, a speaker, ringer, microphone, display, and/or the like. The processor 70 and/or user interface circuitry comprising the processor 70 may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 70 (e.g., memory device 76, and/or the like).
In an example embodiment, the processor 70 may be embodied as, include or otherwise control a clustering manager 80. As such, in some embodiments, the processor 70 may be said to cause, direct or control the execution or occurrence of the various functions attributed to the clustering manager 80 as described herein. The clustering manager 80 may be any means such as a device or circuitry operating in accordance with software or otherwise embodied in hardware or a combination of hardware and software (e.g., processor 70 operating under software control, the processor 70 embodied as an ASIC or FPGA specifically configured to perform the operations described herein, or a combination thereof) thereby configuring the device or circuitry to perform the corresponding functions of the clustering manager 80 as described herein. Thus, in examples in which software is employed, a device or circuitry (e.g., the processor 70 in one example) executing the software forms the structure associated with such means.
The clustering manager 80 may be configured to initially perform clustering with respect to faces in a set of images. After the initial clustering, a set of clusters (e.g., an initial set of clusters) in which each respective cluster is defined by multiple corresponding images including the same face is formed. In addition to the initial set of clusters, an initial set of singletons may also be defined where each singleton is essentially a cluster comprising one member image. In other words, each singleton is an image for which the face therein does not have a matching face in another image.
After initial clustering, if additional images are taken, the additional images may be incrementally added to the photo album, gallery or image collection that was initially clustered. When these additional images are added, the clustering manager 80 may be configured to merge the additional images into existing clusters (e.g., the initial clusters), without processing (or clustering) the set of images over again. The clustering manager 80 may then be configured to cluster the initial set of singletons with the additional images to see if any additional clusters are formed. The additional clusters (if any are formed) may then be merged with the initial clusters as modified by the merging of the initial clusters with the additional images. The process described above may then be repeated such that the term "initial" may represent any previously existing set or previously performed operation, rather than a first instance of any particular set or operation.
FIG. 4 illustrates an example flow chart showing incremental clustering by the clustering manager 80 as described above. In this regard, as shown in FIG. 4, a set of images N1 including faces may be provided to the system and clustered at operation 90 to form k1 clusters and s1 singletons. The clustering performed at operation 90 may be accomplished by a clustering algorithm based on an input distance matrix D between elements in the images and a threshold T for the distances. In an example embodiment, the input distance may be a mutual subspace distance.
At some later time, a second set of images N2 including faces may be provided to the system. The clustering manager 80 may be configured to merge the second set of images N2 with the existing k1 clusters and also with the s1 singletons. Thus, the second set of images N2 may be merged with the existing k1 clusters at operation 92 to generate modified k1 clusters with leftover s2 singletons. The modified k1 clusters may be identical to the existing k1 clusters if none of the faces in the second set of images N2 is within a threshold distance of the faces in the existing k1 clusters. A simple way to combine the N1+N2 faces together would be to recluster all of the images based on the faces therein. However, over time the number of images to be processed keeps growing until the cost of this type of reclustering becomes prohibitive. In addition, all the information obtained from previous clustering operations would be discarded each time, and thus previously performed clustering operations would be repeated each time image sets are combined for clustering. Example embodiments of the present invention avoid this computational waste by not reprocessing previously processed faces in the merging performed at operation 92.
Accordingly, clustering information that is already present in the system can be reused by the clustering manager 80 for subsequent merging and clustering operations. As such, for example, the distances between each of the new faces in the second set of images N2 and the corresponding existing k1 clusters are computed so that faces in the second set of images N2 are associated with a closest one of the existing k1 clusters in response to the distance being less than a threshold T1. Faces in images of the second set of images N2 that are farther than the distance defined by the threshold T1 are left as the s2 singletons. The s2 singletons may then be clustered at operation 94 with the s1 singletons to define k2 clusters for any s2 singletons that happen to cluster with the s1 singletons. The s1 singletons have already been processed by the clustering algorithm at least once at this point, and thus the distances between the s1 singletons and the faces of the existing k1 clusters have already been determined to be greater than the threshold T; these distances do not need to be recomputed. Out of a distance matrix of size (s1 + s2) x (s1 + s2), it is only necessary to compute (s1 + s2) x s2 distances: the distances between the new s2 singletons, and between the new s2 singletons and the older s1 singletons. The remaining s1 x s1 distances are filled with a very high value DMAX > T (in some examples a metric similar to the cosine distance may be employed and thus DMAX = 1, the upper bound of the metric). Accordingly, the number of distance computations required is reduced by s1 x s1. The modified distance matrix is shown in FIG. 5, which shows a distance matrix used for clustering singletons. Notably, since the distance matrix is symmetric (Dij = Dji), only the lower triangular part of the matrix may need to be computed.
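The matrix-filling step above can be sketched as follows; `dist` stands in for the mutual subspace distance, and the function name and list-based layout are illustrative assumptions rather than part of the embodiment.

```python
import numpy as np

DMAX = 1.0  # upper bound of the cosine-like metric; assumed > threshold T

def singleton_distance_matrix(s1_feats, s2_feats, dist):
    """Build the (s1+s2) x (s1+s2) matrix while computing only the
    (s1+s2) x s2 new entries; the old s1 x s1 block stays at DMAX so
    previously compared singletons can never re-cluster with each other."""
    n1 = len(s1_feats)
    n = n1 + len(s2_feats)
    D = np.full((n, n), DMAX)
    np.fill_diagonal(D, 0.0)
    feats = list(s1_feats) + list(s2_feats)
    for j in range(n1, n):      # columns for the new s2 singletons only
        for i in range(j):      # symmetric matrix: lower triangle suffices
            D[i, j] = D[j, i] = dist(feats[i], feats[j])
    return D
```

Only the columns belonging to new singletons are ever visited, which is where the s1 x s1 saving comes from.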
Thus, for example, using the modified distance matrix, the s1 singletons may be clustered with the s2 singletons to determine the k2 clusters. Remaining singletons s3 may exist, including those singletons from the s2 singletons and the s1 singletons that did not cluster with each other. The k2 clusters and the k1 clusters (initial or modified) have not been compared to each other yet at this point. The clustering manager 80 may therefore be configured to compare the k2 clusters with the modified k1 clusters based on the distance between the clusters. Thus, the k2 clusters and the modified k1 clusters may be merged at operation 96 in order to generate k3 clusters. Thereafter, the operations described above may be repeated, where N1 and N2 represent subsequent additional sets of images with faces therein that are to be incrementally clustered according to an example embodiment.
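Operations 92 through 96 can be sketched end to end with a toy example; here "faces" are scalars, the distance is an absolute difference, and cluster membership is tested against a centroid — all simplifying assumptions standing in for the subspace-based comparisons described herein.

```python
def incremental_round(clusters, singletons, new_faces, T):
    """One incremental round: merge new faces into existing clusters
    (operation 92), cluster leftover singletons with the old ones
    (operation 94), and return the combined clusters (operation 96)."""
    dist = lambda a, b: abs(a - b)        # stand-in distance metric
    centroid = lambda c: sum(c) / len(c)  # stand-in cluster summary
    # Operation 92: attach each new face to its nearest existing cluster
    # if within threshold T; the rest become the s2 singletons.
    s2 = []
    for f in new_faces:
        best = min(clusters, key=lambda c: dist(f, centroid(c)), default=None)
        if best is not None and dist(f, centroid(best)) < T:
            best.append(f)
        else:
            s2.append(f)
    # Operation 94: cluster singletons; old-old distances are treated as
    # DMAX (already known to exceed T), so every new cluster is seeded by
    # a new singleton.
    k2, remaining = [], list(singletons)
    for f in s2:
        near = [g for g in remaining if dist(f, g) < T]
        if near:
            k2.append([f] + near)
            remaining = [g for g in remaining if g not in near]
        else:
            remaining.append(f)
    # Operation 96: combine the k2 clusters with the modified k1 clusters.
    return clusters + k2, remaining
```

Each round consumes only the new faces plus the surviving singletons, so previously clustered faces are never revisited.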
The clustering to be performed within the above described example may be any of a number of algorithms such as k-means, normalized cuts, spectral clustering, and/or the like. Similarly, different distance metrics may be used for clustering, singleton merging, cluster merging and/or the like, in various example embodiments. In one example, the clustering manager 80 may operate as described below in connection with the descriptions of FIGS. 6-9. FIG. 6 illustrates a generalized flowchart of some example embodiments. In this regard, based on a collection of images, facial features may be extracted and normalized at 100. Inputs used to generate the extracted and normalized facial features may include the outputs of a face detector, which may include face coordinates, and coordinates of the left and right eyes. Using these inputs, face normalization and feature extraction may be performed.
In this regard, locations of the left and right eyes may be used to normalize the faces, such that the faces are upright and cropped based on the distance between the eyes. Based on these normalized faces, histogram features based on local binary patterns of different gradient images may be extracted. According to some example embodiments, each feature may be a concatenation of the image feature and a feature from the image mirrored about an axis through the feature's center (e.g., mirrored about the Y axis through the feature's center). The extracted features may be normalized such that each feature is a unit vector.
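A minimal sketch of the concatenate-and-normalize step described above (the LBP histogram extraction itself is not shown, and the function name is illustrative):

```python
import numpy as np

def build_feature(hist, mirrored_hist):
    """Concatenate the image feature with the feature extracted from the
    mirrored image, then scale the result to a unit vector."""
    f = np.concatenate([np.asarray(hist, float), np.asarray(mirrored_hist, float)])
    return f / np.linalg.norm(f)
```

The unit-length constraint is what later lets dot products act as a cosine-like similarity.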
Having extracted and normalized the features, the features may then be split into groups or clusters. As such, at 110, the features may be split into clusters based on relative distances between the features. The splitting may continue in a recursive fashion (as indicated by the arrow from operation 110 back to the input of 110) until convergence is achieved for all clusters of features relative to a distance threshold. To consider relative distances, the distances between pairs of features that have been generated may be calculated. In this regard, for every feature pair (A,B), a distance metric D(A,B) may be computed. In some example embodiments, the distance metric may be based on subspaces (A, A') and (B, B'), where A' and B' are features corresponding to mirror images of A and B, respectively.
As indicated above, starting with the entire set of features, according to various example embodiments, the set may be split into two clusters in response to a determination with respect to the distance threshold. The splitting operation may be continued with each new cluster in a recursive fashion. To perform the splitting, any clustering method may be used, including, but not limited to, k-means, k-medoids, graph-cuts, and Gaussian mixture models. The splitting may recursively continue until an analysis of each cluster indicates that the cluster satisfies the stopping criterion. To determine whether or not to split a cluster of features, a central element of the cluster may be identified. The central element may be called the medoid of the cluster. The medoid may be the feature for which the distance to its farthest neighbor is minimum amongst all elements of the cluster. In an alternative definition, the medoid may also be the feature of a cluster that has the lowest average distance to the other features within the cluster. To determine whether a split is to be performed, the distance from the medoid to the feature within the cluster that is farthest from the medoid (also referred to as the min-max distance), may be compared to a threshold distance. If the min-max distance is greater than the threshold distance, then a split may be performed. If the min-max distance is less than the threshold distance, then no further splitting is needed for that cluster because the stopping criterion for the cluster has been met.
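The medoid selection and min-max stopping criterion can be sketched as below; `split_fn` stands in for whatever two-way clustering routine (k-means, graph-cuts, and so on) is plugged in, and is assumed to always return two non-empty groups.

```python
import numpy as np

def medoid_index(D, members):
    """Medoid: the member whose distance to its farthest neighbour within
    the cluster is minimal among all members (the min-max criterion)."""
    sub = D[np.ix_(members, members)]
    return members[int(np.argmin(sub.max(axis=1)))]

def recursive_split(D, members, T, split_fn):
    """Recursively split a cluster in two until every resulting cluster's
    min-max distance falls below the threshold T."""
    m = medoid_index(D, members)
    if len(members) < 2 or D[m, members].max() <= T:
        return [members]  # stopping criterion met for this cluster
    left, right = split_fn(D, members)
    return recursive_split(D, left, T, split_fn) + \
           recursive_split(D, right, T, split_fn)
```

Because the threshold T is checked at every level, the recursion depth adapts to the data rather than to a preset cluster count.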
In an embodiment, the threshold distance may be an attribute that is set in any of a number of ways. For example, the threshold distance may be input as a user preference by the user via a user interface. Additionally, or alternatively, the threshold distance may be set during a manufacturing process. Additionally, or alternatively, the threshold distance may be calculated based on a user's preferences as indicated by the user's actions captured by the device, without requiring the express inputting of a value for the threshold distance by the user. By modifying the threshold distance, different degrees of clustering may be performed. In this regard, a shorter threshold distance may result in an increased granularity of clustering, with a cost of increased processing. On the other hand, a longer threshold distance may tend to reduce granularity and avoid the creation of unnecessary clusters that can result from too much splitting.
Subsequent to splitting the features into clusters at 110, the clusters may be analyzed again to determine if merging of clusters is needed. As such, at 120, clusters may be merged based on relative distance between the clusters. Similar to splitting, the process of merging clusters may be performed in a repetitive fashion (as indicated by the arrow from operation 120 back to the input of 120) until a stopping criterion is satisfied for each cluster.
Accordingly, after the clusters are created in 110, the medoid of each cluster may again be utilized. The distances between the medoids of each cluster may then be determined, where the distance may be calculated in the same manner as indicated above using the distances between subspaces. As such, the distances between the cluster pairs may be determined, and using, for example, the same clustering algorithm as used for splitting, merging of the clusters may be performed. The merging of clusters may be continued until all clusters satisfy the stopping criterion.
In this regard, to determine whether to merge two clusters, the distances between the clusters based on the medoids of the clusters may be compared to the threshold distance. This threshold distance may be the same distance as described above with respect to the splitting. If the distance between clusters (or cluster medoids) is less than the distance threshold, the clusters are said to be "connected" and the clusters may be merged. A collection of connected clusters may be referred to as a clique. On the other hand, if the distance between clusters (or cluster medoids) is greater than the distance threshold, then the clusters are sufficiently distinct and need not be merged. As indicated above, the analysis of the medoid-to-medoid distances may be performed in a repetitive fashion, until an iteration of the process where the number of remaining clusters is left unchanged.
According to various example embodiments, upon completion of the merging process at 120, the output clusters may be identified at 130. In this regard, the output clusters may be associated with a particular individual, and, for example, the associated images may be tagged based on the clusters.
However, according to some example embodiments, prior to associating the merged clusters from operation 120 with a particular individual, some additional processing and merging may be undertaken. In this regard, after the multi-pass merging of clusters is complete at 120, additional operations may be performed. For example, for faces which are not in any cluster, a distance metric may be calculated from that face to the existing clusters. If the calculated distance is less than a merge threshold, which may be the same as the distance threshold, the singular face may be merged into the existing cluster. Again, this distance measure may be based on a projection of the face onto a subspace formed by taking a subset of the vectors in a cluster, similar to distance measurement described above and otherwise provided herein. Additionally, for all existing clusters, the distance between the subspaces defined by pairs of clusters may be determined, and if this distance is less than a threshold (e.g., the threshold distance), the clusters may be merged. Upon completion of these additional merging operations, the output clusters may be identified and associated with an individual at 130. Accordingly, various example embodiments may perform the multi-pass processing described in FIG. 6, and otherwise herein, to cluster faces, based on hierarchical splitting into two clusters at each recursive operation. Further, according to some example embodiments, a particular distance metric calculation may be leveraged for use between image features and between clusters for splitting and merging, respectively. Additionally, a single distance threshold may be utilized for some or all operations of the multi-pass algorithm. Some example embodiments also avoid the need to explicitly determine cluster centers prior to the process and the process can automatically estimate the number of clusters and cluster centers as needed during the execution. 
Further, according to some example embodiments, a flexible clustering framework may be provided that may utilize any clustering algorithm to perform recursive splitting into two clusters.
Having described some example embodiments in general terms with respect to FIG. 6, the following provides a description of additional details and/or alternatives based on the content of FIGs. 7-9. The content of FIG. 7 and the associated description below provides additional detail and alternatives to the operation of splitting the features into clusters at 110. FIG. 8 illustrates an example splitting process as provided herein. Finally, the content of FIG. 9 and the associated description below provides additional detail and alternatives to the operation of merging the clusters at 120.
Referring now to FIG. 7, an example process for receiving and splitting extracted features into clusters is provided. In this regard, the process may begin with receiving a plurality of features derived from a plurality of facial images at 200. The received features may be extracted by a number of different techniques that, for example, rely on face detection and the outputs often provided by face detection. For example, a given face detection technique may provide face coordinates, eye coordinates, and a face normalization and feature extraction method which may be used as inputs.
In this regard, given a set of images containing multiple faces, a face detector may be utilized, according to various example embodiments, to detect faces within the images. For each detected face, the right and left eyes may be detected, and the faces may be normalized according to, for example, the distance between the eyes. Features may then be extracted from the normalized faces, and passed to the clustering algorithm for splitting the collection of features into clusters. At 210, the plurality of features may be split into two clusters using a clustering algorithm. The splitting may be undertaken in an instance in which the medoid of the plurality of features is farther than a threshold distance from any feature within the plurality of features. To calculate the distances used to determine whether or not to split the plurality of features or a cluster, the following technique may be used. It is noteworthy that, according to various example embodiments, all distances calculated throughout a process of clustering facial images may be calculated as follows.
For each feature pair (A,B), a normalization technique may be used to orthogonalize vectors generated based on the pair of elements. For example, a Gram-Schmidt normalization may be used to orthogonalize vectors (A, A') and (B, B'), where A' and B' are the features corresponding to the mirror images of A and B, respectively. By performing this operation, subspaces (a, a') and (b, b') may be generated. The projection P1 of A onto the subspace (b, b') may be defined as:
P1 = √((a · b)² + (a · b')²).
Similarly, the projection P2 of B onto the subspace (a, a') may be defined as:
P2 = √((b · a)² + (b · a')²).
Using these projections, a distance D may be defined as:
D(A,B) = 1 - (P1 + P2)/2.
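A sketch of this pairwise distance, assuming each feature and its mirror are linearly independent vectors (the function names are illustrative):

```python
import numpy as np

def gram_schmidt_pair(u, v):
    """Orthonormalize the pair (u, v); assumes u and v are not parallel."""
    u = np.asarray(u, float)
    v = np.asarray(v, float)
    e1 = u / np.linalg.norm(u)
    w = v - (v @ e1) * e1
    return e1, w / np.linalg.norm(w)

def pair_distance(A, A_mirror, B, B_mirror):
    """D(A,B) = 1 - (P1 + P2)/2, where P1 is the norm of the projection
    of a onto (b, b') and P2 the norm of the projection of b onto (a, a')."""
    a, a2 = gram_schmidt_pair(A, A_mirror)
    b, b2 = gram_schmidt_pair(B, B_mirror)
    P1 = np.sqrt((a @ b) ** 2 + (a @ b2) ** 2)
    P2 = np.sqrt((b @ a) ** 2 + (b @ a2) ** 2)
    return 1.0 - (P1 + P2) / 2.0
```

Identical subspaces give a distance of 0 and orthogonal subspaces give 1, matching DMAX = 1 as the upper bound of the metric.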
According to the relations provided above, the distances between features may be considered to determine whether splitting of a given cluster is necessary. In particular, the distance, based on the above, may be determined between a medoid for the cluster and the feature farthest from the medoid (the min-max distance), and this distance may be compared to the threshold distance. If the min-max distance is greater than the threshold distance, then the features may be split into two clusters using any clustering algorithm as indicated above. If the min-max distance is less than the threshold distance, then the features do not need to be split, and the process ends.
Assuming that the features are split into two clusters, at 220, a recursive splitting of each cluster is undertaken. In this regard, each cluster is recursively split into two clusters using a clustering algorithm in an instance in which a medoid of the respective cluster is farther than the threshold distance from any feature within the respective cluster. As such, the clusters may be continuously split until the min-max distance of all clusters are less than the threshold distance. The final clusters from the recursive splitting may be included as input clusters at 230 for the merging process as described with respect to FIG. 9.
FIG. 8 illustrates a splitting process pictorially. In this regard, the collection of features represented at 300 may be analyzed as described above and split into clusters 302 and 304. Through the recursive operation of the technique, the clusters 302 and 304 are analyzed, and if the min-max distance for a cluster is greater than the threshold distance, the clusters 302 and 304 are each further split into two additional clusters, thereby forming clusters 306, 308, 310, and 312. In turn, these clusters may be considered for further splitting, but, as indicated in FIG. 8, the min-max distance of each is less than the threshold distance, and therefore no further splitting occurs with respect to these clusters. Accordingly, the operations of FIG. 7 may amount to a hierarchical algorithm which recursively splits a set of features into two clusters. The threshold distance used for the analysis, which may be input by the user, may determine the nature of the clusters formed. Further, any clustering algorithm may be used to perform the splitting, such as, for example, the k-means algorithm, which can provide low computational complexity and a small memory footprint.
Referring now to FIG. 9, an example process for merging the input clusters is provided. In this regard, at 400, a plurality of input clusters may be received, as provided, for example, from the example process described with respect to FIG. 7. The clusters may include a first cluster and at least a second cluster.
At 410, medoids for each cluster, including the first and second cluster, are determined. Using the medoids, inter-cluster distances may be determined at 420. To determine the inter-cluster distance, the identity of the feature that is most distant from the medoid is identified. According to some example embodiments, to identify the feature that is most distant, the Euclidean distance between features may be used as the distance metric. Further, to determine the distance between a cluster pair (A,B), the distance D(A,B) may be calculated using the subspaces corresponding to (Amedoid, Amaximum) and (Bmedoid, Bmaximum), and the distance between the subspaces may be calculated as described above. The subscripts correspond to the medoid and to the element of the cluster at maximum distance from the medoid. Alternatively, if either A or B includes only one feature, then the distance between the clusters may be the distance between the medoids of the pair of clusters being considered, which may be compared to the threshold distance to facilitate determining whether the clusters should be merged or not.
At 430, the clusters, such as the first cluster and the second cluster, may be merged to generate a merged cluster in an instance in which the inter-cluster distance is less than the distance threshold. In this regard, the distances between clusters may be calculated as described above, and all pairs of clusters that have a distance less than the threshold distance may be merged. In some example embodiments, the clusters are merged only if the clusters to be merged form a clique, where a clique may be a set where all pairs of members have a distance less than the distance threshold. As indicated by the arrow from operation 430 to 400, the merging may be performed repeatedly until the number of clusters between successive iterations does not change and then the process is exited.
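The repeat-until-stable merging can be sketched as follows; a greedy pairwise union replaces the full clique check, and the caller supplies the inter-cluster distance, so this is an illustrative simplification rather than the claimed procedure.

```python
def merge_until_stable(clusters, dist, T):
    """Repeatedly union cluster pairs whose inter-cluster distance is
    below T, exiting once an iteration leaves the cluster count
    unchanged.  Distances are taken against the pre-iteration clusters,
    a simplification of the clique-based rule described above."""
    while True:
        merged, used = [], set()
        for i in range(len(clusters)):
            if i in used:
                continue
            group = list(clusters[i])
            for j in range(i + 1, len(clusters)):
                if j not in used and dist(clusters[i], clusters[j]) < T:
                    group += clusters[j]
                    used.add(j)
            merged.append(group)
        if len(merged) == len(clusters):
            return merged
        clusters = merged
```

Checking the cluster count between iterations is what implements the "number of remaining clusters is left unchanged" exit condition.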
Additionally, other merge operations may be performed after the iterative merging is complete. In this regard, for all individual features which are still not part of any cluster (also known as singletons), a distance metric between the singleton and the existing clusters may be determined. To calculate this distance metric, if A is a singleton and B is a cluster with more than one member, n features out of B (in any order) may be selected. The features (B1, B2, ... Bn) may then be orthogonalized, for example, using Gram-Schmidt normalization to form the subspace (b1, b2, ... bn). Then, D(A,B) may be determined as D(A,B) = 1 - Σi (a · bi)², summing over i = 1 to n, where a is the normalized form of A. Therefore, each singleton may be merged into the cluster to which the singleton is nearest according to the metric defined above, provided the distance is below a threshold distance.
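The singleton-to-cluster distance can be sketched directly from the relation above; the Gram-Schmidt helper drops near-linearly-dependent vectors, an assumption not spelled out in the text.

```python
import numpy as np

def orthonormal_basis(features, eps=1e-10):
    """Gram-Schmidt orthonormalization of the selected cluster features,
    discarding near-linearly-dependent vectors."""
    basis = []
    for f in features:
        w = np.asarray(f, float)
        for e in basis:
            w = w - (w @ e) * e
        norm = np.linalg.norm(w)
        if norm > eps:
            basis.append(w / norm)
    return basis

def singleton_to_cluster_distance(A, cluster_features):
    """D(A,B) = 1 - sum_i (a . b_i)^2: one minus the squared norm of the
    projection of the normalized singleton a onto the cluster subspace."""
    a = np.asarray(A, float)
    a = a / np.linalg.norm(a)
    return 1.0 - sum(float(a @ b) ** 2 for b in orthonormal_basis(cluster_features))
```

A singleton lying inside the cluster subspace scores 0, while one orthogonal to it scores 1, so the same threshold distance used elsewhere applies directly.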
Additionally, after the previous operations, instances of faces belonging to the same person appearing in multiple clusters may still exist. To address this 'over-clustering' issue, existing clusters may be merged using a distance metric between non-singleton clusters (clusters with more than one element). In this regard, for the cluster pair (A,B), a maximum of m features out of A and B (in any order) may be selected, where the numbers of elements selected are n1 and n2, respectively. The features (A1, A2, ... An1) may be orthogonalized using, for example, Gram-Schmidt normalization to form the subspace (a1, a2, ... an1). Similarly, the subspace corresponding to B, (b1, b2, ... bn2), may be constructed. Further, where m = min(n1, n2), the distance D(A,B) may be calculated as D(A,B) = 1 - (1/m) Σi Σj (ai · bj)². Using this distance, cluster pairs that fall below a predefined merging threshold (e.g., the threshold distance) may be combined together.
The description provided above and generally herein illustrates example methods, example apparatuses, and example computer program products for clustering facial images. However, it is contemplated that some of the example embodiments may be utilized to group any type of image into various categories. Other types of images that may be used include images of buildings or landmarks, images of animals, medical images, landscape images, or the like. According to some example embodiments, the threshold distance may be set based upon the type of images being grouped. In some cases, the threshold may be set once for each set of images.
FIG. 10 is a flowchart of a system, method and program product according to example embodiments of the invention. It will be understood that each block of the flowchart, and combinations of blocks in the flowchart, may be implemented by various means, such as hardware, firmware, processor, circuitry and/or other device associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions. In this regard, the computer program instructions which embody the procedures described above may be stored by a memory device of an apparatus employing an embodiment of the present invention and executed by a processor in the apparatus. As will be appreciated, any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus embody a mechanism for implementing the functions specified in the flowchart block(s). These computer program instructions may also be stored in a computer-readable storage memory (as opposed to a transmission medium such as a carrier wave or electromagnetic signal) that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture the execution of which implements the function specified in the flowchart block(s). The computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart block(s). As such, the operations of FIG. 
10, when executed, convert a computer or processing circuitry into a particular machine configured to perform an example embodiment of the present invention. Accordingly, the operations of FIG. 10 define an algorithm for configuring a computer or processing circuitry (e.g., processor 70) to perform an example embodiment. In some cases, a general purpose computer may be provided with an instance of the clustering manager 80, which performs the algorithm shown in FIG. 10 (e.g., via configuration of the processor 70), to transform the general purpose computer into a particular machine configured to perform an example embodiment.
Accordingly, blocks of the flowchart support combinations of means for performing the specified functions, combinations of operations for performing the specified functions and program instructions for performing the specified functions. It will also be understood that one or more blocks of the flowchart, and combinations of blocks in the flowchart, can be implemented by special purpose hardware-based computer systems which perform the specified functions or operations, or combinations of special purpose hardware and computer instructions.
In this regard, one embodiment of a method according to an example embodiment as shown in FIG. 10 may include, subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons at operation 510. The method may further include causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons at operation 520 and causing merging of the first set of clusters with the second set of clusters at operation 530.
In some embodiments, certain ones of the operations above may be modified or further amplified as described below. Moreover, in some embodiments additional optional operations may also be included (an example of which is shown in dashed lines in FIG. 10). It should be appreciated that each of the modifications, optional additions or amplifications below may be included with the operations above either alone or in combination with any others among the features described herein. In an example embodiment, all of the operations 510 to 530 may be repeated each time an additional set of images with faces is received so that incremental clustering of the faces in the digital images may be performed without requiring re-processing (e.g., distance calculations) for those faces that have already been previously processed. The additional set of images may include any number from one to a plurality of images that are to be incrementally added to an existing library, gallery, collection or set of images that have already been clustered. In some embodiments, the method may further include an initial operation of causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and to define the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other at operation 500.
In an example embodiment, causing merging of the first set of clusters with the second set of clusters may further include utilizing a hierarchical algorithm to generate a third set of clusters. In some embodiments, causing clustering of the first set of singletons with the second set of singletons further generates a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons. In another example case, causing merging of the faces in the second set of images into corresponding clusters among the first set of clusters is performed in response to receipt of a second set of images.
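The hierarchical merge of the two cluster sets can likewise be sketched as a simple agglomerative pass. The patent calls only for "a hierarchical algorithm", so the centroid linkage, Euclidean metric, and threshold used below are illustrative assumptions, not the claimed implementation:

```python
import numpy as np


def hierarchical_merge(clusters_a, clusters_b, threshold=0.5):
    """Sketch of operation 530: agglomeratively merge two sets of clusters
    into a third set.  Clusters whose centroids fall within the threshold
    are fused repeatedly until no pair of clusters is close enough.
    """
    merged = [list(c) for c in clusters_a + clusters_b]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                ci = np.mean(merged[i], axis=0)  # centroid linkage (assumed)
                cj = np.mean(merged[j], axis=0)
                if np.linalg.norm(ci - cj) < threshold:
                    merged[i].extend(merged.pop(j))  # fuse the closer pair
                    changed = True
                    break
            if changed:
                break  # restart scan after every fusion
    return merged
```

A production variant would use a proper linkage structure (e.g. a distance heap) rather than the quadratic rescan, but the fixed point reached is the same: the third set of clusters in which no two clusters remain within the threshold.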
In an example embodiment, an apparatus for performing the method of FIG. 10 above may comprise a processor (e.g., the processor 70) configured to perform some or each of the operations (500-530) described above. The processor 70 may, for example, be configured to perform the operations (500-530) by performing hardware implemented logical functions, executing stored instructions, or executing algorithms for performing each of the operations. Alternatively, the apparatus may comprise means for performing each of the operations described above. In this regard, according to an example embodiment, examples of means for performing operations 500-530 may comprise, for example, the clustering manager 80. Additionally or alternatively, at least by virtue of the fact that the processor 70 may be configured to control or even be embodied as the clustering manager 80, the processor 70 and/or a device or circuitry for executing instructions or executing an algorithm for processing information as described above may also form example means for performing operations 500-530.
An example of an apparatus according to an example embodiment may include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code may be configured to, with the at least one processor, cause the apparatus to perform the operations 500-530 (with or without the modifications and amplifications described above in any combination).
An example of a computer program product according to an example embodiment may include at least one computer-readable storage medium having computer-executable program code portions stored therein. The computer-executable program code portions may include program code instructions for performing operations 500-530 (with or without the modifications and amplifications described above in any combination).
In some cases, the operations (500-530) described above, along with any of the modifications may be implemented in a method that involves facilitating access to at least one interface to allow access to at least one service via at least one network. In such cases, the at least one service may be said to perform at least operations 500-530.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

We Claim:
1. A method comprising:
subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons;
causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons; and
causing merging of the first set of clusters with the second set of clusters.
2. The method of claim 1, wherein causing merging of the first set of clusters with the second set of clusters further comprises utilizing a hierarchical algorithm to generate a third set of clusters.
3. The method of claim 2, wherein causing clustering of the first set of singletons with the second set of singletons further generates a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons.
4. The method of claim 3, further comprising:
in response to receipt of a third set of images, causing merging of faces in the third set of images into corresponding clusters among the third set of clusters to modify the third set of clusters and generate a fourth set of singletons;
causing clustering of the third set of singletons with the fourth set of singletons to define a fourth set of clusters without recalculating distances associated with the third set of singletons; and
causing merging of the third set of clusters with the fourth set of clusters.
5. The method of any of claims 1 to 4, further comprising an initial operation of causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other.
6. The method of any of claims 1 to 5, wherein causing merging of the faces in the second set of images into corresponding clusters among the first set of clusters is performed in response to receipt of a second set of images.
7. An apparatus comprising at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured to, with the at least one processor, cause the apparatus to at least:
subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, cause merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons;
cause clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons; and
cause merging of the first set of clusters with the second set of clusters.
8. The apparatus of claim 7, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to cause merging of the first set of clusters with the second set of clusters further by utilizing a hierarchical algorithm to generate a third set of clusters.
9. The apparatus of claim 8, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to cause clustering of the first set of singletons with the second set of singletons to further generate a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons.
10. The apparatus of claim 9, wherein the at least one memory and computer program code are further configured to, with the at least one processor, cause the apparatus to:
in response to receipt of a third set of images, cause merging of faces in the third set of images into corresponding clusters among the third set of clusters to modify the third set of clusters and generate a fourth set of singletons;
cause clustering of the third set of singletons with the fourth set of singletons to define a fourth set of clusters without recalculating distances associated with the third set of singletons; and
cause merging of the third set of clusters with the fourth set of clusters.
11. The apparatus of any of claims 7 to 10, wherein the at least one memory and computer program code are further configured to, with the at least one processor, cause the apparatus to perform an initial operation of causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other.
12. The apparatus of any of claims 7 to 11, wherein the at least one memory and computer program code are configured to, with the at least one processor, cause the apparatus to cause merging of the faces in the second set of images into corresponding clusters among the first set of clusters in response to receipt of a second set of images.
13. A computer program product comprising a computer-readable medium bearing computer program code embodied therein for use with a computer, the computer program code comprising code for:
subsequent to generation of a first set of clusters and a first set of singletons from a first set of digital images, causing merging of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons;
causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons; and
causing merging of the first set of clusters with the second set of clusters.
14. The computer program product of claim 13, wherein code for causing merging of the first set of clusters with the second set of clusters further comprises code for utilizing a hierarchical algorithm to generate a third set of clusters.
15. The computer program product of claim 14, wherein code for causing clustering of the first set of singletons with the second set of singletons further generates a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons.
16. The computer program product of claim 15, further comprising code for:
in response to receipt of a third set of images, causing merging of faces in the third set of images into corresponding clusters among the third set of clusters to modify the third set of clusters and generate a fourth set of singletons;
causing clustering of the third set of singletons with the fourth set of singletons to define a fourth set of clusters without recalculating distances associated with the third set of singletons; and
causing merging of the third set of clusters with the fourth set of clusters.
17. The computer program product of any of claims 13 to 16, further comprising code for an initial operation of causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other.
18. The computer program product of any of claims 13 to 17, wherein code for causing merging of the faces in the second set of images into corresponding clusters among the first set of clusters is executed in response to receipt of a second set of images.
19. An apparatus comprising:
means for causing merging, subsequent to generation of a first set of clusters and a first set of singletons, of faces in a second set of images into corresponding clusters among the first set of clusters to modify the first set of clusters and generate a second set of singletons;
means for causing clustering of the first set of singletons with the second set of singletons to define a second set of clusters without recalculating distances associated with the first set of singletons; and
means for causing merging of the first set of clusters with the second set of clusters.
20. The apparatus of claim 19, wherein means for causing merging of the first set of clusters with the second set of clusters further comprises means for utilizing a hierarchical algorithm to generate a third set of clusters.
21. The apparatus of claim 20, wherein means for causing clustering of the first set of singletons with the second set of singletons further generates a third set of singletons based on singletons from the first set of singletons that were not within a threshold distance of any singletons in the second set of singletons.
22. The apparatus of claim 21, further comprising:
means for causing merging, in response to receipt of a third set of images, of faces in the third set of images into corresponding clusters among the third set of clusters to modify the third set of clusters and generate a fourth set of singletons;
means for causing clustering of the third set of singletons with the fourth set of singletons to define a fourth set of clusters without recalculating distances associated with the third set of singletons; and
means for causing merging of the third set of clusters with the fourth set of clusters.
23. The apparatus of any of claims 19 to 22, further comprising means for causing performance of clustering with respect to faces in the first set of digital images to define the first set of clusters in which each cluster includes multiple images in which facial features are within a threshold distance from each other, and the first set of singletons in which each singleton does not have another image in which facial features are within the threshold distance from each other.
24. The apparatus of any of claims 19 to 23, wherein means for causing merging of the faces in the second set of images into corresponding clusters among the first set of clusters operate to cause the merging in response to receipt of a second set of images.
PCT/FI2012/050133 2011-04-15 2012-02-13 Method, apparatus and computer program product for providing incremental clustering of faces in digital images WO2012140315A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN1310CH2011 2011-04-15
IN1310/CHE/2011 2011-04-15

Publications (1)

Publication Number Publication Date
WO2012140315A1 (en) 2012-10-18

Family

ID=47008878

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2012/050133 WO2012140315A1 (en) 2011-04-15 2012-02-13 Method, apparatus and computer program product for providing incremental clustering of faces in digital images

Country Status (1)

Country Link
WO (1) WO2012140315A1 (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5642431A (en) * 1995-06-07 1997-06-24 Massachusetts Institute Of Technology Network-based system and method for detection of faces and the like
US20030210808A1 (en) * 2002-05-10 2003-11-13 Eastman Kodak Company Method and apparatus for organizing and retrieving images containing human faces
US20080256130A1 (en) * 2007-02-22 2008-10-16 Colorado State University Research Foundation Nonlinear set to set pattern recognition
US20090028393A1 (en) * 2007-07-24 2009-01-29 Samsung Electronics Co., Ltd. System and method of saving digital content classified by person-based clustering
US20100014721A1 (en) * 2004-01-22 2010-01-21 Fotonation Ireland Limited Classification System for Consumer Digital Images using Automatic Workflow and Face Detection and Recognition


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103778146A (en) * 2012-10-23 2014-05-07 富士通株式会社 Image clustering device and method
US20170185866A1 (en) * 2015-12-29 2017-06-29 Bar-Ilan University Method and system for dynamic updating of classifier parameters based on dynamic buffers
US10268923B2 (en) * 2015-12-29 2019-04-23 Bar-Ilan University Method and system for dynamic updating of classifier parameters based on dynamic buffers
US20200193201A1 (en) * 2018-12-14 2020-06-18 Giga-Byte Technology Co., Ltd. Method, device and non-transitory computer readable medium of facial recognition
US11651622B2 (en) * 2018-12-14 2023-05-16 Giga-Byte Technology Co., Ltd. Method, device and non-transitory computer readable medium of facial recognition
CN112257801A (en) * 2020-10-30 2021-01-22 浙江商汤科技开发有限公司 Incremental clustering method and device for images, electronic equipment and storage medium
CN112257801B (en) * 2020-10-30 2022-04-29 浙江商汤科技开发有限公司 Incremental clustering method and device for images, electronic equipment and storage medium
CN112949710A (en) * 2021-02-26 2021-06-11 北京百度网讯科技有限公司 Image clustering method and device
US11804069B2 (en) 2021-02-26 2023-10-31 Beijing Baidu Netcom Science And Technology Co., Ltd. Image clustering method and apparatus, and storage medium


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12771098

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12771098

Country of ref document: EP

Kind code of ref document: A1