AU2011265494A1 - Kernalized contextual feature - Google Patents

Kernalized contextual feature

Info

Publication number
AU2011265494A1
Authority
AU
Australia
Prior art keywords
image
images
segments
executable code
metric
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2011265494A
Inventor
Veena Murthy Srinivasa Dodballapur
Getian Ye
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2011265494A
Publication of AU2011265494A1
Status: Abandoned

Landscapes

  • Image Analysis (AREA)

Abstract

KERNALIZED CONTEXTUAL FEATURE

Disclosed is a method (200) of matching a first image and a second image comprising: (a) segmenting (502) the first image into segments and the second image into segments; (b) determining (503) a self-similarity matrix encoding similarity between the first segments, and a self-similarity matrix encoding similarity between the second segments; (c) generating (504) a reduced dimension representation of the first matrix, and a reduced dimension representation of the second matrix; (d) determining (505) a weighted feature correlation pattern of the first reduced dimension representation, and a weighted feature correlation pattern of the second reduced dimension representation; (e) forming (206) a first image vector from the first weighted feature correlation pattern, and a second image vector from the second weighted feature correlation pattern; and (f) matching (206) the first image and the second image by determining a distance between the first image vector and the second image vector.

[Fig. 2: flowchart of the object identification process 200, in which a query image is compared against stored job images; if the highest similarity matching score falls below a threshold the query image is assigned a new track identifier, otherwise it is assigned the track identifier of the best-matching job image, and the query image is then stored.]

Description

S&F Ref: P019408

AUSTRALIA
PATENTS ACT 1990
COMPLETE SPECIFICATION FOR A STANDARD PATENT

Name and Address of Applicant: Canon Kabushiki Kaisha, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan

Actual Inventor(s): Getian Ye, Veena Murthy Srinivasa Dodballapur

Address for Service: Spruson & Ferguson, St Martins Tower, Level 35, 31 Market Street, Sydney NSW 2000 (CCN 3710000177)

Invention Title: Kernalized contextual feature

The following statement is a full description of this invention, including the best method of performing it known to me/us:

KERNALIZED CONTEXTUAL FEATURE

TECHNICAL FIELD OF INVENTION

The current invention relates to video analytics and in particular to a method of identifying objects in object tracking.

BACKGROUND

Surveillance cameras, such as Pan-Tilt-Zoom (PTZ) network video cameras, are being deployed in increasing numbers. The cameras capture more data (video content) than human viewers can easily process. Automatic analysis of video content is therefore needed for surveillance applications. An important aspect of automatic analysis of video content is object tracking, which associates objects in video frames recorded from one camera or from multiple cameras. The association can be especially difficult when (i) the video frame rate is low, (ii) the tracked object changes pose (the pose of an object is the combination of its position and orientation) over time, or (iii) several objects move in the scene with occlusions occurring among them.

Other situations may also increase the difficulty of object tracking. This may occur, for example, if the object to be tracked disappears from the scene for a long time and re-appears with a different pose, or if the object must be tracked across multiple cameras located at different positions, with different view angles and under different lighting conditions.

Typically, object tracking algorithms require an object detection mechanism. The objects found by object detection have no consistent identity from frame to frame unless object tracking is used to resolve their identities over time.

One common approach to object detection is to use information in a single frame. For example, object detection can be achieved by building a binary classifier with a machine learning algorithm, e.g., AdaBoost, trained on a set of training images of an object. The learned binary classifier is then used to classify the (usually square or rectangular) portions of an image at all locations as being either the object in question or background. The binary classifier usually outputs a rectangular bounding box indicating the size and the location of the object in the image.

One of the major components in object tracking is the object representation, which is usually based on visual features of an object. The desirable property of a visual feature is its uniqueness, so that objects can be easily distinguished in the feature space. For example, colour and texture are widely used as visual features for histogram-based object representation, in which the object is represented by a single histogram. Object edges are usually used as visual features for contour-based object representation.
Feature descriptors of interest points or regions extracted from an object image (i.e., an image segment or region extracted from an image by an object detection method) are used as visual features for part-based object representation. An example of a feature descriptor is a histogram of oriented gradients. In general, many object tracking algorithms use a combination of visual features.

In order to associate the objects in video frames recorded from one camera or multiple cameras, object identification is performed to find the association between the objects across video frames. The association between objects is determined based on the similarity between visual features of the objects.

One object identification technique is to perform shape matching between the contours of the objects. The contour of an object can be represented by a plurality of shape descriptors, e.g., shape context descriptors, with each contour point having a shape descriptor. The shape matching is done by finding the correspondences between the points of two contours based on the shape descriptors of the contour points. Such shape matching is not reliable for matching objects with complex non-rigid shapes, e.g., people, especially when the objects change pose.

Another common object identification technique is to directly find explicit correspondences between interest points or regions by computing the distances between all the feature descriptors of the interest points or regions. The feature descriptor of an interest point or region can be a histogram of oriented gradients or a feature vector formed by considering the feature descriptors of neighbouring interest points or regions. Finding explicit correspondences between interest points or regions is usually computationally expensive, because all combinations of the features or regions have to be considered and compared; as the number of features or regions increases, so does the number of comparisons. In addition, interest point or region matching is not reliable when the objects have different poses.

Another object identification technique is based on a learned common appearance model. One example of an appearance model is the bag-of-features representation based on a learned feature codebook. The feature codebook usually comprises a plurality of common feature descriptors of interest points or regions, obtained by performing a clustering algorithm, e.g., the K-means algorithm, on a plurality of feature descriptors extracted from a plurality of training images containing the object. Another example of an appearance model is the common subspace representation, usually obtained by performing a dimension reduction algorithm, e.g., principal component analysis, on a plurality of feature descriptors extracted from a plurality of training images containing the object. This technique does not require finding explicit correspondences between interest points or regions, as the feature descriptors of all the interest points or regions are converted based on the learned common model so that the feature descriptors can be compared. However, this technique requires a machine learning step to find the common model for the feature descriptors of an object, and the learned common model may not be effective when a new object enters the scene, since the model does not take into account the features of the new object.
In addition, the identification accuracy of this technique is low when the objects have different poses and are under different lighting conditions across video frames.

SUMMARY

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.

Disclosed are arrangements, referred to as Correlated Reduced Self-Similarity (CRSS) arrangements, which seek to address the above problems by determining an image match by correlating features that are mapped to reduced dimensional spaces created from self-similarity matrices derived from the images in question.

According to a first aspect of the present invention, there is provided a method of matching a first image and a second image, said method comprising the steps of:

(a) segmenting the first image into a first plurality of segments and the second image into a second plurality of segments;

(b) determining a first self-similarity matrix of first metrics encoding respective similarity scores between members of the first plurality of segments, and a second self-similarity matrix of second metrics encoding respective similarity scores between members of the second plurality of segments;

(c) generating a first reduced dimension representation of the first self-similarity matrix, and a second reduced dimension representation of the second self-similarity matrix;

(d) determining a first weighted feature correlation pattern of the first reduced dimension representation, and a second weighted feature correlation pattern of the second reduced dimension representation;

(e) forming a first image vector from the first weighted feature correlation pattern, and a second image vector from the second weighted feature correlation pattern; and

(f) matching the first image and the second image by determining a distance between the first image vector and the second image vector.

According to another aspect of the present invention, there is provided an apparatus for implementing any one of the aforementioned methods.

According to another aspect of the present invention, there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.

Other aspects of the invention are also disclosed.

BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments of the invention will now be described with reference to the following drawings, in which:

Fig. 1 is a schematic block diagram of the hardware and software architecture of an object tracking system which uses the CRSS arrangement;

Fig. 2 is a schematic flow diagram illustrating a method of object identification used in the object identification module 109 in Fig. 1;

Fig. 3 is a diagram showing an object image, the object being a human;

Figs. 4(a) and 4(b) are diagrams showing embodiments of segmentations of an object image;

Fig. 5 is a flowchart describing the method of generating a weighted feature correlation pattern, which is used by the object identification module 109 in Fig. 1; and

Figs. 6A and 6B form a schematic block diagram of a general purpose computer system upon which the CRSS arrangements described can be practiced.
DETAILED DESCRIPTION INCLUDING BEST MODE

Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.

It is to be noted that the discussions contained in the "Background" section and the section above relating to prior art arrangements relate to discussions of documents or devices which may form public knowledge through their respective publication and/or use. Such discussions should not be interpreted as a representation by the present inventor(s) or the patent applicant that such documents or devices in any way form part of the common general knowledge in the art.

The CRSS arrangement described is a system for deriving, from a captured video stream, several object images, each containing one or more objects such as a human, a car, or a bicycle. The purpose of the CRSS arrangement is to identify and match the objects temporally and track them over a period of time in video frames.

The CRSS arrangement helps to track objects that re-appear in the scene where the time period between the first appearance of the object and the next appearance is very large, say a few hours or days. The CRSS arrangement can also be applied to identifying an object or objects that appear in different scenes captured by different cameras. In addition, the CRSS arrangement helps to identify objects in a recorded video. For example, object identification can be used to check, from a recorded video, whether a suspect was present at the scene of a shoplifting at a particular time.

Fig. 1 is a block diagram of a CRSS arrangement example. In the system illustrated, a network video camera 101 captures a stream of video frames and transmits the stream 112, as depicted by an arrow 102, to an object detection module 103.

The object detection module 103 detects objects in the video stream 112. In one CRSS arrangement the object detection module 103 uses foreground/background separation to detect objects in a frame in the stream 112. A reference frame, which is robust to scene changes, is established as a background model, and the incoming frame is subtracted from the reference frame. A significant change in the incoming frame from the background model indicates the presence of detected objects. The background model can be built using a mixture of Gaussians or median filtering algorithms, and is dynamically updated with information from the incoming frame (see the sketch below). The outputs of the object detection module 103 are the object images, i.e., image segments or regions containing the objects.

In another CRSS arrangement, a binary classifier is constructed using a machine learning algorithm, e.g., AdaBoost (short for "Adaptive Boosting") or a Support Vector Machine (SVM). The training process for the classifier is performed using a plurality of feature descriptors extracted from a plurality of training images. The binary classifier is then used to detect objects in the incoming frame using a sliding window technique.

Once the objects are detected by the above techniques, the objects may be classified. Classification is necessary if the tracking of only certain objects is required, e.g., only human tracking.
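By way of illustration only (this is not the specification's own code), the following is a minimal sketch of the kind of mixture-of-Gaussians foreground/background separation described above, using OpenCV's stock background subtractor; the video path and the minimum-area threshold are hypothetical.

```python
import cv2

# Mixture-of-Gaussians background model; history and varThreshold control how
# quickly the model adapts and how sensitive the subtraction is.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

capture = cv2.VideoCapture("surveillance.avi")  # hypothetical input stream
while True:
    ok, frame = capture.read()
    if not ok:
        break
    # The model is updated with each incoming frame; the returned mask marks
    # pixels that differ significantly from the background model.
    mask = subtractor.apply(frame)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                            cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)))
    # Connected foreground regions become candidate object images.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for contour in contours:
        if cv2.contourArea(contour) > 500:  # hypothetical minimum object size
            x, y, w, h = cv2.boundingRect(contour)
            object_image = frame[y:y + h, x:x + w]  # region passed downstream
capture.release()
```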
In one CRSS arrangement, a database of known objects with their meta-data, such as shapes, colours, dimensions and contours, is maintained. To classify an object, its contour, shape and colour are matched against the object meta-data present in the database. The detected object may be any object, such as a human, a vehicle, or a group of people.

A two-dimensional region surrounding the object is determined by the object detection module 103. The region can be elliptical, rectangular, or irregularly shaped following the contour of the object.

Fig. 3 depicts one example of the object region, showing a rectangular region defined by a bounding box 302 with a width X and height Y surrounding a human 301 in the object image 303.

Returning to Fig. 1, the object image 303 is obtained from the object detection module 103 and is then transmitted, as depicted by an arrow 104, to an object tracking module 105. The object tracking module 105 assigns identifiers to the objects and helps to associate the objects across video frames. The object tracking module 105 consists of an input-output module 110 and an object identification module 109. The input-output module 110 obtains the input object images 303 from the object detection module 103. The object images 303 are then sent to the object identification module 109, which compares the object image 303 received from the input-output module 110 against images 113 that are stored in a database Image-DB 107. If the object images 303 are new, they are assigned unique track identifiers. Otherwise they are assigned the same track identifiers as the object images matched from the database.

In one CRSS arrangement, the database Image-DB 107 may be implemented using a relational database system. In another CRSS arrangement, the database Image-DB 107 may be stored in random access memory (such as 606 in Fig. 6A) of a computer (such as 600), or may be stored in a remote database 682 that is accessible over a network 620 and/or 622. The database Image-DB 107 is constructed, in one CRSS example, in the form of an image table 111 which has a column 115 named "Track-Id", a numeric value that is a unique identifier for each object image in the table. The other columns are "Identifier" (i.e. 118), "Image" (i.e. 114), "Correlation" (i.e. 117) and "Creation" (i.e. 116). The column 114 entitled "Image" is used to store the object image, the column 117 "Correlation" is used to store the weighted feature correlation pattern of the image, and the column 116 "Creation" is the date when the record was inserted into the table 111. "Identifier" is the primary key of the image table 111. Entries 119 in the image table 111 and associated column headings 118 will be referred to interchangeably in this description, unless otherwise stated. A hypothetical schema along these lines is sketched below.

If the object identification module 109 does not find any close match among the images 113 in the database Image-DB 107 for the input object image 303, then the object image 303 is inserted into the image table 111 of the database Image-DB 107 and is assigned a new track identifier, which is inserted in the field "Track-Id". The "Creation" field is set to the current timestamp. The newly assigned track identifier is sent to the input-output module 110, and the object image and the track identifier are sent to a display unit 106.
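For concreteness, a minimal sketch of the image table described above, using SQLite; the column types, the file name, and the serialisation of the correlation pattern are assumptions, not part of the specification.

```python
import sqlite3

connection = sqlite3.connect("image_db.sqlite")  # hypothetical database file
connection.execute("""
    CREATE TABLE IF NOT EXISTS image_table (
        identifier  INTEGER PRIMARY KEY,  -- primary key of the table (118)
        track_id    INTEGER NOT NULL,     -- track identifier (115)
        image       BLOB NOT NULL,        -- encoded object image, e.g. PNG (114)
        correlation BLOB NOT NULL,        -- serialised weighted feature
                                          -- correlation pattern (117)
        creation    TEXT NOT NULL         -- insertion timestamp (116)
    )
""")
connection.commit()
```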
The display unit 106 displays the video stream 112, with track identifiers being displayed for objects. In another CRSS arrangement, the user may select, from the display unit, an object to track in the video stream 112. The object selected by the user may be sent to the input-output module 110, which in turn may send it to the object identification module 109.

Figs. 6A and 6B depict a general-purpose computer system 600, upon which the various arrangements described can be practiced.

As seen in Fig. 6A, the computer system 600 includes: a computer module 601; input devices such as a keyboard 602, a mouse pointer device 603, a scanner 626, a camera 627, and a microphone 680; and output devices including a printer 615, a display device 614 and loudspeakers 617. An external Modulator-Demodulator (Modem) transceiver device 616 may be used by the computer module 601 for communicating to and from a communications network 620 via a connection 621. The communications network 620 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 621 is a telephone line, the modem 616 may be a traditional "dial-up" modem. Alternatively, where the connection 621 is a high capacity (e.g., cable) connection, the modem 616 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 620.

The computer module 601 typically includes at least one processor unit 605, and a memory unit 606. For example, the memory unit 606 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 601 also includes a number of input/output (I/O) interfaces including: an audio-video interface 607 that couples to the video display 614, loudspeakers 617 and microphone 680; an I/O interface 613 that couples to the keyboard 602, mouse 603, scanner 626, camera 627, special function module 684, and optionally a joystick or other human interface device (not illustrated); and an interface 608 for the external modem 616 and printer 615. In some implementations, the modem 616 may be incorporated within the computer module 601, for example within the interface 608. The computer module 601 also has a local network interface 611, which permits coupling of the computer system 600 via a connection 623 to a local-area communications network 622, known as a Local Area Network (LAN). As illustrated in Fig. 6A, the local communications network 622 may also couple to the wide network 620 via a connection 624, which would typically include a so-called "firewall" device or device of similar functionality. The local network interface 611 may comprise an Ethernet™ circuit card, a Bluetooth™ wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 611.

The camera 627 may correspond to the PTZ camera 101 of Fig. 1. In an alternative arrangement, the computer module 601 is coupled to the camera 101 via the wide-area communications network 620 and/or the local-area communications network 622.

The I/O interfaces 608 and 613 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated).
Storage devices 609 are provided and typically include a hard disk drive (HDD) 610. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 612 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 600.

The components 605 to 613 of the computer module 601 typically communicate via an interconnected bus 604 and in a manner that results in a conventional mode of operation of the computer system 600 known to those in the relevant art. For example, the processor 605 is coupled to the system bus 604 using a connection 618. Likewise, the memory 606 and optical disk drive 612 are coupled to the system bus 604 by connections 619. Examples of computers on which the described arrangements can be practised include IBM-PCs and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.

The CRSS method may be implemented using the computer system 600, wherein the processes of Figs. 2 to 5, described hereinafter, may be implemented as one or more software application programs 633 executable within the computer system 600. In particular, the steps of the CRSS method are effected by instructions 631 (see Fig. 6B) in the software 633 that are carried out within the computer system 600. The software instructions 631 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules perform the CRSS methods and a second part and the corresponding code modules manage a user interface between the first part and the user.

The software 633 is typically stored in the HDD 610 or the memory 606, in a computer readable medium including the storage devices described below, for example. The software is loaded into the computer system 600 from the computer readable medium, and then executed by the computer system 600. Thus, for example, the software 633 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 625 that is read by the optical disk drive 612. A computer readable medium having such software or a computer program recorded on it is a computer program product. The use of the computer program product in the computer system 600 preferably effects an advantageous apparatus for performing the CRSS arrangements.

In some instances, the application programs 633 may be supplied to the user encoded on one or more CD-ROMs 625 and read via the corresponding drive 612, or alternatively may be read by the user from the networks 620 or 622. Still further, the software can also be loaded into the computer system 600 from other computer readable media.
Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 600 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external to the computer module 601. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 601 include radio or infra-red transmission channels, network connections to another computer or networked device, and the Internet or intranets including e-mail transmissions and information recorded on websites and the like.

The second part of the application programs 633 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs) to be rendered or otherwise represented upon the display 614. Through manipulation of typically the keyboard 602 and the mouse 603, a user of the computer system 600 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilising speech prompts output via the loudspeakers 617 and user voice commands input via the microphone 680.

Fig. 6B is a detailed schematic block diagram of the processor 605 and a "memory" 634. The memory 634 represents a logical aggregation of all the memory modules (including the HDD 610 and semiconductor memory 606) that can be accessed by the computer module 601 in Fig. 6A.

When the computer module 601 is initially powered up, a power-on self-test (POST) program 650 executes. The POST program 650 is typically stored in a ROM 649 of the semiconductor memory 606 of Fig. 6A. A hardware device such as the ROM 649 storing software is sometimes referred to as firmware. The POST program 650 examines hardware within the computer module 601 to ensure proper functioning and typically checks the processor 605, the memory 634 (609, 606), and a basic input-output systems software (BIOS) module 651, also typically stored in the ROM 649, for correct operation. Once the POST program 650 has run successfully, the BIOS 651 activates the hard disk drive 610 of Fig. 6A. Activation of the hard disk drive 610 causes a bootstrap loader program 652 that is resident on the hard disk drive 610 to execute via the processor 605. This loads an operating system 653 into the RAM memory 606, upon which the operating system 653 commences operation. The operating system 653 is a system level application, executable by the processor 605, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.

The operating system 653 manages the memory 634 (609, 606) to ensure that each process or application running on the computer module 601 has sufficient memory in which to execute without colliding with memory allocated to another process.
Furthermore, the different types of memory available in the system 600 of Fig. 6A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 634 is not intended to illustrate how particular segments of memory are allocated (unless otherwise stated), but rather to provide a general view of the memory accessible by the computer system 600 and how such memory is used.

As shown in Fig. 6B, the processor 605 includes a number of functional modules including a control unit 639, an arithmetic logic unit (ALU) 640, and a local or internal memory 648, sometimes called a cache memory. The cache memory 648 typically includes a number of storage registers 644-646 in a register section. One or more internal busses 641 functionally interconnect these functional modules. The processor 605 typically also has one or more interfaces 642 for communicating with external devices via the system bus 604, using a connection 618. The memory 634 is coupled to the bus 604 using a connection 619.

The application program 633 includes a sequence of instructions 631 that may include conditional branch and loop instructions. The program 633 may also include data 632 which is used in execution of the program 633. The instructions 631 and the data 632 are stored in memory locations 628, 629, 630 and 635, 636, 637, respectively. Depending upon the relative size of the instructions 631 and the memory locations 628-630, a particular instruction may be stored in a single memory location, as depicted by the instruction shown in the memory location 630. Alternatively, an instruction may be segmented into a number of parts, each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 628 and 629.

In general, the processor 605 is given a set of instructions which are executed therein. The processor 605 then waits for a subsequent input, to which the processor 605 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 602, 603, data received from an external source across one of the networks 620, 622, data retrieved from one of the storage devices 606, 609, or data retrieved from a storage medium 625 inserted into the corresponding reader 612, all depicted in Fig. 6A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 634.

The disclosed CRSS arrangements use input variables 654, which are stored in the memory 634 in corresponding memory locations 655, 656, 657. The CRSS arrangements produce output variables 661, which are stored in the memory 634 in corresponding memory locations 662, 663, 664. Intermediate variables 658 may be stored in memory locations 659, 660, 666 and 667.

Referring to the processor 605 of Fig. 6B, the registers 644, 645, 646, the arithmetic logic unit (ALU) 640, and the control unit 639 work together to perform sequences of micro-operations needed to perform "fetch, decode, and execute" cycles for every instruction in the instruction set making up the program 633.
Each fetch, decode, and execute cycle comprises:

(a) a fetch operation, which fetches or reads an instruction 631 from a memory location 628, 629, 630;

(b) a decode operation in which the control unit 639 determines which instruction has been fetched; and

(c) an execute operation in which the control unit 639 and/or the ALU 640 execute the instruction.

Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 639 stores or writes a value to a memory location 632.

Each step or sub-process in the processes of Figs. 2 and 5, described hereinafter, is associated with one or more segments of the program 633 and is performed by the register section 644-646, the ALU 640, and the control unit 639 in the processor 605 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 633.

The CRSS arrangement may be implemented in software alone, or may operate in concert with the special function module 684, which may implement some or all of the CRSS functionality as depicted, for example, in Fig. 1. The CRSS method may alternatively be implemented in dedicated hardware, implemented in the special function module 684 or in one or more other modules (not shown), such as one or more gate arrays and/or integrated circuits performing the CRSS functions or sub-functions. Such dedicated hardware may also include graphic processors, digital signal processors, or one or more microprocessors and associated memories. If gate arrays are used, the process flow charts in Figs. 2 and 5 are converted to Hardware Description Language (HDL) form. This HDL description is converted to a device level netlist which is used by a Place and Route (P&R) tool to produce a file which is downloaded to the gate array to program it with the design specified in the HDL description.

The flow chart in Fig. 2 depicts an implementation example for the object identification module 109, showing the steps taken by the module to compare the job images 113 and the query image 303. A query image (also referred to as a first image) is the image 303 that is compared against other object images 113 (also referred to as second images) present in the scene or the database 107. The query image 303 is an object image that does not have a track identifier. Job images are the images 113 that the query image is compared against; they may be present, for example, in a location in the database 107 (or elsewhere) and have a track identifier 115 assigned to them. The object identification module 109 receives as inputs object images such as 303 that constitute the query, and job images such as 113.

The process of object identification 200 shown in Fig. 2 is used for tracking objects in the video stream 112. The object identification module 109 receives as input the object images such as 303 from the input-output module 110. The process 200 starts at a step 201.
Control then passes to a retrieving step 202 that retrieves an object image such as 303 from a list (i.e., an ordered collection of object images received from the object detection module 103) of input object images that is received from the input-output module 110, the retrieval being performed using a GetNext_ObjectImage() command. The retrieved object image 303 is referred to as the query image.

Control then passes to a checking step 203 that determines whether there is any query image 303 in the list. If there are no more object images in the list to process, the process 200 follows a YES arrow from the step 203 to an END step 212 and the process 200 ends.

On the other hand, if the step 203 determines that the list of query images is not empty (i.e., the list has at least one query image), the process 200 follows a NO arrow from the step 203 to a step 204. The input to the step 204 is the query image 303 received from the retrieving step 202. In the generating step 204, a weighted feature correlation pattern for the query image 303 is generated and output. In one CRSS arrangement, the weighted feature correlation pattern is dependent upon a lower dimensional space created from a self-similarity matrix, as described hereinafter in further detail with regard to Fig. 5.

Control then passes from the generating step 204 to a retrieving step 213. In this retrieving step 213, a query is made to the database 107 to get a count of the total number of job images 113, which are the stored images to be compared to the query image 303.

Control then passes from the retrieving step 213 to a checking step 205, whose input is the total count of job images. If the total number of job images is zero (i.e., there is no job image stored in the database 107), then control passes from the step 205, as depicted by a YES arrow, to an assigning step 210, which assigns a new track identifier 118 to the query image 303.

If however the count of job images is greater than zero, control passes from the step 205 via a NO arrow to a step 206, which determines a degree of similarity, indicated by a similarity matching score, for the query image 303 and the job images 113. In this determining step 206 a query is made to the database 107 to retrieve the weighted feature correlation patterns 117 and track identifiers 118 of all the job images 113 in the database 107. The weighted feature correlation patterns 117 of the job images 113 are compared to the weighted feature correlation pattern of the query image 303.

The following describes in more detail the comparison, performed by the step 206, between the weighted feature correlation pattern of the query image 303 and the weighted feature correlation pattern of a job image 113. The similarity matching score between the query image 303 and a job image 113 is determined based on their weighted feature correlation patterns 117, and signifies how close the query image 303 is to the job image 113.

A vector, f, is formed by taking the elements of the upper triangular portion of the weighted feature correlation pattern. Thus, for example, if the weighted feature correlation pattern is an n x n square matrix S, the vector f can be formed from the elements S_1,1 to S_1,n, S_2,2 to S_2,n, ..., S_n-1,n-1 to S_n-1,n, and S_n,n, which form the upper triangular portion of the matrix S. The query image 303 is represented by a single vector f_Q, and a job image 113 is represented by a single vector f_J. The matching score between the query image and a job image is computed as follows:

R(f_Q, f_J) = -d(f_Q, f_J)    (1)

where d(f_Q, f_J) represents the Euclidean distance between the two vectors f_Q and f_J.

When there are a large number of job images 113 to be matched during querying, determining the similarity matching score between the query image 303 and every job image 113 is not efficient. Accordingly, a number of methods, described hereinafter, may be used to reduce the number of matching operations. In one CRSS arrangement, the vectors f_J created from all the job images can be registered in advance in an indexing structure such as a K-d tree (described hereinafter), a randomised tree, or a lattice. The similarity matching scores between the query image and all the job images are then determined by querying the indexing structure using the vector f_Q of the query image.

One typical example of an indexing structure is a K-d tree, which is a space-partitioning data structure for organising high-dimensional vectors. The vectors created from the weighted feature correlation patterns of the job images 113 are job vectors and are stored in the K-d tree in advance. Each node in the K-d tree corresponds to a cell in the space of job vectors. At the first level (root) of the tree, the input data containing all the job vectors is partitioned into two halves by a hyperplane orthogonal to a chosen dimension at a threshold value. The splitting threshold can be selected to be the mean or the median at the dimension with the maximum variance in the data. Each of the two halves of the data is then recursively split in the same way to create a binary tree. At the bottom of the tree, each node corresponds to a single job vector in the input data, although the leaf nodes may contain more than one vector in some implementations.

Given a query vector created from the weighted feature correlation pattern of the query image 303, the query process is to find the job vector that is nearest to the query vector in the K-d tree. The nearest job vector indicates the job image 113 with the highest similarity matching score. The query process uses a branch-and-bound technique for search. The tree is traversed to find the cell containing the query vector; the vector recovered from that cell is a good approximation to the nearest neighbour of the query vector. A backtracking technique is then used so that whole branches of the tree can be pruned if their region of space is further from the query vector than the currently found nearest neighbour. The search terminates when all unexplored branches have been pruned. Other examples of indexing structures include randomised trees, lattices, etc. A sketch of this vectorisation and nearest-neighbour query follows.
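As an illustration only (not the specification's own code), the following sketch vectorises the upper triangular portion of a weighted feature correlation pattern, computes the matching score of equation (1), and answers a nearest-neighbour query with a K-d tree; the use of NumPy/SciPy, and the random placeholder patterns, are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def upper_triangular_vector(S):
    # Form the vector f from the upper triangular portion (including the
    # diagonal) of the n x n weighted feature correlation pattern S.
    i, j = np.triu_indices(S.shape[0])
    return S[i, j]

def matching_score(f_q, f_j):
    # Equation (1): R(f_Q, f_J) = -d(f_Q, f_J); a higher (less negative)
    # score signifies a closer match.
    return -np.linalg.norm(f_q - f_j)

# Hypothetical symmetric job patterns, e.g. read from the Correlation column.
rng = np.random.default_rng(0)
job_patterns = [(A + A.T) / 2 for A in rng.random((100, 8, 8))]
job_vectors = np.array([upper_triangular_vector(S) for S in job_patterns])

# Register the job vectors in a K-d tree in advance; cKDTree.query performs
# a branch-and-bound nearest-neighbour search.
tree = cKDTree(job_vectors)
A = rng.random((8, 8))
query_vector = upper_triangular_vector((A + A.T) / 2)
distance, index = tree.query(query_vector)
best_score = -distance  # highest similarity matching score, equation (1)
```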
The output of the determining step 206 is the set of similarity matching scores between the job images 113 and the query image 303, together with the track identifiers 118 of the job images 113. Control of the process 200 then passes from the step 206 to an identifying step 207. The query image 303, the similarity matching scores, and the track identifiers 118 of the job images 113 are input, as depicted by a dashed arrow 214, to this identifying step 207. In this identifying step 207, the track identifier 118 of the job image 113 with the highest similarity matching score is identified.

Control then passes to a decision step 208. The input to this decision step 208, as depicted by a dashed arrow 215, is the track identifier 118 of the job image 113 with the highest similarity matching score. This decision step 208 checks whether the highest similarity score is less than a pre-determined threshold, where the threshold is a number, say 4. If the similarity score is less than the threshold, then control follows a YES arrow from the step 208 and passes to a step 210; this signifies that the query image is not similar to any job image. In the assigning step 210 a new track identifier 118 is assigned to the query image 303. After the track identifier is assigned in the assigning step 210, control passes to a step 211 that stores the query image 303 (in JPG, PNG, or any other image format) and the corresponding weighted feature correlation pattern 117, creation timestamp 116 and track identifier 118 into the database 107. Control then passes back to the step 202, and the object tracking module 105 resident on the processor processes the next object image.

Returning to the step 208: if in the decision step 208 the highest similarity matching score of a job image is higher than or equal to the threshold, then that job image is considered to be similar to the query image, and control passes, according to a NO arrow from the step 208, to an assigning step 209. The input to the assigning step 209, as depicted by a dashed arrow 216, is the query image 303 and the track identifier 118 of the job image 113. In the step 209 the processor 605 assigns the track identifier 118 of the job image 113 to the query image 303. The control of the process 200 then passes to the step 211. The step 209 may also output information confirming the match and indicating that the job image is considered to be similar to the query image.

In the storing step 211, the query image 303 is inserted into the database 107, with the track identifier 118 being the same as that of the job image 113. In addition, the weighted feature correlation pattern 117 and the current timestamp 116 of the query image 303 are also inserted into the database 107 in the storing step 211.

In another implementation of the CRSS arrangement, the weighted feature correlation pattern 117 of the job image 113 with the matched track identifier 118 is retrieved, appended to the pattern obtained from the generating step 204, and then updated in the database 107 in the storing step 211.

In yet another implementation of the CRSS arrangement, the weighted feature correlation pattern 117 of the query image 303 obtained from the generating step 204 is updated in the database 107 for the job image 113 with the same track identifier 118 as the query image 303 during the storing step 211.

In another implementation of the CRSS arrangement, the weighted feature correlation pattern 117 of the query image 303 is registered into an indexing structure such as a K-d tree or a lattice in the storing step 211.

As mentioned before, after the correlation pattern of the query image is stored in the database 107, control passes to the step 202 to get the next object image. The overall flow of the process 200 is sketched below.
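The following is a compact, hypothetical rendering of the process 200 in Python, reusing matching_score from the sketch above; the helper names (generate_weighted_pattern, the database object and its methods) and the example threshold are illustrative assumptions, not part of the specification.

```python
THRESHOLD = 4  # pre-determined similarity threshold of step 208 (example value)

def identify_objects(object_images, database):
    """Process 200: assign a track identifier to each query image."""
    for query_image in object_images:                     # steps 202-203
        pattern = generate_weighted_pattern(query_image)  # step 204 (Fig. 5)
        job_images = database.all_job_images()            # step 213
        if not job_images:                                # step 205
            track_id = database.new_track_id()            # step 210
        else:
            # Step 206: score the query against every job image.
            scores = [(matching_score(pattern.vector(), job.pattern.vector()),
                       job.track_id) for job in job_images]
            best_score, best_track_id = max(scores)       # step 207
            if best_score < THRESHOLD:                    # step 208
                track_id = database.new_track_id()        # step 210: no match
            else:
                track_id = best_track_id                  # step 209: inherit
        database.store(query_image, pattern, track_id)    # step 211
```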
The section below describes an example of the process 204 of generating the weighted feature correlation pattern, as depicted in Fig. 5. Fig. 5 is an implementation example of the generation of the weighted feature correlation pattern for an input image, and is used by the weighted feature correlation process 204 for the query image 303.

Control is passed by the weighted feature correlation process 204 to the feature extraction step 502. The input to this step 502 is an object image (in JPG, PNG, or any other image format); the input can be the query image 303 or a job image 113. In this module 502, features are extracted from pixels in the image. In another implementation of the CRSS arrangement, the processor 605 extracts the features from groups of pixels in segmented regions of the image. In other words, in this implementation of the CRSS arrangement, the image is segmented into multiple regions and the pixels within a region are chosen for feature extraction. The regions can be regular, meaning that the regions are evenly sized, as depicted in Fig. 4(a), where the image is segmented into evenly sized rectangular regions. Alternatively, the processor can segment the image into irregularly sized regions, i.e., regions that are not of equal size, as depicted in Fig. 4(b). The segmentation is based on at least one of the colour similarity within each region or the edges present in the image.

In Figs. 4(a) and 4(b), a rectangular region is defined by a bounding box 302 with a width X and height Y surrounding a human 301 in the object image 303. Examples of the segmented regions are shown as 404, 405, 406 and 407.

"Superpixel" segmentation techniques can be used to segment an image into irregularly sized regions. One approach for superpixel segmentation is to cluster pixels based on the colour similarity and proximity of pixels. The colour of pixels may be represented in the CIELAB colour space, which is widely considered to be perceptually uniform for small colour distances. The visual features of each pixel are composed of three colour components and the pixel location. The clustering is performed in a 5-dimensional visual feature space based on a distance measure such that the expected cluster sizes and the spatial extent of each cluster are approximately equal. Many other superpixel segmentation methods are applicable for the segmentation of the images as part of the implementation of the CRSS arrangement. Another implementation of the superpixel segmentation technique is the normalised cuts technique, a classical region segmentation algorithm that uses spectral clustering to exploit pairwise brightness, colour and texture affinities between pixels. Fig. 4(b) represents an image segmented using a superpixel segmentation technique.

Once the image is segmented into regions, the pixels of the segmented regions are chosen for feature extraction. In one implementation example of the CRSS arrangement, the centre pixel of each segmented region is chosen. In another implementation, all the pixels in the segmented region are chosen to calculate an average property of the pixels; for example, the average colour of the segment is determined to represent the segment. A sketch of this segmentation and per-segment feature extraction is given below.
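By way of illustration (assuming scikit-image is available; this is not the specification's own implementation), the following sketch performs a SLIC-style superpixel segmentation, i.e., clustering in CIELAB colour plus pixel position as described above, and represents each segment by its average colour; the file name and parameter values are hypothetical.

```python
import numpy as np
from skimage import io, segmentation, color

image = io.imread("object_image.png")  # hypothetical query or job image

# SLIC clusters pixels in a 5-dimensional space (L, a, b, x, y); compactness
# balances colour similarity against spatial proximity, so the clusters have
# approximately equal spatial extent.
labels = segmentation.slic(image, n_segments=100, compactness=10.0,
                           start_label=0)

lab = color.rgb2lab(image)

# One feature vector per segment: the average CIELAB colour of its pixels.
feature_vectors = np.array([
    lab[labels == segment_id].mean(axis=0)
    for segment_id in np.unique(labels)
])
```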
Other feature descriptors of the pixels may also be used, such as hue, saturation, intensity, red, green and blue values, first-order and second-order luminance gradients, a histogram of oriented gradients, and local binary patterns. A feature vector for a pixel is constructed, which comprises a combination of the feature descriptors.

When the centre pixel of a segmented region is chosen for feature extraction, the feature descriptor for the region is extracted from the centre pixel of the segmented region, and a feature vector is composed of that feature descriptor. Instead of choosing the centre pixel of a region for feature extraction, all the pixels in a segmented region can be chosen. One example is to select all the pixels in a segmented region and take the average of the feature descriptors, say the average of the red, green and blue values; the average of the feature descriptors of the pixels in a region then forms the feature vector for the segmented region. Alternatively, histograms of the feature descriptors of all the pixels in a region can be chosen; the histograms together form the feature vector for the segmented region.

In another CRSS arrangement, a plurality of interest points can be extracted from the image. The feature descriptor for an interest point can be a histogram of oriented gradients (e.g., edge orientation histograms) or a descriptor formed by considering the feature descriptors of the neighbouring interest points (e.g., a shape context descriptor or local binary pattern). The feature descriptors of the interest points form a set of feature vectors.

Once the feature extraction step 502 has performed the feature extraction, control passes from the feature extraction step 502 to a determining step 503, which creates a self-similarity matrix. The self-similarity matrix defines similarity scores between the feature vectors associated with each segment from the same image. In other words, the step 503 determines the similarity between members of the set of segments into which the image has been partitioned, i.e., it determines the similarity between (i) a particular segment of an image and (ii) every other segment in the same image. The inputs to this step, as depicted by a dashed arrow 506, are the feature vectors of the image in question, where the image can be the query image 303 or a job image 113 obtained from the step 202. A self-similarity matrix is created based on all the feature vectors obtained from the image. In one CRSS arrangement, each element of the self-similarity matrix describes the pair-wise similarity score between two feature vectors within the same image. An element of the self-similarity matrix, K_ij, is expressed as:

K_ij = k(x_i, x_j),  i, j ∈ {1, 2, ..., N}    (2)

where N is the total number of feature vectors in an image and k(x_i, x_j) denotes a similarity metric used for computing the similarity score between the feature vectors x_i and x_j.

In a preferred CRSS arrangement, the similarity metric is expressed in the following equation:

k(x_i, x_j) = exp(-d²(x_i, x_j) / σ²)    (3)

where σ is the scaling factor and d(x_i, x_j) denotes the Euclidean distance between the feature vectors x_i and x_j. The scaling factor σ has a constant decimal value such as 0.1. In this arrangement, d(x_i, x_j) can be further expressed in the following equation:

d(x_i, x_j) = sqrt( Σ_{l=1..L} (x_i(l) - x_j(l))² )    (4)

where L represents the dimension of a feature vector and x_i(l) denotes the l-th feature value of the feature vector x_i.

In another CRSS arrangement, the similarity metric is expressed in the following equation:

k(x_i, x_j) = (x_i^T x_j + a)^b    (5)

where a and b are parameters and x_i^T denotes the transpose of the feature vector x_i. In one CRSS arrangement, the values of the parameters a and b are set to 0.1 and 1 respectively, for example.

In another CRSS arrangement, the similarity metric is a weighted combination of a plurality of similarity metrics and is expressed in the following equation:

k(x_i, x_j) = Σ_{n=1..N} α_n k_n(x_i, x_j)    (6)

where N is the total number of similarity metrics, k_n(x_i, x_j) is the n-th similarity metric, and α_n represents the weight for k_n(x_i, x_j).

In another CRSS arrangement, the element of the self-similarity matrix K_ij = k(x_i, x_j) is weighted based on the configuration of the feature vectors x_i and x_j in the image. For example, if the locations of the feature vectors x_i and x_j are close to the vertical central axis of the image, the element K_ij is weighted more than for feature vectors that are further away from the vertical central axis. The weight is calculated based on the averaged distance from the vertical central axis of the image and is expressed in the following equation:

w_ij = 2 / (d(x_i, O) + d(x_j, O))    (7)

where O denotes the vertical central axis of the image and d(x_i, O) represents the distance from the location of the feature vector x_i to the vertical central axis of the image.

In another example, the image is decomposed into a plurality of parts. An image containing a person may be decomposed into head, torso, arms and legs based on a learned body configuration model of the human. The similarity scores between the feature vectors from the head and torso may be weighted more than the similarity scores between feature vectors extracted from the arms and legs. In other words, the similarity scores between the feature vectors are weighted based on the locations of the feature vectors in the image, wherein feature vectors in the centre of the image are given a higher weighting and feature vectors in the periphery of the image are given a lower weighting.

In another CRSS arrangement, each element of the self-similarity matrix describes the pair-wise similarity score between two feature vectors with their neighbouring feature vectors taken into account. A feature descriptor is created for a feature vector by considering its relationship with its neighbouring feature vectors in a local region, for example by using local binary patterns. The local region is divided into a plurality of cells. For each cell, the distances between a feature vector and a plurality of its neighbouring feature vectors are computed, and a binary bit array is generated by considering whether each distance is larger or smaller than a predefined threshold: if the distance is larger than the threshold, the feature value is set to 1; otherwise, the feature value is set to 0. This binary bit array is converted to a decimal number, and the feature descriptor is a histogram indicating the frequency of each decimal number over all the cells. Each element of the self-similarity matrix then describes the pair-wise similarity score between two such feature descriptors. A sketch of the self-similarity matrix computation follows.
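A minimal sketch of the determining step 503 under the preferred metric of equations (2)-(4), the Gaussian kernel on pairwise Euclidean distances, together with the polynomial metric of equation (5); the use of NumPy/SciPy is an assumption.

```python
import numpy as np
from scipy.spatial.distance import cdist

def self_similarity_matrix(X, sigma=0.1):
    # Equations (2)-(4): K[i, j] = exp(-d(x_i, x_j)**2 / sigma**2),
    # where X is an (N, L) array with one feature vector per row.
    d = cdist(X, X, metric="euclidean")   # pairwise distances d(x_i, x_j)
    return np.exp(-(d ** 2) / sigma ** 2)

def polynomial_similarity_matrix(X, a=0.1, b=1):
    # Equation (5): K[i, j] = (x_i . x_j + a) ** b.
    return (X @ X.T + a) ** b

# Equation (6): a weighted combination of metrics is simply a weighted sum
# of the individual similarity matrices, e.g. (weights are illustrative):
# K = 0.7 * self_similarity_matrix(X) + 0.3 * polynomial_similarity_matrix(X)
```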
A transformation matrix V is created and is expressed in the following equation:

    V = (1/√N) (I_N − (1/N) E_N E_N^T)        (8)

where N is the total number of feature vectors in an image, I_N represents an N-dimensional identity matrix, and E_N denotes an N-dimensional vector with all the elements equal to 1. The matrix V is used to transform the self-similarity matrix, K, into a transformed self-similarity matrix K_V = V^T K V.

The output of the creating step 503, as depicted by a dashed arrow 507, is the transformed self-similarity matrix of the image in question.

The process 500 is then directed to a determining step 504, which determines a lower dimensional space of the self-similarity matrix based on the transformed self-similarity matrix for the input image, which may be associated with the query image 303 or a job image 113. The lower dimensional space is a simplified representation of the feature space comprised of the feature vectors from the image. The input to this step, as depicted by a dashed arrow 508, is the transformed self-similarity matrix of the input image.

In an example CRSS arrangement, a principal component analysis (PCA) is performed on the transformed self-similarity matrix for the input image. If the total number of feature vectors from the image is N, the PCA establishes a set of N orthonormal eigenvectors and N eigenvalues. Then, in a preferred CRSS arrangement, a projection matrix is created by taking the M eigenvectors whose eigenvalues are non-zero. In another CRSS arrangement, a projection matrix can be created by taking the M eigenvectors that correspond to the M largest eigenvalues. For example, M can be an integer less than the rank of the matrix. The projection matrix represents a lower dimensional space that best describes the distribution of feature descriptors within the same image. In other words, the selected M eigenvectors form the subspace that reveals the internal structure of the data in a way which best explains the variance in the feature descriptors. Each eigenvector used to create the projection matrix is a basis vector of the lower dimensional space. The method for finding the projection matrix is not limited to PCA; other dimension reduction methods can also be used. The output of the determining step 504, as depicted by a dashed arrow 509, is the projection matrix for the input image.
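As a further illustrative sketch only, the transformation of equation (8) and the PCA of the step 504 may be coded as follows. The 1/√N factor follows the reconstruction of equation (8) given above, and the numerical tolerance used to decide which eigenvalues count as non-zero is an assumption of the sketch.

    import numpy as np

    def transform_similarity_matrix(K):
        """Equation (8) and K_V = V^T K V: transform the self-similarity matrix."""
        N = K.shape[0]
        E = np.ones((N, 1))                         # E_N, all elements equal to 1
        V = (np.eye(N) - (E @ E.T) / N) / np.sqrt(N)
        return V.T @ K @ V, V

    def projection_matrix(K_V, M=None, tol=1e-10):
        """Step 504: PCA on the transformed self-similarity matrix.

        Returns the projection matrix P (columns are the eigenvectors p_m)
        and the matching eigenvalues gamma_m.  If M is not given, all
        eigenvectors with non-zero eigenvalues are kept, per the preferred
        arrangement; otherwise the M eigenvectors with the largest
        eigenvalues are taken.
        """
        eigvals, eigvecs = np.linalg.eigh(K_V)      # K_V is symmetric
        order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
        eigvals, eigvecs = eigvals[order], eigvecs[:, order]
        if M is None:
            M = int(np.sum(eigvals > tol))          # count non-zero eigenvalues
        return eigvecs[:, :M], eigvals[:M]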
Control is then passed to a next determining step 505 that determines a weighted feature correlation pattern dependent upon the lower dimensional space created from the self-similarity matrix. The input to this step, as depicted by a dashed arrow 510, comprises the projection matrix and the set of eigenvalues obtained from the step 504, the transformation matrix V, and all the feature vectors from the input image. If the projection matrix is created by taking M eigenvectors, there are M vectors to be created. Each created vector u_m is obtained by projecting all the feature vectors onto the lower dimensional subspace (created from the self-similarity matrix), so extracting the components that best describe the distribution of the feature vectors. In other words, these extracted feature vectors u_m, formed from the reduced set of basis vectors, are deemed to be most representative of the whole set of feature vectors X, and are expressed as

    u_m = p_m^T V^T X,   m = 1, 2, ..., M        (9)

where p_m is the m-th vector of the projection matrix, V denotes the transformation matrix, and X represents a matrix in which each row is a feature vector from an image. Thus, the dimension of the basis of the extracted feature vectors u_m is smaller than the dimension of the basis of the whole set of feature vectors X.

Each vector of the most representative set u_m is further normalised as follows:

    v_m = u_m / ‖u_m‖₂        (10)

where ‖u_m‖₂ represents the Frobenius (Euclidean) norm of the vector u_m, computed over its L elements, L being the dimension of the m-th vector u_m.

In one CRSS arrangement, a weighted feature correlation pattern for the input image is created as follows:

    S = Σ_{m=1}^{M} β_m v_m v_m^T        (11)

where β_m represents the weight and v_m^T denotes the transpose of the vector v_m. The weighted feature correlation pattern S is formed by multiplying the normalised vector v_m with a transposed version of the same vector, v_m^T. As previously described, v_m is created by mapping all the feature vectors of an image into a lower dimensional space using a projection matrix formed by a plurality of eigenvectors determined from the self-similarity matrix. Each of the entries in the feature correlation pattern is weighted by β_m to obtain the weighted feature correlation pattern S.

In one CRSS arrangement, the weight β_m is chosen to be β_m = ln(γ_m), where γ_m represents the m-th eigenvalue. In another CRSS arrangement, the weight β_m is chosen to be β_m = γ_m.

The process of calculating the weighted feature correlation pattern S is repeated for each of the job images 113 and query images 303.

The determined weighted feature correlation patterns of the job image 113 and the query image 303, output by the step 505, are used to determine the similarity matching score in the determining step 206 as previously explained. In one CRSS arrangement, vectors f_Q and f_J are extracted from the weighted feature correlation patterns S to determine the Euclidean distance between the two vectors f_Q and f_J. Matching is performed based on the determined Euclidean distance, which forms the similarity score R(f_Q, f_J). The similarity score determined in the determining step 206 is used to determine whether the job image and the query image are visually similar.

After the step 505 has determined the weighted feature correlation pattern, control passes back to the step 213 or 207, if the process 500 has been invoked by the step 204 or 206 respectively.

INDUSTRIAL APPLICABILITY

The arrangements described are applicable to the computer and data processing industries, and particularly to the process of identifying objects in object tracking in video analytics.

The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.

In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.
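Finally, again by way of illustration only, the computations of the step 505 and the matching score of the step 206 may be sketched as follows, reusing the earlier sketches. The weighting β_m = ln(γ_m) follows the first choice described above, and treating the flattened pattern S as the vector f extracted for matching is one plausible reading of the step 206; both are assumptions of the sketch, not requirements of the claims.

    import numpy as np

    def weighted_correlation_pattern(X, V, P, eigvals, eps=1e-12):
        """Step 505, equations (9)-(11): weighted feature correlation pattern S.

        X: (N, L) matrix of feature vectors, one per row.
        V: (N, N) transformation matrix of equation (8).
        P: (N, M) projection matrix (columns are the eigenvectors p_m).
        eigvals: the M eigenvalues gamma_m matching the columns of P.
        """
        U = P.T @ V.T @ X                    # equation (9): rows are u_m, shape (M, L)
        norms = np.linalg.norm(U, axis=1, keepdims=True) + eps
        Vn = U / norms                       # equation (10): normalised vectors v_m
        beta = np.log(eigvals)               # beta_m = ln(gamma_m); gamma_m > 0 here
        return (Vn.T * beta) @ Vn            # equation (11): sum_m beta_m v_m v_m^T

    def similarity_score(S_query, S_job):
        """Step 206: Euclidean distance between the vectors f_Q and f_J
        extracted from the two patterns (here, the flattened patterns);
        a smaller distance indicates greater visual similarity."""
        return np.linalg.norm(S_query.ravel() - S_job.ravel())

    # Illustrative end-to-end use for one image's (N, L) feature array X_img:
    #   K        = self_similarity_matrix(X_img)
    #   K_V, V   = transform_similarity_matrix(K)
    #   P, gamma = projection_matrix(K_V)
    #   S        = weighted_correlation_pattern(X_img, V, P, gamma)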

Claims (16)

  2. A method according to claim 1, wherein:
the second image is a second plurality of images; and
said first image is matched to one of the second plurality of images by:
performing the steps (a)-(e) in regard to the first image, and in regard to the second plurality of images, to thereby form the first vector and a second plurality of second vectors;
registering the second plurality of second vectors in an indexing structure; and
wherein the matching step (f) comprises querying the indexing structure to match one of the second plurality of images to the first image.
  3. A method of matching a first image and a second image, said method comprising the steps of:
segmenting the first image and the second image into a first plurality of segments and a second plurality of segments respectively;
determining a first metric encoding degrees of similarity between feature vectors of the first plurality of segments in the first image and a second metric encoding degrees of similarity between feature vectors of the second plurality of segments in the second image, each of said first metric and said second metric having a set of scores encoding degrees of similarity;
generating a first simplified representation of feature vectors in the first image dependent upon a first set of bases created from the first metric and a second simplified representation of feature vectors in the second image dependent upon a second set of bases created from the second metric, a dimension of the first set and the second set of bases being smaller than a dimension of the feature vectors in the first image and the second image respectively;
determining a first weighted feature correlation pattern of the first simplified representation, and a second weighted feature correlation pattern of the second simplified representation; and
matching the first image and the second image by determining a distance between the first weighted feature correlation pattern and the second weighted feature correlation pattern.

  4. A method of matching a query image to one of a set of job images, said method comprising the steps of:
segmenting each of the images into a plurality of segments respectively;
determining a metric associated with each of the images, said metric encoding degrees of similarity between the plurality of segments of each of the images, each said metric having a set of scores;
generating a simplified representation of feature vectors in each of the images dependent upon respective sets of bases created from the associated metric, a dimension of each of the sets of bases being smaller than a dimension of the set of feature vectors of the associated image;
determining, for each of the images, a weighted feature correlation of the simplified representation associated with each of the images;
registering the weighted feature correlation of the query image to construct an indexing structure; and
querying each of the weighted feature correlations of the job images from the constructed indexing structure to match the query image to one of the job images.
  5. A method according to claim 1, wherein the image is segmented using superpixel segmentation.
  6. A method according to claim 1, wherein the image is segmented at a plurality of locations where the histogram of oriented gradients is extracted.

  7. A method according to claim 1, wherein the similarity between segments is determined by pair-wise similarity between the features of centre pixels in the segments.
  8. A method according to claim 1, wherein the similarity between segments is determined by pair-wise similarity between histograms of the feature vectors of pixels in the segments.
  9. A method according to claim 1, wherein the similarity between segments is determined by weighted pair-wise similarity between segments, where the segments near the vertical central axis of the image are given more weighting than the ones further away from the vertical central axis.
  10. A method according to claim 1, comprising a further step of confirming if the first image matches the second image.
  11. An apparatus comprising a processor, an input/output module for receiving inputs and providing outputs, and a memory storing an executable software program for directing the processor to effect a method for matching a first image and a second image, said program comprising:
software executable code for directing the input/output module to receive a first image and a second image;
software executable code for segmenting the first image into a first plurality of segments and the second image into a second plurality of segments;
software executable code for determining a first self-similarity matrix of first metrics encoding respective similarity scores between feature vectors of the first plurality of segments, and a second self-similarity matrix of second metrics encoding respective similarity scores between feature vectors of the second plurality of segments;
software executable code for generating a first reduced dimension representation of feature vectors in the first image based on a first subspace created from the first self-similarity matrix, and a second reduced dimension representation of feature vectors in the second image based on a second subspace created from the second self-similarity matrix;
software executable code for determining a first weighted feature correlation pattern of the first reduced dimension representation, and a second weighted feature correlation pattern of the second reduced dimension representation;
software executable code for forming a first vector from the first weighted feature correlation pattern, and a second vector from the second weighted feature correlation pattern;
software executable code for matching the first image and the second image by determining a distance between the first vector and the second vector; and
software executable code for confirming if the first image matches the second image.
  12. A non-transitory computer readable storage medium storing a computer executable program configured to direct a processor to effect a method for matching a first image and a second image, said program comprising:
software executable code for directing the input/output module to receive a first image and a second image;
software executable code for segmenting the first image into a first plurality of segments and the second image into a second plurality of segments;
software executable code for determining a first self-similarity matrix of first metrics encoding respective similarity scores between feature vectors of the first plurality of segments, and a second self-similarity matrix of second metrics encoding respective similarity scores between feature vectors of the second plurality of segments;
software executable code for generating a first reduced dimension representation of feature vectors in the first image based on a first subspace created from the first self-similarity matrix, and a second reduced dimension representation of feature vectors in the second image based on a second subspace created from the second self-similarity matrix;
software executable code for determining a first weighted feature correlation pattern of the first reduced dimension representation, and a second weighted feature correlation pattern of the second reduced dimension representation;
software executable code for forming a first vector from the first weighted feature correlation pattern, and a second vector from the second weighted feature correlation pattern;
software executable code for matching the first image and the second image by determining a distance between the first vector and the second vector; and
software executable code for confirming if the first image matches the second image.
  13. An apparatus comprising a processor, an input/output module for receiving inputs and providing outputs, and a memory storing an executable software program for directing the processor to effect a method for matching a first image and a second image, said program comprising:
software executable code for segmenting the first image and the second image into a first plurality of segments and a second plurality of segments respectively;
software executable code for determining a first metric encoding degrees of similarity between feature vectors of the first plurality of segments in the first image and a second metric encoding degrees of similarity between feature vectors of the second plurality of segments in the second image, each of said first metric and said second metric having a set of scores encoding degrees of similarity;
software executable code for generating a first simplified representation of feature vectors in the first image dependent upon a first set of bases created from the first metric and a second simplified representation of feature vectors in the second image dependent upon a second set of bases created from the second metric, a dimension of the first set and the second set of bases being smaller than a dimension of the feature vectors in the first image and the second image respectively;
software executable code for determining a first weighted feature correlation pattern of the first simplified representation, and a second weighted feature correlation pattern of the second simplified representation; and
software executable code for matching the first image and the second image by determining a distance between the first weighted feature correlation pattern and the second weighted feature correlation pattern.
  14. A non-transitory computer readable storage medium storing a computer executable program configured to direct a processor to effect a method for matching a first image and a second image, said program comprising:
software executable code for segmenting the first image and the second image into a first plurality of segments and a second plurality of segments respectively;
software executable code for determining a first metric encoding degrees of similarity between feature vectors of the first plurality of segments in the first image and a second metric encoding degrees of similarity between feature vectors of the second plurality of segments in the second image, each of said first metric and said second metric having a set of scores encoding degrees of similarity;
software executable code for generating a first simplified representation of feature vectors in the first image dependent upon a first set of bases created from the first metric and a second simplified representation of feature vectors in the second image dependent upon a second set of bases created from the second metric, a dimension of the first set and the second set of bases being smaller than a dimension of the feature vectors in the first image and the second image respectively;
software executable code for determining a first weighted feature correlation pattern of the first simplified representation, and a second weighted feature correlation pattern of the second simplified representation; and
software executable code for matching the first image and the second image by determining a distance between the first weighted feature correlation pattern and the second weighted feature correlation pattern.
  15. An apparatus comprising a processor, an input/output module for receiving inputs and providing outputs, and a memory storing an executable software program for directing the processor to effect a method for matching a query image to one of a set of job images, said program comprising:
software executable code for segmenting each of the images into a plurality of segments respectively;
software executable code for determining a metric associated with each of the images, said metric encoding degrees of similarity between the plurality of segments of each of the images, each said metric having a set of scores;
software executable code for generating a simplified representation of feature vectors in each of the images dependent upon respective sets of bases created from the associated metric, a dimension of each of the sets of bases being smaller than a dimension of the set of feature vectors of the associated image;
software executable code for determining, for each of the images, a weighted feature correlation of the simplified representation associated with each of the images;
software executable code for registering the weighted feature correlation of the query image to construct an indexing structure; and
software executable code for querying each of the weighted feature correlations of the job images from the constructed indexing structure to match the query image to one of the job images.
  16. A non-transitory computer readable storage medium storing a computer executable program configured to direct a processor to effect a method for matching a query image to one of a set of job images, said program comprising:
software executable code for segmenting each of the images into a plurality of segments respectively;
software executable code for determining a metric associated with each of the images, said metric encoding degrees of similarity between the plurality of segments of each of the images, each said metric having a set of scores;
software executable code for generating a simplified representation of feature vectors in each of the images dependent upon respective sets of bases created from the associated metric, a dimension of each of the sets of bases being smaller than a dimension of the set of feature vectors of the associated image;
software executable code for determining, for each of the images, a weighted feature correlation of the simplified representation associated with each of the images;
software executable code for registering the weighted feature correlation of the query image to construct an indexing structure; and
software executable code for querying each of the weighted feature correlations of the job images from the constructed indexing structure to match the query image to one of the job images.
  17. A method of matching a first image and a second image, substantially as described herein, with reference to the accompanying drawings.
  18. An apparatus for matching a first image and a second image, substantially as described herein, with reference to the accompanying drawings.
  19. A non-transitory computer readable storage medium storing a computer executable program configured to direct a processor to effect a method for matching a first image and a second image, substantially as described herein, with reference to the accompanying drawings.

DATED this 22nd Day of December 2011
CANON KABUSHIKI KAISHA
Patent Attorneys for the Applicant
SPRUSON & FERGUSON
AU2011265494A 2011-12-22 2011-12-22 Kernalized contextual feature Abandoned AU2011265494A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2011265494A AU2011265494A1 (en) 2011-12-22 2011-12-22 Kernalized contextual feature

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2011265494A AU2011265494A1 (en) 2011-12-22 2011-12-22 Kernalized contextual feature

Publications (1)

Publication Number Publication Date
AU2011265494A1 (en)

Family

ID=48746765

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2011265494A Abandoned AU2011265494A1 (en) 2011-12-22 2011-12-22 Kernalized contextual feature

Country Status (1)

Country Link
AU (1) AU2011265494A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898686B2 (en) 2014-12-22 2018-02-20 Canon Kabushiki Kaisha Object re-identification using self-dissimilarity
US10255512B2 (en) 2014-12-22 2019-04-09 Canon Kabushiki Kaisha Method, system and apparatus for processing an image
CN106650630A (en) * 2016-11-11 2017-05-10 纳恩博(北京)科技有限公司 Target tracking method and electronic equipment
WO2018086607A1 (en) * 2016-11-11 2018-05-17 纳恩博(北京)科技有限公司 Target tracking method, electronic device, and storage medium
CN106650630B (en) * 2016-11-11 2019-08-23 纳恩博(北京)科技有限公司 A kind of method for tracking target and electronic equipment
CN107967349A (en) * 2017-12-13 2018-04-27 湖南省国土资源规划院 A kind of ore body reserves block matching process
CN107967349B (en) * 2017-12-13 2020-06-16 湖南省国土资源规划院 Ore body reserve block matching method
CN108372785A (en) * 2018-04-25 2018-08-07 吉林大学 A kind of non-security driving detection device of the automobile based on image recognition and detection method
CN108372785B (en) * 2018-04-25 2023-06-23 吉林大学 Image recognition-based automobile unsafe driving detection device and detection method

Similar Documents

Publication Publication Date Title
US10496880B2 (en) Method and apparatus for comparing objects in images
US10503981B2 (en) Method and apparatus for determining similarity of objects in images
US9898686B2 (en) Object re-identification using self-dissimilarity
US11288544B2 (en) Method, system and apparatus for generating training samples for matching objects in a sequence of images
Seo et al. Training-free, generic object detection using locally adaptive regression kernels
CN111797653B (en) Image labeling method and device based on high-dimensional image
US10922581B2 (en) Method, system and apparatus for performing re-identification in images captured by at least two camera pairs operating with different environmental factors
WO2019001481A1 (en) Vehicle appearance feature identification and vehicle search method and apparatus, storage medium, and electronic device
US10579901B2 (en) Method, system and apparatus for comparing objects in images
US20180089534A1 (en) Cross-modiality image matching method
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
JP2016134175A (en) Method and system for performing text-to-image queries with wildcards
Molina-Moreno et al. Efficient scale-adaptive license plate detection system
WO2021082168A1 (en) Method for matching specific target object in scene image
GB2547760A (en) Method of image processing
AU2011265494A1 (en) Kernalized contextual feature
Khan et al. Person re-identification for real-world surveillance systems
Singh et al. A comprehensive survey on person re-identification approaches: various aspects
Moujtahid et al. Classifying global scene context for on-line multiple tracker selection
Foucher et al. Automatic detection and clustering of actor faces based on spectral clustering techniques
CN113963295A (en) Method, device, equipment and storage medium for recognizing landmark in video clip
Arif et al. A Comprehensive Review of Vehicle Detection Techniques Under Varying Moving Cast Shadow Conditions Using Computer Vision and Deep Learning
Singh et al. Template matching for detection & recognition of frontal view of human face through Matlab
Yan et al. Inferring occluded features for fast object detection
Baroffio et al. A survey on compact features for visual content analysis

Legal Events

Date Code Title Description
MK4 Application lapsed section 142(2)(d) - no continuation fee paid for the application