GB2601310A - Methods and apparatuses relating to object identification - Google Patents

Methods and apparatuses relating to object identification

Info

Publication number
GB2601310A
Authority
GB
United Kingdom
Prior art keywords
images
objects
identity
determining
sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB2018481.8A
Other versions
GB202018481D0 (en)
Inventor
Goldstein Maayan
Van Cutsem Tom
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Solutions and Networks Oy
Original Assignee
Nokia Solutions and Networks Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Solutions and Networks Oy filed Critical Nokia Solutions and Networks Oy
Priority to GB2018481.8A
Publication of GB202018481D0
Publication of GB2601310A
Status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/768 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using context analysis, e.g. recognition aided by known co-occurring patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30232 Surveillance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is an apparatus comprising means configured to receive a sequence of images and perform object tracking on the sequence of images. A respective identity, from among a plurality of predefined identities, is determined that corresponds to one or more first objects that are tracked in a first sub-sequence of images of the sequence of images. It is then determined, based on the object tracking, that a second object that is present in a second sub-sequence of one or more images of the sequence of images is to be identified, wherein the first sub-sequence precedes the second sub-sequence in the sequence of images. In response to determining that the second object is to be identified, an identity is determined, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first sub-sequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.

Determining the identity corresponding to the second object based on the contextual information derived from the first sub-sequence of images may comprise making a first comparison between one or more images of the second object from the second sub-sequence and images of the one or more first objects from the first sub-sequence. Determining the identity corresponding to the second object may further comprise making a second comparison between the one or more images of the second object and definitions in a database comprising mappings between the plurality of predefined identities and at least one corresponding definition, which comprises at least one of an image including an object corresponding to the respective identity and features generated from an image corresponding to the respective identity.

Description

Methods and Apparatuses relating to Object Identification
Field
The present specification relates to object identification in a sequence of images.
Background
The tracking of objects including people can be performed by recording video of an environment in which the objects of interest are present, and by processing the video to identify the objects in the video.
For instance, it may be desirable to effectively track individuals within crowds of people in public locations such as airports, stadiums, etc. Similarly, being able to effectively track objects of other types may be desirable in locations such as warehouses and factory lines.
Summary
The scope of protection sought for various embodiments of the invention is set out by the independent claims. The embodiments and features, if any, described in this specification that do not fall under the scope of the independent claims are to be interpreted as examples useful for understanding various embodiments of the invention.
According to a first aspect, there is described an apparatus comprising means configured to perform: receiving a sequence of images; performing object tracking on the sequence of images; determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images; determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images; responsive to determining that the second object is to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.
Determining the identity corresponding to the second object based on the contextual information derived from the first subsequence of images may comprise: making a first comparison between one or more images of the second object from the second subsequence and images of the one or more first objects from the first subsequence, and determining the identity corresponding to the second object based on the first comparison.
Determining the identity corresponding to the second object may comprise: making a second comparison between the one or more images of the second object and definitions in a database, wherein the database comprises mappings between the plurality of predefined identities and at least one corresponding definition, wherein the at least one corresponding definition may comprise at least one of: (i) an image including an object corresponding to the respective identity, and (ii) features generated from an image including an object corresponding to the respective identity; and determining the identity corresponding to the second object may be based on the first and second comparisons.
The first comparison may comprise: determining a similarity between the one or more images of the second object and the images of the one or more first objects, and the second comparison may comprise determining a similarity between the one or more images of the second object and the definitions in the database, and wherein determining the identity corresponding to the second object based on the first and second comparisons may comprise, based on the determined similarities, determining which identity corresponds to the second object.
Determining which identity corresponds to the second object based on the determined similarities may comprise determining which of the images of the one or more first objects and the definitions in the database is most similar to the one or more images of the second object.
Determining which identity corresponds to the second object based on the determined similarities may comprise majority voting.
Determining the identity corresponding to the second object may be weighted towards the identities determined to correspond to the one or more first objects.
Making the first comparison may comprise generating a first similarity matrix and making the second comparison may comprise generating a second similarity matrix, wherein the first similarity matrix may comprise a measure of similarity between the one or more images of the second object and the images of the one or more first objects, and the second similarity matrix may comprise a measure of similarity between the one or more images of the second object and the definitions.
Measurements of similarity may be calculated using one of a Euclidean distance and a cosine similarity between features generated from respective images. The features may be output from a machine learning model.
Measurements of similarity in the first similarity matrix may be modified such that the modified measurements of similarity may indicate a higher degree of similarity between the respective image of the second object and the respective image of the first object than the corresponding unmodified measure of similarity.
The means may be configured to perform: determining, based on the object tracking, that a plurality of second objects that are present in the second subsequence of one or more images of the sequence of images is to be identified; and responsive to determining that a plurality of second objects are to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second objects based at least in part on one or more of the identities determined to correspond to the one or more first objects.
Determining the identity corresponding to the second objects may comprise one of a greedy approach, and minimum weight perfect matching.
The objects may be humans, and the plurality of predefined identities may correspond to predefined identities of individual humans. The second object may be one of the one or more first objects.
The second subsequence of images may comprise a plurality of images from the sequence of images, and the second object may be tracked in the second subsequence of images. At least one of the one or more first objects may be tracked in the second subsequence of images.
The means may be configured to perform: outputting the identities determined to correspond to the one or more first objects and the second object.
The sequence of images may be captured by one or more image capture devices, and the one or more image capture devices may capture a field of view of an environment.
Performing object tracking on the sequence of images may comprise determining trajectories for the one or more first objects, and wherein determining, based on the object tracking, that the second object is to be identified may comprise determining that a position of the second object does not correlate sufficiently to the trajectories of the one or more first objects.
According to a second aspect, there is described a method comprising: receiving a sequence of images; performing object tracking on the sequence of images; determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images; determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images; responsive to determining that the second object is to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.
Determining the identity corresponding to the second object based on the contextual information derived from the first subsequence of images may comprise: making a first comparison between one or more images of the second object from the second subsequence and images of the one or more first objects from the first subsequence, and determining the identity corresponding to the second object based on the first comparison.
Determining the identity corresponding to the second object may comprise: making a second comparison between the one or more images of the second object and definitions in a database, wherein the database comprises mappings between the plurality of predefined identities and at least one corresponding definition, wherein the at least one corresponding definition may comprise at least one of: (i) an image including an object corresponding to the respective identity, and (ii) features generated from an image including an object corresponding to the respective identity; and determining the identity corresponding to the second object may be based on the first and second comparisons.
The first comparison may comprise: determining a similarity between the one or more images of the second object and the images of the one or more first objects, and the second comparison may comprise determining a similarity between the one or more images of the second object and the definitions in the database, and wherein determining the identity corresponding to the second object based on the first and second comparisons may comprise, based on the determined similarities, determining which identity corresponds to the second object.
Determining which identity corresponds to the second object based on the determined similarities may comprise determining which of the images of the one or more first objects and the definitions in the database is most similar to the one or more images of the second object.
Determining which identity corresponds to the second object based on the determined similarities may comprise majority voting.
Determining the identity corresponding to the second object may be weighted towards the identities determined to correspond to the one or more first objects.
Making the first comparison may comprise generating a first similarity matrix and making the second comparison may comprise generating a second similarity matrix, wherein the first similarity matrix may comprise a measure of similarity between the one or more images of the second object and the images of the one or more first objects, and the second similarity matrix may comprise a measure of similarity between the one or more images of the second object and the definitions.
Measurements of similarity may be calculated using one of a Euclidean distance and a cosine similarity between features generated from respective images. The features may be output from a machine learning model.
Measurements of similarity in the first similarity matrix may be modified such that the modified measurements of similarity may indicate a higher degree of similarity between the respective image of the second object and the respective image of the first object than the corresponding unmodified measure of similarity.
The method may comprise: determining, based on the object tracking, that a plurality of second objects that are present in the second subsequence of one or more images of the sequence of images is to be identified; and responsive to determining that a plurality of second objects are to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second objects based at least in part on one or more of the identities determined to correspond to the one or more first objects.
Determining the identity corresponding to the second objects may comprise one of a greedy approach, and minimum weight perfect matching.
The objects may be humans, and the plurality of predefined identities may correspond to predefined identities of individual humans. The second object may be one of the one or more first objects.
The second subsequence of images may comprise a plurality of images from the sequence of images, and the second object may be tracked in the second subsequence of images. At least one of the one or more first objects may be tracked in the second subsequence of images.
The method may comprise: outputting the identities determined to correspond to the one or more first objects and the second object.
The sequence of images may be captured by one or more image capture devices, and the one or more image capture devices may capture a field of view of an environment.
Performing object tracking on the sequence of images may comprise determining trajectories for the one or more first objects, and wherein determining, based on the object tracking, that the second object is to be identified may comprise determining that a position of the second object does not correlate sufficiently to the trajectories of the one or more first objects.
According to a third aspect, there is provided a computer program product comprising a set of instructions which, when executed on an apparatus, is configured to cause the apparatus to carry out the method of any method definition of the second aspect.
According to a fourth aspect, there is provided a non-transitory computer readable medium comprising program instructions stored thereon for performing a method comprising: receiving a sequence of images; performing object tracking on the sequence of images; determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images; determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images; responsive to determining that the second object is to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.
The program instructions of the fourth aspect may also perform operations according to any method definition of the second aspect.
According to a fifth aspect, there is provided an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive a sequence of images; perform object tracking on the sequence of images; determine a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images; determine, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images; responsive to determining that the second object is to be identified, determine an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.
The computer program code of the fifth aspect may also cause performance of operations according to any method definition of the second aspect.
Brief description of the Drawings
Example embodiments will now be described by way of non-limiting example, with reference to the accompanying drawings, in which:
Figure 1 is a simplified schematic indicating an overview of example operations which may be performed in various implementations of the disclosed technology;
Figure 2 is a schematic view of at least part of a system which may be used to carry out the operations described herein;
Figure 3 shows an example of a sequence of images including multiple objects;
Figure 4 shows an example of a sequence of images including an object;
Figures 5, 6 and 7 are flow diagrams showing operations that may be performed in various examples of the described technology;
Figure 8 is a schematic view of an example apparatus which may be configured to perform various operations described herein; and
Figure 9 is a non-transitory medium which may be used to store computer-readable code which, when executed by one or more processors of an apparatus, may cause performance of various operations described herein.
Detailed Description
In the description and drawings, like reference numerals refer to like elements throughout.
Object identification involves recognising the identities of objects present in a sequence of images. The objects' movement may also be tracked through the sequence of images. In this way, the identities of the objects can be consistent throughout the sequence of images.
However, in the case that an object cannot be tracked throughout the sequence of images, for instance, if it becomes occluded partway through the sequence of images, the identity of the object must be re-ascertained. In addition, new objects may appear partway through the sequence of images, and in this case, these new objects must be identified. Furthermore, until an identity is determined for an unknown object, it cannot be known whether or not the object was one of the objects previously tracked, or whether it is a new object. Effectively tracking objects throughout a sequence of images may therefore involve identifying and, possibly also re-identifying, objects that are present in the sequence of images.
In some known techniques, objects are assigned new identities each time they reappear. As such, the identity determined to correspond to an object may not be consistent throughout a sequence of images. This may be undesirable and, instead, it may be desired for the same objects to be assigned the same identities throughout a sequence of images.
When objects are re-identified, it may be desirable that the re-identification is accurate. One measure of accuracy is whether the identity determined for an object during re-identification matches the identity initially determined for the same object. In other words, it may be desirable to reduce the number of identity switches for the same object in the sequence of images.
The tasks of multiple object identification and re-identification can be computationally demanding. It may therefore be desirable to reduce the time taken to infer the identity of objects. It may also be desirable to reduce the processing power and computational demand required to perform object identification and re-identification.
Various implementations of the technology disclosed herein relate to an apparatus and method for identification of objects appearing within a sequence of images.
Specifically, as shown in Figure 1, implementations may include initially tracking and identifying objects in a sequence of images, and identifying and/or re-identifying unidentified objects that appear partway through the sequence of images based at least in part on contextual information.
Various implementations may leverage the contextual information which may be derived from several previous images to improve the precision and shorten the time it takes to determine an object's identity. Contextual information may comprise identities of objects identified in the previous images, and in some examples images of those identified objects that are extracted from the previous images. The contextual information may also comprise identities of other identified objects seen in images containing the unidentified objects. The identities of other identified objects seen in images containing the unidentified objects may, for instance, be used when identifying unidentified objects to discount identities of objects which have been determined to be present in the images containing the unidentified objects.
Various implementations of the described technology may achieve faster inference time and higher precision as the processing may be performed on batches of images, while giving higher priority to identities that already appeared in the scene. For example, if four people appear in a video for a few seconds and then one person gets occluded and reappears again, matches of the new appearance against the other images of that person can be prioritised, rather than matches against a list of all people that are in a database.
Various implementations may therefore improve the accuracy of re-identification and/or increase how often the identity determined for an object during re-identification matches the identity initially determined for the same object. In other words, various implementations disclosed herein may reduce the number of identity switches for the same object in the sequence of images.
Various implementations disclosed herein may also reduce the time taken to perform the tasks of object identification and re-identification and thus infer the identity of objects. Processing power and/or computational demand required to perform object identification and re-identification may also be reduced.
Figure 2 is a schematic view of at least part of a system 200, which may be used to carry out the operations described herein. As shown in Figure 2, the system 200 may include one or more image capture devices 240, a computing apparatus 210, and in some examples a server apparatus 220.
In some example implementations, the system 200 may include a single image capture device 240. In such examples, the image capture device 240 is arranged to capture a sequence of images of a single, static field of view of an environment. In other examples, the single image capture device 240 may be moveable such that its field of view can change throughout the sequence of images.
In other example implementations, a plurality of image capture devices 240 may be included. In such examples, the plurality of image capture devices 240 may be arranged to capture sequences of images of different fields of view of the environment jo (which may or may not be static). For instance, the plurality of image capture devices 240 may be arranged to capture different angles of the same environment, for instance positioned at different points in e.g. a ticket hall, warehouse or exterior urban environment, or positioned at the top and bottom of an elevator. The sequence of images captured by the plurality of image capture devices 240, and on which object identification may be performed, may be formed from a selection of sequences of images captured by the plurality of image capture devices 240.
For instance, in the example mentioned above where the plurality of image capture devices 240 are positioned at the top and bottom of an elevator, a first portion of the sequence of images may be captured by a first image capture device 240 having a first field of view which includes the bottom of the elevator, and a second portion of the sequence of images may be captured by a second image capture device 240 having a second field of view including the top of the elevator. Continuing with this example, the second portion of the sequence of images (in this case, the images captured by the second image capture device 240) may follow the first portion of the sequence of images (in this case, the images captured by the first image capture device 240) such that the first portion may include images of a person entering and travelling up the elevator from the bottom, and the second portion may include images of a person approaching and exiting the top of the elevator.
The one or more image capture devices 240 may include at least one of a video camera, a surveillance camera, a closed circuit television (CCTV) camera, an IP camera, a security camera, a mobile phone camera, a digital camera, a film camera and/or any other type of image capture device 240. The sequence of images captured by the image capture devices 240 may be a sequence of still images taken at regular or irregular intervals, or may be a sequence of video frames where the image capture device(s) 240 is/are configured to capture video.
Alternatively, in some examples, the apparatus does not include an image capture device 240. In such examples, the sequence of images may be simulated. For instance, the sequence of images may be generated by the computing apparatus 210 or some other entity by simulating the environment and/or objects in the environment.
As mentioned above, the system 200 may include a computing apparatus 210. The operations described herein may be performed by the computing apparatus 210. The computing apparatus 210 may be in communication with the one or more image capture devices 240 via wireless and/or wired communication, and optionally via at least one of the internet, the server apparatus 220, another computing apparatus 210, and/or the cloud. In some examples, the computing apparatus 210 may be integral with the one or more image capture devices 240, or at least local to the one or more image capture devices 240. In other examples, the computing device may be remote from the one or more image capture devices 240. In some examples, the system 200 may include a plurality of computing apparatuses 210, wherein the performance of part or all of the operations described herein is distributed across the plurality of computing apparatuses 210.

In some example implementations, the system 200 may include a server apparatus 220. The server apparatus 220 may be local to the computing apparatus 210 and/or the one or more image capture devices 240, or may be remote from the computing apparatus 210 and/or the one or more image capture devices 240. The server apparatus 220 may be in communication with the computing apparatus 210 and/or the one or more image capture devices 240 via wireless and/or wired communication.
In some examples, the one or more image capture devices 240 may provide the sequence of images to the computing apparatus 210 via the server apparatus 220. In other examples, the one or more image capture devices 240 may provide the sequence of images to the computing apparatus 210, without going via the server apparatus 220.
In some examples, the server apparatus 220 may store a database 230. The database 230 may include mappings between a plurality of predefined identities and one or more corresponding definitions. In such examples, the server apparatus 220 may provide the computing apparatus 210 with access to the database 230, for instance by providing part or all of the information stored in the database 230 to the computing apparatus 210. The computing device may be provided with all or a subset of the predefined identities and corresponding definitions stored in the database. In addition or alternatively, the computing device may, for the provided predefined identities, be provided with all or a subset of the definitions stored in the database. For instance, the definitions provided per predefined identity may include the same number of definitions. In other examples, the computing device may be provided with a different number of definitions for different identities (for instance, when the database has a different number of definitions for different identities).
The provision of access to part or all of the information stored in the database 230 may be in response to a request from the computing apparatus 210, or may be triggered by a command from the server apparatus 220 and/or the image capture device 240.
In other examples, the database 230 may be stored somewhere other than the server apparatus 220, for instance, on the computing apparatus 210, or remote from both the computing apparatus 210 and the server apparatus 220.
An identity is associated with a corresponding object. In some examples, the identity is unique to a specific example of an object. Additionally or alternatively, the association between the identity and the corresponding object is constant not only across the sequence of images, but also across any other image or representation of the corresponding object. An identity may include a name associated with a respective or object, or may be a string or numerical value assigned to a respective object. An identity may be machine readable, for instance a bar code or a QR code.
In some examples in which the objects are humans, the identities may correspond to a predefined identity of an individual human. For instance, in such examples, the identity may be a name and/or alias associated with an individual human, or may be a string and/or numerical value associated with the individual human. For instance, the database 230 may be a watch list including images of individuals of interest mapped to corresponding names and/or other associated information. However, the objects are not limited to humans, and could be, for instance, vehicles or items in a warehouse to name but a few.
The one or more definitions may comprise data that is suitable for enabling objects appearing in the sequence of images to be recognised as corresponding to the definition. For instance, the one or more definitions corresponding to the identities may, in some examples, comprise an image, which includes the object corresponding to the respective identity, and/or features generated from an image, which includes the object corresponding to the respective identity. The features generated from an image may be, for instance, in the form of a feature vector. In some examples, the features may be generated using a machine learning model. In such examples, the input to the machine learning model may be an image including an object corresponding to the respective identity, and the output of the machine learning model may be the features (e.g. a feature vector). The database 230 may include one, or more than one, different definitions for each identity. For instance, the database 230 may include multiple images of a particular person that is included in the database 230.
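By way of a non-limiting illustration only, the following sketch shows one possible shape for such a collection of definitions, with feature vectors stored per identity; the class and method names are hypothetical and are not taken from this specification:

```python
from dataclasses import dataclass, field

import numpy as np


@dataclass
class IdentityDefinitions:
    """Hypothetical mapping between predefined identities and definitions.

    Each definition here is a feature vector generated from an image that
    includes the object corresponding to the identity; an identity may have
    more than one definition (e.g. several images of the same person).
    """
    definitions: dict[str, list[np.ndarray]] = field(default_factory=dict)

    def add_definition(self, identity: str, features: np.ndarray) -> None:
        # Store one more feature vector for this identity.
        self.definitions.setdefault(identity, []).append(features)


# Example: two identities, one of which has two definitions.
db = IdentityDefinitions()
db.add_definition("person-4", np.array([0.0, 1.0]))
db.add_definition("person-4", np.array([0.1, 0.9]))
db.add_definition("person-7", np.array([1.0, 1.0]))
```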
In some examples, the definitions may comprise biometric identifiers. For instance, the definitions may comprise anything that may be recognised in the sequence of images, such as but not limited to facial recognition features. In other examples, where the objects to be recognised are vehicles, the definitions may comprise images of licence plates corresponding to vehicles as an alternative or in addition to images of the vehicles themselves.
In some examples, the system 200 may not include a server apparatus 220. In such examples, the computing apparatus 210 may have access to the database 230 without need of the server apparatus 220.
Returning now to Figure 1, Figure 1 is a simplified schematic indicating an overview of example operations which may be performed in various implementations of the disclosed technology. As shown in Figure 1, example operations may include object tracking and object identification and may enrich these operations with contextual information. In some examples, existing state-of-the-art object trackers and object identifiers may be used. For instance, some such known techniques are described in "Simple Online and Realtime Tracking" by Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, Ben Upcroft (2016) and "Simple Online and Realtime Tracking with a Deep Association Metric" by Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich (2017).
As shown in Figure 1, a sequence of images may be received by the computing apparatus 210. In some examples, the sequence of images may be captured by one or more image capture devices 240, and may be provided to the computing apparatus 210. The sequence of images may, for instance, comprise a sequence of video frames.
The computing device may perform object tracking 120 on the sequence of images to locate objects in, and track objects through, the sequence of images. The computing apparatus 210 may perform object tracking 120 using any known object tracking technique. The computing device may associate each instance of an object appearing in an image from the sequence of images to each other instance of the same object in the other images of the sequence of images. For instance, if the sequence of images includes a plurality of objects, the computing device may locate the position of each of the objects in each of the sequence of images in which they are present, and associate each instance of each object with the other instances of the same object in the sequence of images.
The object tracking 120 may include the use of a motion model which describes the possible motion which might be recorded for an object in the sequence of images. The performance of object tracking 120 may further include determining trajectories for each object tracked in the sequence of images.
For instance, in the case that the objects are humans, and there are three humans moving around an environment depicted in the sequence of images, the computing device may associate instances of each of the humans in the sequence of images with other instances of the same human in the sequence of images based on the motion of the humans throughout the sequence of images.
In some examples, the object tracking 120 may determine bounding boxes or outlines around the objects. In some examples, the object tracking 120 may extract an image of an object from an image of the sequence of images. The extraction may be based on, for instance, a determined bounding box around the object.
Continuing with the example above, the object tracking 120 may extract an image of an instance of a human from an image of the sequence of images for each instance of each of the three humans present in the sequence of images. The extracted image may include, for instance, the human's face and/or the human's body.
The computing apparatus 210 may perform object identification 130 on the sequence of images to determine a predefined identity corresponding to the tracked objects. The object identification 130 may be a machine learning model that can match an image of an object against a database 230 of images of objects with known identities. The computing apparatus 210 may perform object identification 130 using any known object identification technique. One such known technique is described in "Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch" by Zhou, Kaiyang and Xiang, Tao (2019). However, such known techniques may be imprecise in some situations. One way in which the precision of object identification 130 may be improved is by looking at several images of an object and selecting the final identity from the identities ascertained for each image.
In some examples, the object identification 130 may include making a comparison between images of the tracked objects in the sequence of images and definitions in the database 230. The images of the tracked objects may be the images of the sequence of images in which the tracked objects are present. Additionally or alternatively, the images of the tracked objects may be extracted images from the sequence of images, as described above. When the definitions are images, features of the images of the tracked objects and the definitions may be generated and compared. When the definitions are features, features of the images of the tracked objects can be generated and compared to the definitions. The features may be, for instance, feature vectors. The features may be, for instance, generated by a machine learning model. In such cases, the machine learning model takes as input images and outputs features of the input image. The comparison between features may be, for instance, a calculation of the Euclidean distance between two respective features (e.g. feature vectors), or may be, for instance, a calculation of the cosine similarity between two respective features.
As an illustrative example, a first feature for an object could be "long hair", with possible values for the first feature being "0" to indicate short hair and "1" to indicate long hair. A second feature for an object could be "brown hair", with possible values being "0" to indicate not brown hair and "1" to indicate brown hair. As such, a first object (or person) with short brown hair may generate a feature vector of <0,1> and a second object with long brown hair may generate a feature vector of <1,1>. A distance between the first and second objects could therefore be found by finding the distance between the vectors <0,1> and <1,1>.
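A minimal sketch of this worked example follows, assuming the features have already been generated as NumPy vectors; the function names are illustrative only:

```python
import numpy as np


def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Smaller values indicate more similar features.
    return float(np.linalg.norm(a - b))


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Values closer to 1 indicate more similar features.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


first_object = np.array([0.0, 1.0])   # short brown hair -> <0,1>
second_object = np.array([1.0, 1.0])  # long brown hair  -> <1,1>

print(euclidean_distance(first_object, second_object))  # 1.0
print(cosine_similarity(first_object, second_object))   # ~0.707
```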
Continuing with the above examples, the object identification 130 may include comparing each image of each tracked object in the sequence of images with all, some or a particular subset of the definitions included in the database 230. In some examples, the result of the comparisons can be arranged as a similarity matrix, wherein the images of each tracked object form one axis, and the definitions of each identity form the other axis. Measurements of similarity between a respective image of a tracked object and a respective definition of an identity may fill the matrix. Any suitable measurement of similarity may be used, for instance the measurement may be one that increases with a higher degree of similarity or increases with a higher degree of difference. For instance, as mentioned above, the measurement of similarity may be, but is certainly not limited to, a Euclidean distance between two respective features (e.g. feature vectors), or a cosine similarity between two respective features.
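The following is a simplified sketch of how such a similarity matrix might be computed, here assuming cosine similarity over pre-generated feature vectors; the function name and signature are illustrative assumptions:

```python
import numpy as np


def similarity_matrix(object_features: list[np.ndarray],
                      definition_features: list[np.ndarray]) -> np.ndarray:
    """Cosine-similarity matrix: one row per image of a tracked object,
    one column per identity definition."""
    matrix = np.zeros((len(object_features), len(definition_features)))
    for i, obj in enumerate(object_features):
        for j, definition in enumerate(definition_features):
            matrix[i, j] = np.dot(obj, definition) / (
                np.linalg.norm(obj) * np.linalg.norm(definition))
    return matrix
```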
Object identification 130 may include determining which of the identities in the database 230 corresponds to the tracked objects. Where there is a single image of a tracked object, and there is a single definition for each identity, the identity corresponding to the definition which is most similar to the image of the tracked object may be determined to be the identity which corresponds to the tracked object. For instance, the measurement of similarity which indicates that the image of the tracked object is most similar to a definition of an identity may be determined, and the identity corresponding to that measurement may be ascertained. The ascertained identity may then be determined to correspond to (may be assigned to) the tracked object. In some examples, object identification may be performed once and tracking may be used to track the object to which the identity has been assigned. If the object is "lost" (e.g. is no longer visible) for a period of time and then reappears, object identification may be performed again.
In some examples, there may be multiple images of a tracked object, for instance when the object appears in multiple images of the sequence. Additionally or alternatively, there may be one or more definitions in the database 230 for each identity. The identity corresponding to the definitions which are most similar to the images of the tracked object may be assigned to the tracked object. For instance, the determination as to which definition is most similar to the images of the object may be based on majority voting. An example of majority voting is described in relation to Figure 4 below.
Figure 4 shows an example of a sequence of images 400 including an object 420A to 420H. Object identification has been performed on the images 410A to 410H in the sequence of images 400 by comparing the images 410A to 410H against definitions in the database 230, as described above. The identity 430A to 430H determined to correspond to the object 420A to 420H in each of the images 410A to 410H is shown in the bottom right hand corner of each image 410A to 410H. As can be seen in Figure 4, the identity "4" has been determined to correspond to the object 420A, 420B, 420D, 420H in four of the images 410A, 410B, 410D, 410H, the identity "7" has been determined to correspond to the object 420C, 420F, 420G in three of the images 410C, 410F, 410G, and the identity "5" has been determined to correspond to the object 420E in one of the images 410E. According to majority voting, the object 420A to 420H would therefore have four "votes" for identity "4", three votes for identity "7" and one vote for identity "5". As a result, the object 420A to 420H may be determined to correspond to identity "4" since this identity has the most votes.
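A minimal sketch of the majority voting described for Figure 4, assuming the per-image identities have already been determined; the helper name is illustrative:

```python
from collections import Counter


def majority_vote(per_image_identities: list[str]) -> str:
    """Return the identity that received the most per-image 'votes'."""
    return Counter(per_image_identities).most_common(1)[0][0]


# Figure 4 example: eight images of one tracked object.
votes = ["4", "4", "7", "4", "5", "7", "7", "4"]
print(majority_vote(votes))  # "4" (four votes vs three for "7", one for "5")
```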
A more detailed discussion of majority voting is provided below in relation to determining identities of second objects.
Example alternatives to the "majority vote" approach may include, but are certainly not limited to, determining one or more definitions which are the most similar to the images of a tracked object based on the average or total measurement of similarity between each of the images of the tracked object and each definition. In this case, for a given definition, the respective measurements of similarity may be averaged or summed over each image of the tracked object, and/or over definitions corresponding to the same identity. The identity corresponding to the definition or definitions having the averaged or summed measurement of similarity which indicates the greatest similarity may then be determined to correspond to the tracked object.
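A sketch of this average-similarity alternative, assuming a similarity matrix as described above (rows: images of the tracked object; columns: definitions) and a parallel list giving the identity of each definition; names are illustrative:

```python
import numpy as np


def best_identity_by_average(similarity: np.ndarray,
                             definition_identities: list[str]) -> str:
    """Average each column (definition) over all images of the tracked
    object, pool the averages per identity, and return the identity whose
    definitions are, on average, most similar."""
    per_definition = similarity.mean(axis=0)  # average over object images
    pooled: dict[str, list[float]] = {}
    for identity, value in zip(definition_identities, per_definition):
        pooled.setdefault(identity, []).append(float(value))
    return max(pooled, key=lambda ident: float(np.mean(pooled[ident])))
```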
Continuing with the above examples, where there are multiple tracked objects to be identified, this process may repeat for each of the tracked objects. In some examples, object identification may include a "greedy algorithm". In such examples, a "best match" identity for an object may be found, that identity is then determined to correspond to one of the objects and is not considered in relation to any subsequent objects, then a "best match" identity is found for a subsequent object, determined to correspond to the subsequent object, and is not considered in relation to any further subsequent objects. This process is then repeated until each object has been assigned an identity. In this way, no two objects are assigned the same identity. Additionally or alternatively, the multiple tracked objects may be assigned identities in other ways than using a "greedy algorithm", one such example being minimum weight perfect matching as, for instance, described in "The Hungarian method for the assignment problem" by Kuhn, H. (1955). The order in which the objects are handled in a "greedy algorithm" based approach may be determined in any number of ways, but could be based on how similar the images of that object are to the definitions, with the object having the highest similarity of all the objects being assigned an identity first, the next most similar being assigned an identity second and so on.
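The following sketch illustrates both approaches on a small similarity matrix, assuming higher values indicate greater similarity; the greedy helper is hypothetical, and the minimum weight perfect matching uses the SciPy implementation of the Hungarian method (scipy.optimize.linear_sum_assignment), applied to the negated similarities so that maximising similarity becomes minimising cost:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# similarity[i, j]: similarity of object i to identity j (higher = better).
similarity = np.array([[0.9, 0.2, 0.4],
                       [0.8, 0.7, 0.1],
                       [0.3, 0.6, 0.5]])


def greedy_assignment(sim: np.ndarray) -> dict[int, int]:
    """Repeatedly take the best remaining (object, identity) match."""
    sim = sim.astype(float).copy()
    assignment = {}
    for _ in range(min(sim.shape)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        assignment[int(i)] = int(j)
        sim[i, :] = -np.inf   # object i is now assigned
        sim[:, j] = -np.inf   # identity j may not be reused
    return assignment


print(greedy_assignment(similarity))            # {0: 0, 1: 1, 2: 2}

# Minimum weight perfect matching (Hungarian method).
rows, cols = linear_sum_assignment(-similarity)
print(dict(zip(rows.tolist(), cols.tolist())))  # optimal assignment
```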
The sequence of images may comprise a first subsequence (also referred to above as a first portion) of images in which one or more objects are tracked and identified in the manner described above. In addition, there may be a second subsequence (or portion) of images in which one or more unidentified objects may be present. The one or more tracked and identified objects in the first subsequence may be referred to as one or more "first objects", and the one or more unidentified objects in the second subsequence may be referred to as one or more "second objects".
The first subsequence may precede the second subsequence in time such that the images included in the second subsequence were captured after (in time) the images included in the first subsequence. The first subsequence may immediately precede the second subsequence (for instance, a time interval between images in the first subsequence may be the same as a time interval between the last image in the first subsequence and the first image in the second subsequence), or there may be a certain (longer) duration between the first subsequence ending and the second subsequence beginning. The first and second subsequence may both be captured by the same image capture device 240, the first subsequence may be captured by a first image capture device 240 and the second subsequence may be captured by a second image capture device 240, or one or both of the first and second subsequence may be captured by a plurality of image capture devices 240.
One or more of the second objects may be one or more of the first objects which could not be tracked through both the first and second subsequence. For instance, the second objects may have become obscured and therefore could not be tracked between the first and second subsequence. Additionally or alternatively, one or more of the second objects may not be one of the first objects, and may newly appear in the second subsequence of images. Some or all of the first objects may be tracked and identified in the second subsequence of images. Additionally or alternatively, the second objects may be tracked through some or all of the images of the second subsequence of images using object tracking 120, as described above.
The identity (or identities) corresponding to the one or more second objects may be determined based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities. The contextual information may jo comprise one or more of the identities determined to correspond to the one or more first objects.
In some examples, images may be analysed in batches. For instance, for the same object tracked over V images, the most suitable identity may be selected out of: (i) V identities found by comparisons of the V images against definitions, and in parallel (ii) the identities in W previous images. For instance, the first subsequence may comprise W images, and the second subsequence may comprise V images. The final identity may then be calculated based on information both from V and W. The batches are used to determine multiple identities at the same processing step.
In some specific examples, implementations of the technology may run every 100 images. In this case, for standard videos with 30 frames per second, a batch may be processed every 3.3 seconds. In images 0-100, identities of objects which appear in these images may be found. During this first batch, W is unavailable. In images 100-200, the tracks of all the objects who do not have identities may be found. Within these 100 images, there may be a person with a track of 50 images and another with a track of 20 images only. In some examples, the computation may be limited to 20 images (V) per object. The batch would then contain a maximum of 20 images per unidentified object. These images may be compared to definitions in the database 230 with known identities. At the same time, these images may be compared to images of objects seen in images 0-100 where the identities of the objects are known. In some examples, the computation may be limited to a maximum of 30 images (W). The identities in the W images may be given higher weight as it may be assumed that there is a high chance of an object simply re-occurring in images 100-200 following an occlusion or other type of track loss. As will be appreciated, the numbers of images described above represent just one example implementation. Any other appropriate number of images may instead be used.
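A simplified sketch of how the two sets of comparisons might be combined is given below, assuming one reference feature vector per identity for brevity; the function name and the weight value of 1.5 are illustrative assumptions, not values taken from this specification:

```python
import numpy as np


def identify_with_context(obj_features: list[np.ndarray],
                          db_features: dict[str, np.ndarray],
                          context_features: dict[str, np.ndarray],
                          context_weight: float = 1.5) -> str:
    """Vote over all comparisons; matches against identities already seen
    in previous images (the context) receive a higher weight."""

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    scores: dict[str, float] = {}
    for features in obj_features:
        # (i) comparisons against definitions with known identities.
        for identity, ref in db_features.items():
            scores[identity] = scores.get(identity, 0.0) + cosine(features, ref)
        # (ii) comparisons against identified objects from previous images,
        # given a higher weight.
        for identity, ref in context_features.items():
            scores[identity] = (scores.get(identity, 0.0)
                                + context_weight * cosine(features, ref))
    # The identity with the highest accumulated (weighted) similarity wins.
    return max(scores, key=scores.get)
```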
Advantageously, implementations of the present technology may therefore assign unidentified objects with the same identity as previously assigned to the same object.
In addition, since the analysis is done for multiple identities at the same time, object identification and re-identification may be performed faster. Moreover, based on the context collected from previous frames and since the analysis is done for multiple identities at the same time, implementations may achieve better accuracy in object identification and re-identification as variability in lighting and other physical conditions can be handled better.
Furthermore, implementations may have improved accuracy, as fewer identity switches occur. Moreover, the overall precision in identifying objects may be improved.
Figure 3 shows an example of a sequence of images 300 including objects. As shown in Figure 3, a plurality of objects 320A to 320D, in this case humans, may be tracked and identified in a first subsequence of images from the sequence of images. This is illustrated by the objects being consistently labelled "1" 330A, "2" 330B, "3" 330C, and "4" 330D in images #1 310A to #4 310D, which form the first subsequence of images.
As also shown in Figure 3, the sequence of images may include a second subsequence of images in which a second, unidentified object 340 is present. This is illustrated by one of the objects 340 present in frame #5 310E being labelled with a question mark 350, indicating that the identity of the object 340 is unknown. The second subsequence of images also includes some of the identified objects 320A to 320C from the first subsequence of images, which is indicated by the objects being labelled "1", "2", and "3".
In the example of the sequence of images 300 from Figure 3, the object 320D labelled with "4" 330D in the first subsequence becomes occluded in frame #4 310D. In this case, the object tracking 120 is unable to track the object 320D labelled with "4" 330D in subsequent images 310E, 310F because of this occlusion event. Responsive to determining that an object 340 is present in frame #5 for which no identity 330A to 330D has been assigned, the computing apparatus 210 may perform context aware object identification and re-identification on the second subsequence of images in order to identify or re-identify the unidentified object 340. In this example, the contextual information may be defined by the known objects 320A to 320C remaining in the second subsequence of images.
Since the second object 340 is unidentified and not tracked, it is not known whether the second object 340 is a newly introduced object or one of the first objects 320A to 320D tracked in the first subsequence of images. Therefore, the determining of the identity 350 corresponding to the second object 340 may not be limited to only those identities 330A to 330D determined to correspond to the first objects 320A to 320D. However, it may be assumed that it is more likely that the unidentified object 340 is one of the first objects 320A to 320D which are known to be present in the first subsequence. As such, the determining of the identity 350 of the second object 340 may be based, at least in part, on the contextual information derived from the previous frames, which may comprise, for instance, identities 330A to 330D of the first objects 320A to 320D.
If, instead, only object tracking 120 were performed (and not object identification), the unidentified object may be assigned a new identity when it reappears in frame #5, which may be undesirable.
Returning now to Figure 1, the computing apparatus 210 may perform context aware object identification and re-identification 140 on the sequence of images.
The computing apparatus 210 may determine that an unidentified second object present in the second subsequence of images from the sequence of images is to be identified. This determination may be based, for instance, on the object tracking 120. In some examples, the object tracking 120 may include determining trajectories for the first objects. Determining that the second object is to be identified, that is, that the identity corresponding to the second object is not known, may include determining that the location of the second object does not correlate to any of the trajectories of the first objects. One such example of this may be where a person X walks towards a person Y and hugs them, and then they both walk in a different direction. In this case, the moment person X and person Y hug, the tracking may be lost as there are two people (or objects) in the same location; person X then walks in a different direction to that in which he was previously walking, and therefore the trajectory prediction may not be able to be used, thereby requiring re-identification to happen.
Additionally or alternatively, determining that the second object is to be identified may include determining that there are a plurality of second objects that correlate to the same trajectory of a first object. One such example of this occurring may be where person X and person Y hug, and then continue walking together in the same direction as X was initially walking. In this case, the trajectory may be preserved, but two people (or objects) may be walking in the same direction, and therefore both may need to be re-identified.
Additionally or alternatively, determining that the second object is to be identified may include determining that a first object travelling along a trajectory was not visible for a threshold number of images. In this case, it cannot be certain that a second object present on the trajectory of the first object is the same object. One such example of this occurring may be where person X is walking, goes behind an obstacle so that they are not visible to the image capturing device, and reappears a few seconds later. In this example, even if person X is following the same trajectory, it may be determined that too much time has passed to assume the tracking is still valid, and that therefore the person may need to be re-identified.
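The three triggers just described can be summarised in a short sketch (the Track structure, field names and thresholds are illustrative assumptions, not part of the described apparatus):

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Hypothetical per-object track state maintained by the object tracking."""
    predicted_position: tuple   # where the trajectory predicts the object to be
    matched_detections: int     # detections currently matching this trajectory
    frames_since_seen: int      # images since the object was last visible

MAX_POSITION_ERROR = 50         # pixels; illustrative correlation threshold
MAX_INVISIBLE_FRAMES = 30       # illustrative threshold number of images

def needs_reidentification(track: Track, detection_position: tuple) -> bool:
    """Return True if a detection cannot be explained by tracking alone."""
    dx = detection_position[0] - track.predicted_position[0]
    dy = detection_position[1] - track.predicted_position[1]
    off_trajectory = (dx * dx + dy * dy) ** 0.5 > MAX_POSITION_ERROR
    ambiguous = track.matched_detections > 1   # several objects on one trajectory
    stale = track.frames_since_seen > MAX_INVISIBLE_FRAMES
    return off_trajectory or ambiguous or stale
```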
In other words, determining that the second object is to be identified may be a result of being unable to determine that the second object is one of the first objects based only (or at least partly) on the object tracking 120 of the first objects. In examples in which the first and second subsequences are captured by different image capture devices 240, since it may not be possible, or is at least more complex, to track the objects between image sequences captured by different image capture devices 240, it may be determined that many or all of the objects in the second subsequence require identification.
The computing apparatus 210 may, in response to determining that one or more second objects are to be identified, determine an identity corresponding to the second object. The determination of the identity corresponding to the second object may be based, at least in part, on contextual information. The contextual information may include, for instance, the one or more identities determined to correspond to the one or more first objects.
Determining the identity corresponding to the second object may include making a first comparison between one or more images of the second object from the second subsequence and images of the one or more first objects from the first subsequence. The images of the first and/or second objects may be images from the sequence of images in which the first and/or second objects are present. Additionally or alternatively, the images of the tracked objects may be extracted images from the sequence of images, as described above. Comparing the images of the first object(s) and the images of the second object(s) may include generating and comparing features of the respective images. The features may be, for instance, feature vectors, and may be, for instance, generated as described above. Any suitable comparison approach may be used, with examples including but not limited to a calculation of the Euclidean distance between two respective features (e.g. feature vectors), and a calculation of the cosine similarity between two respective features.
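For instance, given two feature vectors, either measure can be computed in a few lines (a sketch assuming NumPy arrays; the function names are illustrative):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Smaller values indicate more similar features."""
    return float(np.linalg.norm(a - b))

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Values closer to 1 indicate more similar features."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```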
Determining the identity corresponding to the second object may additionally include making a second comparison between the images of the second object and the definitions in the database 230. The images of the second objects may be images of the second subsequence in which the second objects are present. Additionally or alternatively, the images of the second objects may be extracted images from the second subsequence of images, as described above. When the definitions are images, features of the images of the second object and the definitions may be generated and compared. When the definitions are features (e.g. feature vectors), features of the images of the second objects can be generated and compared to the definitions. The first and second comparisons may be performed using the same technique, such as one or more of those described above.
In some examples, there may be an image of a given unidentified object from the second subsequence for one, some or all of the images in the second subsequence. For instance, if the second subsequence includes twenty images, one unidentified object may be present in just a single image in the second subsequence and another object may be present in twenty images. In such examples, in one possible implementation of determining the identity of unidentified objects, the three closest definitions for each object may be found. Clearly, for the object present in only a single image, this may involve a comparison of only one image versus all the definitions in the database. For the object present in all twenty images, this may involve twenty times the number of comparisons. In some examples, it may be decided to accumulate a predetermined threshold number of images of an object before a comparison is made.
In some examples, multiple identities in the database may each have the same number of definitions. In other examples, one or more identities in the database may have a different number of definitions to other identities in the database. For instance, there may be ten definitions for one identity and one hundred for another. In some implementations of determining the identity of unidentified objects, the five definitions that are most similar to image(s) of the unidentified object may be found out of the total of one hundred and ten definitions. The corresponding identities to these five definitions may then be found in order to assign the final identity to the unidentified object.
In some examples, even if different identities in the database have a different number of definitions, the same number of definitions may be used per identity when identifying objects. For instance, where only a subset of the available definitions is used, these may be randomly selected or selected based on some other metric such as image quality, noise, etc.

Determining the identity corresponding to the second object may be based on the first and second comparisons. In some examples, the first comparison may include comparing each image (or a subset of images) of the second object in the second subsequence of images with each image (or a subset of images) of each of the first objects. The second comparison may include comparing each image (or a subset of images) of the second object in the second subsequence of images with one, some or all definitions corresponding to each identity in the database 230 (or a particular subset of identities in the database 230). Where a subset of identities in the database 230 is used, the same subset may be used for identifying the first objects in the first subsequence. In some examples, the first comparison may include only comparing images of the second object with images of first objects which are not tracked in the second subsequence. That is, if a particular first object, to which an identity has been assigned, is tracked from the first subsequence to the second subsequence, the system may assume that a second object, appearing in the second subsequence and to which an identity has not yet been assigned, does not have the same identity as the particular first object.
The result of the first comparison can be arranged as a first similarity matrix, wherein the images of the second objects are one axis, the images of the first objects are the other axis, and measurements of similarity between a respective image of a second object and a respective image of a first object fill the matrix. The result of the second comparison can be arranged as a second similarity matrix, wherein the images of the second objects are one axis, the definitions of each identity are the other axis, and measurements of similarity between a respective image of a second object and a respective definition of an identity fill the matrix. Any suitable measurement of similarity may be used; for instance, the measurement may be one that increases with a higher degree of similarity or increases with a higher degree of difference.
An example of a first similarity matrix and a second similarity matrix are illustrated below, in Table 1 and Table 2 respectively:

| | Image 1 of tracked object 1 | Image 2 of tracked object 1 | Image 3 of tracked object 1 | Image 1 of tracked object 2 | Image 2 of tracked object 2 | Image 3 of tracked object 2 |
| --- | --- | --- | --- | --- | --- | --- |
| Image 1 of unidentified object 1 | 320 | 410 | 420 | 600 | 650 | 400 |
| Image 2 of unidentified object 1 | 450 | 460 | 520 | 780 | 760 | 640 |
| Image 1 of unidentified object 2 | 460 | 840 | 890 | 350 | 400 | 410 |
| Image 2 of unidentified object 2 | 650 | 450 | 500 | 460 | 420 | 430 |
Table 1
| | Definition 1 of identity 4 | Definition 2 of identity 4 | Definition 3 of identity 4 | Definition 1 of identity 7 | Definition 2 of identity 7 | Definition 3 of identity 7 |
| --- | --- | --- | --- | --- | --- | --- |
| Image 1 of unidentified object 1 | 400 | 420 | 370 | 380 | 720 | 530 |
| Image 2 of unidentified object 1 | 370 | 401 | 420 | 580 | 605 | 610 |
| Image 1 of unidentified object 2 | 600 | 620 | 600 | 210 | 390 | 260 |
| Image 2 of unidentified object 2 | 580 | 890 | 610 | 290 | 270 | 320 |
Table 2
Determining the identity corresponding to the second object may be based on the first and second comparisons. For instance, continuing the examples above, the first similarity matrix and the second similarity matrix may be concatenated. Additionally or alternatively, determining the identity corresponding to the second object may be weighted towards the identities determined to correspond to the first objects. For instance, the measures of similarity in the first similarity matrix may be modified such that the modified measures of similarity indicate a higher degree of similarity between respective images of the second object and respective images of the first object. For instance, when the measures of similarity are distances between respective features, the measures of similarity of the first similarity matrix may each be multiplied by a factor, which may for instance be greater than zero and less than or equal to one. Additionally or alternatively, the measures of similarity in the second matrix may be modified to indicate a lower degree of similarity between respective images of the second object and respective definitions. In this way, the computing apparatus 210 may be more likely to determine that the second object is one of the first objects.
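A sketch of this weighting and concatenation step, assuming distance-based measures of similarity (so that multiplying by a factor below one makes previously seen identities appear closer); the variable names follow the pseudocode given later, and the sample values are taken from the first rows of Tables 1 and 2:

```python
import numpy as np

# B: first similarity matrix  (rows: images of unidentified objects,
#                              columns: images of identified first objects)
# A: second similarity matrix (rows: images of unidentified objects,
#                              columns: definitions in the database)
B = np.array([[320.0, 410.0, 420.0], [450.0, 460.0, 520.0]])
A = np.array([[400.0, 420.0, 380.0], [370.0, 401.0, 580.0]])

a = 0.8                                      # weighting factor, 0 < a <= 1
C = a * B                                    # bias towards already-seen identities
combined = np.concatenate([C, A], axis=1)    # more columns, same rows
```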
Determining the identity corresponding to the second object based on the first and second comparisons may include identifying which of the identities in the database 230 is most similar to the images of the second object. For instance, for each image of the second object, the measurement of similarity which indicates the most similar image of a first object or definition is identified. Alternatively, a plurality of measurements of similarity which indicate the most similar images of first objects and/or definitions may be identified. The one or more images of first objects and/or definitions corresponding to the identified measurements of similarity may be ascertained. The identity corresponding to the second object may be selected based on the identity corresponding to the identified one or more images of first objects and/or definitions.
First objects may be tracked through both the first subsequence and second subsequence. In this case, since the identity of the first objects has already been determined, the identities of the first objects tracked through both the first subsequence and second subsequence may not be considered when determining the identity of second objects. In this way, if an object associated with an identity is present throughout the sequence of images, and another object resembles that object, erroneously assigning this same identity to the other object may be avoided. Further, the number of comparisons performed to determine the identity of the second object can be reduced. In some examples, if each of the identities of first objects is accounted for in the second subsequence of images, the first comparison may be omitted in determining the identity of second objects.
In some examples, the identity corresponding to the second object may be selected based on majority voting.
One implementation of majority voting is as follows. For a given image of a second object, one or more measurements of similarity which indicate that one or more definitions and images of identified first objects are among the most similar (if not the most similar) to the image of the second object may be ascertained. For instance, the smallest one, two or three distances may be identified. The respective identity or identities corresponding to the ascertained definitions and images of identified first objects may be determined, and in response, one or more of the respective identities may be given a "vote". For instance, where the three smallest distances are identified, and at least two of the three smallest distances relate to the same identity, a "vote" may be cast in favour of this identity. This may then be repeated for each image of a second object. After a "vote" has been given for each image of the tracked object, the identity with the most "votes" may be determined to correspond to the tracked object. In the event of a tie, that is, an equal or substantially equal number of "votes" for multiple identities with the most "votes", one of these identities may be determined to correspond with a tracked object based on the identity having a corresponding definition which is more similar to an image of the tracked object than any definition corresponding to any other identity is to any of the images of the tracked object.
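One possible sketch of this per-image voting scheme, assuming a distance-based similarity matrix whose columns are labelled with identities (all names illustrative):

```python
from collections import Counter
import numpy as np

def vote_identity_per_image(distances: np.ndarray, column_identities: list, k: int = 3):
    """Each row is one image of the object: find its k smallest distances,
    cast one vote for that image's majority identity, then return the
    identity with the most votes overall."""
    votes = Counter()
    for row in distances:
        nearest = np.argsort(row)[:k]                  # k most similar columns
        ids = [column_identities[j] for j in nearest]
        votes[Counter(ids).most_common(1)[0][0]] += 1  # one vote per image
    return votes.most_common(1)[0][0]
```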
An example of such an implementation is described below in relation to Table 3, which illustrates a matrix created by concatenating an example first similarity matrix and an example second similarity matrix. Note that the distances between images of unidentified objects and images from the first subsequence of images with known identities are multiplied by a factor of 0.8 to give a higher weight to previously seen identities.
In Table 3, the first two columns ("Image 1 of object 4" and "Image 2 of object 4") contain distances to images from the first subsequence of images with known identities, and the remaining columns contain distances to definitions from the database:

| | Image 1 of object 4 | Image 2 of object 4 | Definition 1 of object 4 | Definition 2 of object 4 | Definition 1 of object 7 | Definition 2 of object 7 | Definition 3 of object 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Image 1 of unidentified object 1 | 0.8 × 600 = 480 | 0.8 × 600 = 480 | 400 | 420 | 380 | 720 | 530 |
| Image 2 of unidentified object 1 | 0.8 × 900 = 720 | 0.8 × 600 = 480 | 390 | 401 | 580 | 605 | 610 |
| Image 3 of unidentified object 1 | 0.8 × 400 = 320 | 0.8 × 900 = 720 | 401 | 372 | 580 | 600 | 600 |
| Image 4 of unidentified object 1 | 0.8 × 900 = 720 | 0.8 × 900 = 720 | 620 | 620 | 600 | 600 | 600 |
| Image 1 of unidentified object 2 | 0.8 × 900 = 720 | 0.8 × 900 = 720 | 600 | 620 | 310 | 320 | 300 |
| Image 2 of unidentified object 2 | 0.8 × 900 = 720 | 0.8 × 900 = 720 | 610 | 590 | 305 | 314 | 370 |
Table 3
In Table 3, for "Image 1 of unidentified object 1", the three smallest distances may be identified as those corresponding to "Definition 1 of object 7", "Definition iof object 4", and "Definition 2 of object 4" respectively. As such, since the majority of smallest distances correspond to identity four, in particular two out of the three, a "vote" in favour of identity four may be cast in respect of "Image 1 of unidentified object 1". Similarly, for "Image 2 of unidentified object 1", the three smallest distances may be those corresponding to "Definition iof object 4", "Definition 2 of object 4", and "image Jo 2 of object 4", and a vote in favour of identity four may be cast in respect of "Image 2 of unidentified object 1". This process may be repeated for "Image 3 of unidentified object 1", resulting in a vote for identity four, and "Image 4 of unidentified object 1", resulting in a vote for identity 7. As such, for "unidentified object 1" there may be three votes in favour of identity four, and one vote in favour of identity 7. In this case, since the majority of the votes are for identity four, "unidentified object 1" may be determined to correspond to identity four.
This process may then be repeated for "unidentified object 2", resulting in a "vote" in favour of identity seven for both images of this object, meaning that this object may be determined to correspond to identity seven.
Referring back to Figure 4, which shows an example of a sequence of images 400 including an object 420A to 420H: if it is taken that object identification has been performed on the sequence of images 400 by comparing the images 410A to 410H against definitions in the database 230, as described above, majority voting may result in identity "4" being selected for the object 420A to 420H in the images 410A to 410H.
However, if this object 420A to 420H was seen in the first subsequence of images and was identified as identity "7" with high confidence, then it may be expected to have the identity "7" assigned for the new eight images 410A to 410H from the second subsequence of images V, even though in these specific images 410A to 410H the object 420A to 420H was identified as identity "7" only in three images 410C, 410F, 410G.
Implementations may follow this logic and assign heavier weight to decisions made in previous images. Note that past decisions may not be solely relied upon, as they might have been wrong. Another reason for this is the first-time identification that may be made for objects that appear in the sequence of images for the first time. These objects may be assigned an identity based on a larger list of available identities than those previously assigned to objects, since the unidentified object may not be one of the previously identified objects.
Returning now to Figure 1, another example implementation of majority voting is as follows. For a given second object, across all or at least some of the images of the second object, one or more measurements of similarity which indicate that one or more definitions and images of identified first objects are among the most similar (if not the most similar) to respective images of the second object may be ascertained. For instance, the smallest one, two or three distances may be identified. The identity corresponding to the ascertained definitions and images of identified objects is then determined, and a "vote" is cast in favour of the determined identity. The identity with the most "votes" is then determined to correspond with the second object. This may then be repeated for the remaining unidentified second objects.
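A sketch of this second, per-object variant, in which the k globally smallest distances across all of an object's images each cast one vote (again, all names illustrative):

```python
from collections import Counter
import numpy as np

def vote_identity_per_object(distances: np.ndarray, column_identities: list, k: int = 3):
    """All rows are images of one object: find the k smallest entries in the
    whole matrix and vote once per entry for the corresponding identity."""
    flat_order = np.argsort(distances, axis=None)[:k]
    columns = [np.unravel_index(i, distances.shape)[1] for i in flat_order]
    votes = Counter(column_identities[j] for j in columns)
    return votes.most_common(1)[0][0]
```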
As an example, referring to Table 3, for unidentified object one, the three smallest distances are between "Image 1 of unidentified object 1" and "Definition 1 of object 7", "Image 3 of unidentified object 1" and "Image 1 of object 4", and "Image 3 of unidentified object 1" and "Definition 2 of object 4". As such, for unidentified object one, there may be one "vote" for identity seven and two "votes" for identity four. This may mean that the majority of the votes are for identity four, and unidentified object one may therefore be determined to correspond to identity four.
Similarly, for unidentified object two, the smallest three distances are between "Image 1 of unidentified object 2" and "Definition 1 of object 7", "Image 2 of unidentified object 2" and "Definition 1 of object 7", and "Image 2 of unidentified object 2" and "Definition 2 of object 7". As such, for unidentified object two, there may be three "votes" for identity seven. This may mean that the majority of the votes are for identity seven, and unidentified object two may therefore be determined to correspond to identity seven.
Note that distances for "Image 2 of unidentified object 1" and "Image 4 of unidentified object 1" are not considered in this example, as they resulted in higher distances. Hence, an identity for an unidentified object may not be determined for every image of that unidentified object, but rather the identity of an unidentified object in images not used may be inferred from the rest of the images.
Such implementations of majority voting may also be used to initially identify the objects in the first subsequence of images, as mentioned above. However, the skilled person would understand that, since there may be no previous images in which objects have been identified, comparisons between images of unidentified objects and previous images of these objects (i.e. those comparisons shown in Table 1) may not be available. As such, the skilled person would understand how to modify the implementations discussed above (such as by only using comparisons with definitions in the database, as illustrated in Table 2) in order to initially identify the objects in the first subsequence of images.
In other examples not using majority voting, the determination of which definition is most similar to the images of a tracked object may instead be based on the average or total measurement of similarity for each definition. In this case, for a given identity corresponding to one or more definitions and/or images of a first object, the respective measurements of similarity are averaged or summed over each image of the second object. The identity having the averaged or summed measurement of similarity which indicates the greatest similarity is then determined to correspond to the second object.
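A sketch of this averaging alternative, assuming distances (so a smaller average indicates a better match; names illustrative):

```python
import numpy as np

def best_identity_by_average(distances: np.ndarray, column_identities: list):
    """Average the distances per identity over all images (rows) of the
    object, then pick the identity with the smallest average."""
    averages = {}
    for identity in set(column_identities):
        cols = [j for j, c in enumerate(column_identities) if c == identity]
        averages[identity] = distances[:, cols].mean()
    return min(averages, key=averages.get)
```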
Continuing with the above examples, where there are multiple second objects to be identified or re-identified, this process may repeat for each of the second objects. In some examples, object identification may include a greedy algorithm, as described above.
A specific example of a method of determining the identities which correspond to the second objects is described below with reference to the following pseudocode:

* Let I = {id1, ..., idk} be a set of identities
* Let G = {img1, ..., imgn} be a database of definitions, with at least 1 definition per identity
* Let W be the frames of context, or in other words the first subset of frames, with detected identities Iw ⊆ I
* Let Gw = {imgw1, ..., imgwm} be images of objects in W (all with known identities)
* Let V be a track (second subsequence of images) in which objects x1, ..., xp are unidentified (V ∩ W = ∅)
* Let Gx = {imgx1, ..., imgxq} be images of objects x1, ..., xp in V

1. Calculate similarity matrix A of Gx and G
2. Calculate similarity matrix B of Gx and Gw
3. Compute C = a * B, where 0 < a < 1
4. Find the most similar images and definitions to x1, ..., xp based on C and A combined together
5. Use majority voting over the identities of those most similar images and definitions to compute identities for x1, ..., xp (e.g. by using a greedy approach, i.e., first choosing the best match between an identity and a person xi, removing that identity from the list, and continuing until all x1, ..., xp have a unique identity)

Steps 1 and 2 of the pseudocode may be performed using the "Torchreid" library, as described for instance in "Torchreid: A Library for Deep Learning Person Re-Identification in Pytorch" by Zhou, Kaiyang and Xiang, Tao, 2019. This library uses a machine learning model to extract feature vectors representing each image provided to it as an input.
Features of respective images (e.g. feature vectors) may be compared to each other via a distance metric such as cosine similarity or Euclidean distance to calculate how similar two images are to each other.
While computing matrix A in step 1, the Gx images correspond to the rows of matrix A (such that the number of rows equals the size of group Gx) and the G definitions correspond to the columns of matrix A (such that the number of columns equals the size of group G). Each cell in A may contain the distance between the corresponding image and definition. Table 2 as shown above illustrates an example of a matrix A, in which there are two unidentified objects, there are two images of each unidentified object, and there are a total of six definitions corresponding to two identities. In Table 2, unidentified object "1" is "closest" to identity "4", while unidentified object "2" is "closest" to identity "7".
Similarly, while computing matrix B in step 2, the Gx images correspond to the rows of matrix B (such that the number of rows equals the size of group Gx) and the Gw images correspond to the columns of matrix B (such that the number of columns equals the size of group Gw). Each cell in B may contain the distance between the corresponding images.
Table 1 as shown above illustrates an example of a matrix B, in which there are two unidentified objects, there are two images of each unidentified object, and there are a total of six images corresponding to two previously identified objects. In Table 1, unidentified object "1" is "closest" to previously identified object "1", while unidentified object "2" is "closest" to previously identified object "2". In this example, previously identified object "1" was previously identified as identity "4", and previously identified object "2" was previously identified as identity "7".
In step 3 the parameter "a" is used to balance between identities determined in images W (seen previously in the same video) and images V. If "a" is set to, for instance, 0.8 then the distance value in the relevant columns of matrix B may be multiplied by 0.8.
This means that there is a better chance to match an unidentified object to an identity from W since the distances are effectively reduced. However, since new objects may enter the scene at any point in time, new identities may also be assigned, and therefore in step 4 a comparison is made with the combined matrix (by means of concatenation -having more columns, but the same rows).
Steps 4-5 may be achieved using multiple techniques. One such technique is a greedy algorithm that looks for the closest images to the first object, assigns the identity, and moves to the next object. Taking, for instance, the combined matrix as a combined matrix of Table 1 and Table 2, the closest identity to the first unidentified object is identity "4". Note that for some definitions and images, the better match is actually identity "7". However, since we are using majority voting, identity "4" is the assigned identity. The greedy algorithm then moves to the second unidentified object. Other algorithms such as minimum weight perfect matching in graphs can be used to assign identities based on this matrix.
The majority voting assigns to an unidentified object the identity that was the most common among all the entries in the similarity matrix. In, for instance, Table 2, for the first image of the first unidentified person, the best matches were identity "4", identity "4", identity "7", while for the second image of the first unidentified person, the best matches were identity "4", identity "4", identity "4". The majority voting over all these matches gives the final result of identity "4". Ties may be resolved based on the match with the smallest distance.
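Putting steps 1 to 5 together for a single unidentified object, a compact sketch might look as follows (feature extraction, e.g. via a Torchreid model, is assumed to have already produced the feature vectors; every name below is illustrative rather than part of the described method):

```python
import numpy as np
from scipy.spatial.distance import cdist

def identify_object(Gx, Gw, G, w_identities, g_identities, a=0.8, k=3):
    """Steps 1-5 of the pseudocode for one object. Gx, Gw and G are arrays
    of feature vectors (one row per image or definition)."""
    A = cdist(Gx, G)                   # step 1: distances to definitions
    B = cdist(Gx, Gw)                  # step 2: distances to context images
    C = a * B                          # step 3: weight previously seen identities
    combined = np.hstack([C, A])       # step 4: concatenate (more columns, same rows)
    identities = list(w_identities) + list(g_identities)
    flat_order = np.argsort(combined, axis=None)[:k]
    columns = [np.unravel_index(i, combined.shape)[1] for i in flat_order]
    votes = {}                         # step 5: majority voting
    for j in columns:
        votes[identities[j]] = votes.get(identities[j], 0) + 1
    return max(votes, key=votes.get)
```

For multiple unidentified objects, a greedy outer loop could call this sketch per object, removing each assigned identity from the candidate lists, as the pseudocode notes.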
Implementations can be used for any object re-identification task to be accomplished based on a sequence of image feeds. For instance, implementations can be used to detect crowded areas, or to search for a specific person of interest and follow that person across video feeds. Implementations can also be used to count the number of people that pass a camera.
As shown in Figure 1, the identities determined to correspond to the first objects and/or the second objects may be output.
In some examples, the determined identities may be output to a second database. The second database may include one or more entries that identify a particular sequence of images and one or more identities determined to be present in a particular sequence of images. The second database may also include one or more entries that identify a particular image from a particular sequence of images, a bounding box containing an object in a particular image and/or an extracted image containing an object from a particular image, and an identity corresponding to the object in the bounding box and/or the extracted image. The second database may be stored on at least one of the server apparatus 220, the computing apparatus 210, and a computing apparatus (not shown) separate from the server apparatus 220 and the computing apparatus 210.
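As an illustration only, an entry of this second database might be modelled as follows (the record structure and field names are assumptions for the sketch):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class IdentityRecord:
    """Hypothetical entry of the second database described above."""
    sequence_id: str                   # identifies a particular sequence of images
    image_index: Optional[int]         # a particular image within that sequence
    bounding_box: Optional[Tuple[int, int, int, int]]   # (x, y, width, height)
    identity: int                      # identity determined for the object
```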
As shown in Figure 1, in some examples, the sequence of images may be annotated and/or modified to include an indication of the identity 151A to 151C determined to correspond to one or more of the first objects and/or second objects. The sequence of images may be annotated and/or modified to indicate a position of the objects, for instance, with a bounding box 152A to 152C. The annotated sequence of images 150 may be provided as output.
Figures 5, 6 and 7 are flow diagrams showing operations that may be performed in example embodiments. It will be appreciated that variations are possible, for example to add, remove and/or replace certain operations. The operations may be performed in hardware, software or a combination thereof. The processing operations may be performed, for example, by the computing apparatus 210.
As shown in Figure 5, operation 510 may comprise receiving a sequence of images. Operation 520 may comprise performing object tracking on the sequence of images.
Operation 530 may comprise determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images.
or Operation 540 may comprise determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images.
Operation 550 may comprise determining an identity, from among the plurality of identities, corresponding to the second object based at least in part on one or more of the identities determined to correspond to the one or more first objects.
With reference to Figure 6, operation 550 may comprise operations 610 to 640.
Operation 610 may comprise accessing a database 230 comprising mappings between the plurality of predefined identities and at least one corresponding definition.
Operation 620 may comprise making a first comparison between one or more images of the second object from the second subsequence and images of the one or more first objects from the first subsequence.
Operation 630 may comprise making a second comparison between the one or more images of the second object and the definitions in the database 230.
Operation 640 may comprise determining the identity corresponding to the second object based on the first and second comparisons.
With reference to Figure 7, operation 640 may comprise operations 710 to 740.
Operation 710 may comprise determining similarities between the one or more images of the second object and the images of the one or more first objects, and between the one or more images of the second object and the definitions in the database 230, based on the first and second comparisons respectively.
Operation 720 may comprise modifying the determined similarities to weight the determination of the identity corresponding to the second object towards the identities determined to correspond to the one or more first objects.
Operation 730 may comprise determining which of the images of the one or more first objects and the definitions in the database 230 is most similar to the one or more images of the second object.
Operation 740 may comprise determining which identity corresponds to the second object based on the images of the one or more first objects and the definitions in the database 230 determined to be most similar to the one or more images of the second object.
Figure 8 shows an example apparatus that may provide the computing apparatus 210 or an equivalent system.
The apparatus comprises at least one processor 800 and at least one memory 820 directly or closely connected or coupled to the processor. The memory 820 may comprise at least one random access memory (RAM) 822A and at least one read-only memory (ROM) 822B. Computer program code (software) 825 may be stored in the ROM 822B. The apparatus may be connected to a transmitter path and a receiver path in order to obtain respective signals comprising the aforementioned data. The apparatus may be connected with a user interface (UI) for instructing the apparatus and/or for outputting data. The at least one processor 800 with the at least one memory 820 and the computer program code may be arranged to cause the apparatus to at least perform methods described herein.
The processor 800 may be a microprocessor, plural microprocessors, a microcontroller, or plural microcontrollers.
The memory 820 may take any suitable form.
The transmitter path and receiver path between any of the described apparatus may be established using a transceiver module which may be arranged suitable for any form of radio communications, for example cellular radio communications according to 2G, 3G, 4G, 5G or future-generation standards.
Figure 9 shows a non-transitory media 900 according to some embodiments. The non-transitory media 900 is a computer readable storage medium. It may be e.g. a CD, a DVD, a USB stick, a Blu-ray disc, etc. The non-transitory media 900 stores computer program code causing an apparatus to perform operations described above when executed by a processor such as processor 800 of Figure 8. Any mentioned apparatus and/or other features of particular mentioned apparatus may be provided by apparatus arranged such that they become configured to carry out the desired operations only when enabled, e.g. switched on, or the like. In such cases, they may not necessarily have the appropriate software loaded into the active memory in the non-enabled (e.g. switched-off) state and may only load the appropriate software in the enabled (e.g. switched-on) state. The apparatus may comprise hardware circuitry and/or firmware. The apparatus may comprise software loaded onto memory. Such software/computer programs may be recorded on the same memory/processor/functional units and/or on one or more memories/processors/functional units.
In some examples, a particular mentioned apparatus may be pre-programmed with the appropriate software to carry out desired operations, and wherein the appropriate software can be enabled for use by a user downloading a "key", for example, to unlock/enable the software and its associated functionality. Advantages associated with such examples can include a reduced requirement to download data when further functionality is required for a device, and this can be useful in examples where a device is perceived to have sufficient capacity to store such pre-programmed software for functionality that may not be enabled by a user.
Any mentioned apparatus/circuitry/elements/processor may have other functions in addition to the mentioned functions, and these functions may be performed by the same apparatus/circuitry/elements/processor. One or more disclosed aspects may encompass the electronic distribution of associated computer programs and computer programs (which may be source/transport encoded) recorded on an appropriate carrier (e.g. memory, signal).
Any "computer" described herein can comprise a collection of one or more individual processors/processing elements that may or may not be located on the same circuit board, or the same region/position of a circuit board or even the same device. In some examples one or more of any mentioned processors may be distributed over a plurality of devices. The same or different processor/processing elements may perform one or or more functions described herein.
The term "signalling" may refer to one or more signals transmitted as a series of transmitted and/or received electrical/optical signals. The series of signals may comprise one, two, three, four or even more individual signal components or distinct signals to make up said signalling. Some or all of these individual signals may be transmitted/received by wireless or wired communication simultaneously, in sequence, and/or such that they temporally overlap one another.
With reference to any discussion of any mentioned computer and/or processor and 35 memory (e.g. including ROM, CD-ROM, etc.), these may comprise a computer processor, Application Specific Integrated Circuit (ASIC), field-programmable gate -39 -array (FPGA), and/or other hardware components that have been programmed in such a way to carry out the inventive function.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole, in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that the disclosed aspects/examples may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the disclosure.
While there have been shown and described and pointed out fundamental novel features as applied to examples thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices and methods described may be made by those skilled in the art without departing from the scope of the disclosure. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the disclosure. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or examples may be incorporated in any other disclosed or described or suggested form or example as a general matter of design choice. Furthermore, in the claims means-plus-function clauses are intended to cover the structures described herein as performing the recited function and not only structural equivalents, but also equivalent structures.

Claims (10)

Claims

1. Apparatus comprising means configured to perform:
receiving a sequence of images;
performing object tracking on the sequence of images;
determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images;
determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images;
responsive to determining that the second object is to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.
2. The apparatus of claim 1, wherein determining the identity corresponding to the second object based on the contextual information derived from the first subsequence of images comprises:
making a first comparison between one or more images of the second object from the second subsequence and images of the one or more first objects from the first subsequence, and
determining the identity corresponding to the second object based on the first comparison.
3. The apparatus of claim 2, wherein the determining the identity corresponding to the second object comprises:
making a second comparison between the one or more images of the second object and definitions in a database, wherein the database comprises mappings between the plurality of predefined identities and at least one corresponding definition, wherein the at least one corresponding definition comprises at least one of: (i) an image including an object corresponding to the respective identity, and (ii) features generated from an image including an object corresponding to the respective identity; and
determining the identity corresponding to the second object based on the first and second comparisons.
4. The apparatus of claim 3, wherein the first comparison comprises determining a similarity between the one or more images of the second object and the images of the one or more first objects, and the second comparison comprises determining a similarity between the one or more images of the second object and the definitions in the database, and wherein determining the identity corresponding to the second object based on the first and second comparisons comprises, based on the determined similarities, determining which identity corresponds to the second object.
5. The apparatus of claim 4, wherein determining which identity corresponds to the second object based on the determined similarities comprises determining which of the images of the one or more first objects and the definitions in the database is most similar to the one or more images of the second object.
6. The apparatus of claim 4 or claim 5, wherein determining which identity corresponds to the second object based on the determined similarities comprises majority voting.
7. The apparatus of any one of claims 3 to 6, wherein determining the identity corresponding to the second object is weighted towards the identities determined to correspond to the one or more first objects.
8. The apparatus of any one of claims 3 to 7, wherein making the first comparison comprises generating a first similarity matrix and making the second comparison comprises generating a second similarity matrix, wherein the first similarity matrix comprises a measure of similarity between the one or more images of the second object and the images of the one or more first objects, and the second similarity matrix comprises a measure of similarity between the one or more images of the second object and the definitions.
9. The apparatus of claim 8, wherein measurements of similarity are calculated using one of a Euclidean distance and a cosine similarity between features generated from respective images.
10. The apparatus of claim 9, wherein the features are output from a machine learning model.

11. The apparatus of any one of claims 8 to 10, wherein measurements of similarity in the first similarity matrix are modified such that the modified measurements of similarity indicate a higher degree of similarity between the respective image of the second object and the respective image of the first object than the corresponding unmodified measure of similarity.

12. The apparatus of any one of the preceding claims, wherein the means are further configured to perform:
determining, based on the object tracking, that a plurality of second objects that are present in the second subsequence of one or more images of the sequence of images is to be identified; and
responsive to determining that a plurality of second objects are to be identified, determining an identity, from among the plurality of identities, corresponding to the second objects based at least in part on one or more of the identities determined to correspond to objects of the first plurality of objects.

13. The apparatus of claim 12, wherein determining the identity corresponding to the second objects comprises one of a greedy approach and minimum weight perfect matching.

14. The apparatus of any one of the preceding claims, wherein: the objects are humans, and the plurality of predefined identities correspond to predefined identities of individual humans; and/or the second object is one of the one or more first objects.

15. The apparatus of any one of the preceding claims, wherein the second subsequence of images comprises a plurality of images from the sequence of images, and the second object is tracked in the second subsequence of images.

16. The apparatus of claim 15, wherein at least one of the one or more first objects is tracked in the second subsequence of images.

17. The apparatus of any one of the preceding claims, wherein the means are further configured to perform: outputting the identities determined to correspond to the one or more first objects and the second object.

18. The apparatus of any one of the preceding claims, wherein the sequence of images is captured by one or more image capture devices, and wherein the one or more image capture devices capture a field of view of an environment.

19. The apparatus of any one of the preceding claims, wherein performing object tracking on the sequence of images comprises determining trajectories for the one or more first objects, and wherein determining, based on the object tracking, that the second object is to be identified comprises determining that a position of the second object does not correlate sufficiently to the trajectories of the one or more first objects.

20. Method comprising:
receiving a sequence of images;
performing object tracking on the sequence of images;
determining a respective identity, from among a plurality of predefined identities, corresponding to one or more first objects that are tracked in a first subsequence of images of the sequence of images;
determining, based on the object tracking, that a second object that is present in a second subsequence of one or more images of the sequence of images is to be identified, wherein the first subsequence precedes the second subsequence in the sequence of images;
responsive to determining that the second object is to be identified, determining an identity, from among the plurality of predefined identities, corresponding to the second object based on (i) contextual information derived from the first subsequence of images, and (ii) the plurality of predefined identities, wherein the contextual information comprises one or more of the identities determined to correspond to the one or more first objects.

21. A computer program product comprising a set of instructions which, when executed on an apparatus, cause the apparatus to carry out the method of claim 20.
GB2018481.8A 2020-11-24 2020-11-24 Methods and apparatuses relating to object identification Withdrawn GB2601310A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2018481.8A GB2601310A (en) 2020-11-24 2020-11-24 Methods and apparatuses relating to object identification

Publications (2)

Publication Number Publication Date
GB202018481D0 GB202018481D0 (en) 2021-01-06
GB2601310A true GB2601310A (en) 2022-06-01

Family

ID=74046790

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2018481.8A Withdrawn GB2601310A (en) 2020-11-24 2020-11-24 Methods and apparatuses relating to object identification

Country Status (1)

Country Link
GB (1) GB2601310A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8615105B1 (en) * 2010-08-31 2013-12-24 The Boeing Company Object tracking system
WO2019036309A1 (en) * 2017-08-14 2019-02-21 Amazon Technologies, Inc. Selective identity recognition utilizing object tracking

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8615105B1 (en) * 2010-08-31 2013-12-24 The Boeing Company Object tracking system
WO2019036309A1 (en) * 2017-08-14 2019-02-21 Amazon Technologies, Inc. Selective identity recognition utilizing object tracking

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FERNANDO THARINDU ET AL: "Tracking by Prediction: A Deep Generative Model for Mutli-person Localisation and Tracking", 2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), IEEE, 12 March 2018 (2018-03-12), pages 1122 - 1132, XP033337780, DOI: 10.1109/WACV.2018.00128 *


Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)