GB2608975A - Person identification across multiple captured images - Google Patents


Publication number
GB2608975A
Authority
GB
United Kingdom
Prior art keywords
image
person
features
feature
identified
Prior art date
Legal status
Granted
Application number
GB2216206.9A
Other versions
GB202216206D0 (en)
GB2608975B (en)
Inventor
Ghafoor Razwan
Rennert Peter
Moriceau Hichame
Current Assignee
Standard Cognition Corp
Original Assignee
Standard Cognition Corp
Priority date
Filing date
Publication date
Application filed by Standard Cognition Corp filed Critical Standard Cognition Corp
Publication of GB202216206D0
Publication of GB2608975A
Application granted
Publication of GB2608975B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 - Distances to prototypes
    • G06F18/24143 - Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/52 - Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Abstract

A method of identifying a person across multiple images captured by one or more image-capturing devices, comprising: analysing an initial set of one or more images depicting the person using a first computational neural network that identifies for each analysed image regions of the image each depicting a corresponding body part of the person; analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; associating in a database the extracted set of features with an ID for the person; analysing a further image to extract a set of one or more features associated with body parts of a person within the further image; and determining if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image, wherein the initial set of one or more images and the further image are captured by a plurality of image capturing devices that are spatially distributed within an environment including a town or city.

Description

PERSON IDENTIFICATION ACROSS MULTIPLE CAPTURED IMAGES
FIELD
This invention relates to the identification of a person across multiple captured images.
BACKGROUND
Computer vision is a field that addresses how computational methods can be employed to gain an understanding of information depicted in images or videos. To do this, image data (e.g. in the form of single images, image sequences forming a video stream, images captured by a camera etc.) can be processed and analysed to extract information and/or data that can be used to draw conclusions about what is depicted by those images.
A typical task in computer vision is to identify specific objects in an image and to determine the object's position and/or orientation relative to some coordinate system.
Humans are examples of such objects, with the ability to detect humans within images being desirable for a number of applications, for example in the fields of surveillance and video labelling.
Person re-identification refers to the task of identifying a person across multiple images that may be captured by multiple cameras potentially having non-overlapping fields of view. The images may also span an indeterminate temporal period. Generally speaking, the aim of person re-identification is to use a given set of one or more images that depict a person to identify that person within other images captured by different cameras and/or within images separated in time.
One approach to performing person re-identification is to perform facial detection on images that depict the person, and then perform facial recognition in subsequent images to identify whether the detected faces match those of the person to be identified. However, a drawback with this approach is that facial recognition does not provide a very robust mechanism by which to identify a person. For example, in some implementations (e.g., surveillance systems), the people within the images may be relatively far from the camera that performs the image capture, making it difficult to capture sufficient details of the people's faces to perform facial matching for the purposes of identification. Another problem is that the success of facial recognition is highly dependent on the people within the captured images being suitably oriented with respect to the camera's field of view. For example, if in a subsequent image a person's face is captured only from a side-on view, a facial recognition algorithm may determine that the face does not match that of the person to be identified, even if it is in fact the same face.
SUMMARY
According to the present invention there is provided a method of identifying a person across multiple images captured by one or more image-capturing devices, comprising: analysing an initial set of one or more images depicting the person using a first computational neural network that identifies for each analysed image regions of the image each depicting a corresponding body part of the person; analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; associating in a database the extracted set of features with an ID for the person; analysing a further image to extract a set of one or more features associated with body parts of a person within the further image; and determining if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image.
Each region of the image identified from the first computational neural network may be analysed using a second computational neural network to extract the set of features associated with the body parts of the person.
The initial set of one or more images depicting the person may be captured by a first set of one or more image capturing devices, and the further image is captured from a further image capturing device not forming part of the first set.
The further image capturing device may have a non-overlapping field of view with the field of view of the image capturing devices of the first set.
The initial set of one or more images may comprise multiple images.
The first set of one or more image capturing devices may comprise multiple image capturing devices.
The first computational neural network may identify for each analysed image a set of vector fields each encoding the location and orientation of a respective body part of the person within the image.
Each vector field may identify a region of the image depicting the respective body part of the person.
The method may comprise analysing each image region identified from the first computational neural network to extract a set of features each associated with a respective body part of the person.
The extracted set of features may form a descriptor of the person.
The method may comprise analysing each image region identified from the first computational neural network using a second computational neural network to generate for each body part a classification dataset identifying a feature for the body part.
The method may further comprise generating, from the extracted set of features, a feature characterising ID that characterises the extracted features of the person, and storing the feature characterising ID in the database with the ID for the person.
The step of analysing the further image may comprise using a third computational neural network pre-trained to identify a set of features comprising the extracted features.
The method may further comprise generating a feature characterising ID of the person within the further image from the extracted features associated with body parts of that person, and comparing that feature characterising ID with the feature characterising ID stored in the database associated with the person ID.
The step of analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person may comprise: identifying a transform of image data for the identified image regions within each of the initial set of images to a common region of a feature space; and for each of at least one image of the initial set of images, applying the identified transform to the identified image regions within the image to transform the identified image regions to a location within the common region of the feature space; and determining the extracted set of one or more features from the location(s) within the common region of the feature space.
The common region of feature space may be of a specified size.
The set of features may be determined from an average of the locations within the common region of the feature space.
The set of features may be determined from each of the location(s) within the common region of feature space.
The locations within the feature space may identify semantic features associated with the body parts of the person.
The step of analysing the further image may comprise: identifying for the further image regions of the image depicting body parts of the person within the further image; applying the identified transform to the identified image regions within the further image to transform those identified regions to a location within the feature space.
The determining step may comprise: determining if the location within the feature space for the image regions within the further image matches a location associated with the ID for the person stored in the database to within a specified amount.
According to a second aspect of the present disclosure there is provided an apparatus for identifying a person across multiple images captured by one or more image-capturing devices, comprising: a first image analysis unit configured to analyse an initial set of one or more images depicting the person using a first computational neural network that identifies for each analysed image regions of the image each depicting a corresponding body part of the person; a feature extraction unit configured to analyse the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; a storage unit configured to store in a database an ID for the person associated with the extracted set of features; a second image analysis unit configured to analyse a further image to extract a set of one or more features associated with body parts of a person within the further image; and a person identification unit configured to determine if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determine that the person with that ID is depicted in the further image.
The feature extraction unit may be configured to use a second computational neural network to analyse each image region identified from the first computational neural network to extract a set of features associated with the body parts of the person.
The initial set of one or more images depicting the person may be captured by a first set of one or more image capturing devices, and the further image may be captured from a further image capturing device not forming part of the first set.
The further image capturing device may have a non-overlapping field of view with the field of view of the image capturing devices of the first set.
The initial set of one or more images may comprise multiple images.
The first set of one or more image capturing devices may comprise multiple image capturing devices.
The first image analysis unit may be configured to implement the first computational neural network to identify for each analysed image a set of vector fields each encoding the location and orientation of a respective body part of the person within the image.
Each vector field may identify a region of the image depicting the respective body part of the person.
The feature extraction unit may be configured to analyse each image region identified from the first computational neural network to extract a set of features each associated with a respective body part of the person.
The feature extraction unit may be configured to use a second computational neural network to analyse each image region identified from the first computational neural network to generate for each body part a classification dataset identifying a feature for the body part.
The storage unit may be further configured to generate a feature characterising ID of the person from the extracted set of features, and store the feature characterising ID in the database with the ID for the person.
The second image analysis unit may be configured to use a third computational neural network pre-trained to identify a set of features comprising the extracted features.
The second image analysis unit may be configured to generate a feature characterising ID of the person within the further image from the extracted features associated with body parts of that person, and the person identification unit may be configured to compare that feature characterising ID with the feature characterising ID stored in the database associated with the person ID to determine if the extracted set of features match those associated with the ID for the person stored in the database to within a specified confidence level.
The feature extraction unit may be configured to: identify a transform of image data for the identified image regions within each of the initial set of images to a common region of a feature space; for each of at least one image of the initial set of images, apply the identified transform to the identified image regions within the image to transform the identified image regions to a location within the common region of the feature space; and determine the extracted set of one or more features from the location(s) within the common region of the feature space.
The common region of feature space may be of a specified size.
The feature extraction unit may be configured to determine the set of features from an average of the locations within the common region of the feature space.
The feature extraction unit may be configured to determine the set of features from each of the location(s) within the common region of feature space.
The locations within the feature space may identify semantic features associated with the body parts of the person.
The second image analysis unit may be configured to: identify for the further image regions of the image depicting body parts of the person within the further image; apply the identified transform to the identified image regions within the further image to transform those identified regions to a location within the feature space.
The person identification unit may be configured to: determine if the location within the feature space for the image regions within the further image matches a location associated with the ID for the person stored in the database to within a specified amount.
There may be provided computer program code that, when executed, causes an apparatus to perform any method described herein.
There may be provided a non-transitory computer-readable storage medium having stored thereon instructions for performing a method as described herein.
BRIEF DESCRIPTION OF FIGURES
The present invention will now be described by way of example with reference to the accompanying drawings. In the drawings:
Figure 1 shows an image identification system.
Figure 2 shows an example image identification apparatus forming part of the system shown in figure 1.
Figure 3 shows a flowchart of steps for performing a method of tracking a person across multiple images.
Figure 4 shows a schematic illustration of a neural network for identifying regions of an image containing a body part.
Figure 5 shows a schematic illustration of how features are extracted from image regions containing body parts of a person depicted in the image.
Figure 6 shows a schematic illustration of a second approach for extracting features associated with body parts of a person.
Figure 7 shows a schematic illustration of the second approach determining whether a set of features extracted from an image match those stored in a database.
DETAILED DESCRIPTION
The present disclosure describes an approach of identifying, or tracking, a person across multiple captured images. The multiple images may be digital images captured by multiple image-capturing devices (e.g. cameras) spatially distributed within an environment. Each of the image capturing devices may have a different field of view; the fields of view may be non-overlapping. Alternatively, the multiple images may be captured by a single image-capturing device, but be separated in time; i.e. span a time interval. The images may be captured by different cameras and at different times. The general approach to identifying a person across multiple captured images across multiple cameras and/or time periods may be referred to as person re-identification.
Figure 1 shows a schematic illustration of an identification system for performing person re-identification. The system 100 comprises an identification apparatus 102, a server 104, and a plurality of image-capturing devices 106A-C.
The image-capturing devices operate to capture digital images of an environment within their field of view. Each image-capturing device may have a unique field of view. The fields of view may partially overlap with each other. The fields of view may not overlap with each other. The devices could be cameras, for example. The devices may capture video data comprising a stream of images. The devices 106 may be spatially distributed within an environment, such as a building, a shop, or a larger environment such as a town or city. The image-capturing devices may be connected to the identification apparatus via wired and/or wireless communication links. The image capturing devices may be located locally or remotely of the identification apparatus 102.
The identification apparatus 102 operates to identify a person across multiple images captured by one or more of the image-capturing devices 106. A person identified in a first image can therefore be tracked across multiple images spanning multiple image capturing devices and/or temporal periods. To track the person across the multiple images, the apparatus 102 may access a database containing information of identified persons. The database may be stored within server 104. Server 104 may be remote of identification apparatus 102. The server 104 and identification apparatus 102 may be connected via wired and/or wireless communication links. In an alternative arrangement, the database may be stored internally of the apparatus 102, in which case there may be no need for server 104.
An example of the identification apparatus 102 is shown in more detail in figure 2. The apparatus 102 comprises a first image analysis unit 202; a feature extraction unit 204; and a storage unit 206 configured to store data in database 208. In this example, database 208 is stored on server 104 externally of the apparatus 102, however in other implementations database 208 may be stored within apparatus 102. The apparatus 102 further comprises a second image analysis unit 210, and a person identification unit 212.
The operation of the identification apparatus 102 to perform person re-identification will now be described with reference to the flow-chart in figure 3. In the following, the person to be re-identified will be referred to as person 'A' for clarity.
At step 302, an initial set of one or more images depicting person A are analysed using a first computational neural network that identifies, for each analysed image, regions of the image depicting a corresponding body part of the person.
In figure 2, a single initial image 214 is shown for the purposes of illustration. This image is captured by one of the image-capturing devices 106, e.g. device 106A. The initial set of images may be known to depict person 'A'. However, this need not be the case.
The neural network is implemented by image analysis unit 202 using a model in which a person is formed of a discrete number of body parts. That number of body parts may be fixed for a given implementation. A 'body part' may refer to a discrete part or component of the body, for example a limb. Example body parts include, for example, legs (upper, lower or whole), arms (upper, lower or whole), torso, shoulders, head, hands, feet etc.
The neural network implemented by the image analysis unit 202 may operate to generate a pose estimate for people depicted in analysed images. A pose estimate refers to an estimate of the configuration of a body (in part or in whole). A pose estimate may therefore be referred to equivalently as a body pose. A pose estimate may be represented in a variety of ways. For example, a pose estimate may parameterise the body as a series of kinematic links. This may be referred to as the kinematic tree representation of the body configuration. Another way to represent the pose estimate is as a set of body parts, each with its own position and orientation in space. The pose estimates may be generated from a set of candidate joints using knowledge of typical human anatomy. A set of statistical or physical constraints may be used to enforce a pose estimate with skeletal consistency.
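By way of illustration only, and not forming part of the disclosed method, the following Python sketch shows one way the 'set of body parts' representation of a pose estimate could be held in memory. The part names, coordinate convention and values are assumptions made purely for the example.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class BodyPart:
    """One body part of the model, e.g. a limb, the torso or the head."""
    position: Tuple[float, float]   # (x, y) location in image coordinates
    orientation: float              # in-plane orientation, in radians

# A pose estimate in the "set of body parts" representation: a fixed number
# of named parts, each with its own position and orientation in space.
PoseEstimate = Dict[str, BodyPart]

example_pose: PoseEstimate = {
    "torso": BodyPart(position=(212.0, 145.0), orientation=1.55),
    "left_upper_arm": BodyPart(position=(188.0, 130.0), orientation=2.10),
    "head": BodyPart(position=(214.0, 96.0), orientation=1.60),
}
```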
As part of generating the pose estimate for a person depicted within an analysed image, the neural network implemented by image analysis unit 202 identifies regions of the image representing a body part of the person. These image regions may be identified on a per-pixel or per-pixel block basis. An image region may be identified for each body part of the person. These identified image regions are used as part of the neural network to generate the pose estimates. That is, the identified image regions may be calculated by the neural network as part of the processing to calculate the output pose estimates. In this regard, the identified image regions may be viewed as a by-product of implementing the neural network, rather than the final output of the neural network.
The implementation of the neural network by image analysis unit 202 is illustrated schematically in figure 4.
Image 214 depicting person 'A' is input into the neural network 402 implemented by image analysis unit 202. Network 402 may be a convolutional neural network. The convolutional neural network may include a plurality of layers. These layers may include convolutional layers. The layers may include pooling layers. The layers may include a combination of convolutional and pooling layers.
The neural network outputs a pose estimate of the person 'A' depicted in image 214. The pose estimate generated by neural network 402 is shown at 404.
The neural network also generates an intermediary output that identifies regions of the image representing a body part of the person 'A'. This intermediary output is illustrated at 406. The intermediary output is used by the neural network 402 to generate the final pose estimate output 404.
The intermediary output is in the form of an image map that identifies each element of an image (e.g., each pixel, or pixel block) that is estimated to represent a particular body part of the person 'A'. If the neural network 402 models a body as formed of n body parts, the network may generate n image maps, one per body part. That is, each image map may identify the region of the image 214 that represents a corresponding body part. In this example, only a single image map identifying the region of a single body part 408 is shown, for clarity. The body part is shown by the line markings.
The image map may encode the location of the corresponding body part over the domain of the image. In the example shown here, the image map 406 encodes both the location and orientation of the corresponding body part over the image. To do this, each image element (e.g. pixel, pixel block etc.) may encode a vector denoting the location and orientation of the body part at that location. The image map may therefore define a vector field indicating the location and orientation of the corresponding body part within the image. The vector field generated by the neural network 402 indicating the location and orientation of the body part 408 within the image is shown at 410.
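As a rough sketch of the idea (the data layout is an assumption, not a detail taken from the disclosure): if each image map stores a 2D vector per pixel whose magnitude is approximately zero away from the body part, the region depicting that part can be recovered by thresholding the vector magnitudes.

```python
import numpy as np

def body_part_region(vector_field: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Recover the image region occupied by one body part from its vector field.

    vector_field: array of shape (H, W, 2); each element is a 2D vector whose
    direction gives the local orientation of the body part and whose magnitude
    is (approximately) zero outside the part.
    Returns a boolean (H, W) mask marking pixels assigned to the part.
    """
    magnitude = np.linalg.norm(vector_field, axis=-1)
    return magnitude > threshold

# Toy example: a horizontal 'limb' occupying rows 10-14, columns 5-25.
field = np.zeros((64, 64, 2), dtype=np.float32)
field[10:15, 5:26] = (1.0, 0.0)        # unit vectors pointing along the limb
mask = body_part_region(field)
print(mask.sum(), "pixels assigned to the body part")
```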
Example details of the neural network 402 implemented by image analysis unit 202 will now be described in more detail.
The neural network 402 may be a multi-branch neural network. One branch of the network generates the set of n image maps identifying the regions of the image depicting each body part (e.g. image map 406). The other branch of the network generates a set of confidence maps for body part locations. A confidence map may be generated for each body part that indicates a confidence level for each image element (e.g. pixel, pixel block etc.) that the image element represents that body part.
The body part locations identified by the confidence maps may be different to the body parts identified by the image maps. For example, the body parts identified by the confidence maps may be joints of the body, such as knee, elbow, wrist etc. The body parts identified by the image maps may be, or may include, limbs. For the purposes of this explanation, the head of the body may also be referred to as a limb. The set of image maps and confidence maps may then be combined to obtain a prediction of the body pose estimate. In some examples, the estimate of the body pose may be refined over a plurality of iterations. Each iteration may be performed by a respective stage of the neural network. A confidence map and image map may be produced by each stage of the neural network. The confidence map and image map may be refined following each iteration, or stage of the neural network.
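A minimal, generic sketch of such a multi-branch, multi-stage network is given below in PyTorch. It is a toy built on assumptions (layer sizes, channel counts and the use of PyTorch are all choices made for the example), not the network actually used; it only shows the structure described above: one branch producing per-joint confidence maps, one branch producing per-limb vector fields (image maps), with later stages refining the outputs of earlier ones.

```python
import torch
import torch.nn as nn

class TwoBranchStage(nn.Module):
    """One stage: predicts confidence maps (joints) and vector-field image maps
    (limbs) from shared image features, optionally concatenated with the
    previous stage's predictions."""
    def __init__(self, in_ch: int, n_joints: int, n_limbs: int):
        super().__init__()
        def branch(out_ch: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(),
                nn.Conv2d(128, out_ch, 1),
            )
        self.confidence_branch = branch(n_joints)        # one confidence map per joint
        self.vector_field_branch = branch(2 * n_limbs)   # an (x, y) vector field per limb

    def forward(self, x):
        return self.confidence_branch(x), self.vector_field_branch(x)

class PoseNetwork(nn.Module):
    def __init__(self, n_joints: int = 18, n_limbs: int = 19, n_stages: int = 3):
        super().__init__()
        self.backbone = nn.Sequential(                   # shared convolution/pooling layers
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        stage_inputs = [128] + [128 + n_joints + 2 * n_limbs] * (n_stages - 1)
        self.stages = nn.ModuleList(
            TwoBranchStage(c, n_joints, n_limbs) for c in stage_inputs
        )

    def forward(self, image):
        features = self.backbone(image)
        x = features
        for stage in self.stages:
            confidence_maps, image_maps = stage(x)
            # Each refinement stage sees the backbone features plus the
            # previous stage's confidence maps and image maps.
            x = torch.cat([features, confidence_maps, image_maps], dim=1)
        return confidence_maps, image_maps

confidence_maps, image_maps = PoseNetwork()(torch.randn(1, 3, 256, 256))
```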
In summary, the image analysis unit 202 implements a first computational neural network to analyse image 214. The first neural network models each body as a set of n body parts, and identifies within the image being analysed n regions (e.g. region 408), each region depicting a corresponding body part. Each region may be identified in a respective image map. That is, each image map identifies the location within the image of a corresponding body part. Thus, the neural network may generate n image maps.
Referring back to figure 3, at step 304 the feature extraction unit 204 analyses each image region identified by the image analysis unit 202 to extract a set of features associated with the body parts of person 'A'.
The feature extraction unit 204 may analyse each identified image region (e.g. region 408) using a second computational neural network.
Conveniently, the feature extraction unit 204 may analyse only the image regions identified by the image analysis unit 202 that depict corresponding body parts of the person, and not the remaining regions of the image. For example, the feature extraction unit may input into the second computational neural network only the regions of the image identified by the image analysis unit 202. This reduces the computational effort required to extract features associated with the body parts of person 'A' by reducing the domain of the image to be analysed for feature extraction.
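The following fragment illustrates (with assumed data shapes and a hypothetical helper name, not the patent's implementation) how the second network's input could be restricted to the identified regions: each body-part mask from the first network is used to crop the image and blank out surrounding pixels, so that only region pixels are passed on for feature extraction.

```python
import numpy as np

def crop_identified_regions(image: np.ndarray, part_masks: list) -> list:
    """Keep only the pixels inside each identified body-part region.

    image: (H, W, 3) array; part_masks: list of boolean (H, W) masks, one per
    body part, produced by the first network. Returns one tight crop per part,
    so the second (feature extraction) network never sees the rest of the image.
    """
    crops = []
    for mask in part_masks:
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            crops.append(None)          # body part not visible in this image
            continue
        crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1].copy()
        # Zero out pixels inside the bounding box that lie outside the region.
        local = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        crop[~local] = 0
        crops.append(crop)
    return crops

# Usage: crops = crop_identified_regions(frame, masks), one crop per body part.
```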
The feature extraction unit 204 may extract a feature for each body part of the person 'A'.
Conveniently, the features extracted for each body part of person 'A' may be semantic features. The semantic features may describe high-level detail of the body part. They may identify an object or item of content associated with each body part, or assign a label to each body part. The objects may be at a high enough level to be human-recognisable. For example, a semantic feature may identify a type, or class, of clothing covering the body part. In other examples, a semantic feature may identify a combination of: a type, or class, of clothing; and detail relating to that type or class of clothing (e.g., colour, pattern etc.). Types of clothing identified by a semantic feature may include, for example, 'top', 'trousers', 'shorts', 'blouse', 'jacket', 'coat', 'hat' etc. Thus, an example semantic feature associated with a body part could be, for example: blue jeans; red coat etc.
The feature extraction unit 204 may also analyse each identified region to extract one or more features that are associated with the set of body parts of the person 'A'. That is, rather than extract a feature for each body part of the person, a feature may be extracted that is associated with the set of body parts for that person. The feature extraction unit 204 may do this as an alternative to the example above in which features are extracted for each body part, or in addition; i.e., the feature extraction unit may extract: i) a feature for each identified body part of the person; and ii) one or more features each associated with the set of identified body parts of the person. The features associated with the set of identified body parts are therefore features associated with the person as a whole. These features may be referred to as anthropometric features.
An example anthropometric feature is build (e.g., small, medium and large). Another example anthropometric feature is height (e.g., short, medium and tall). Thus, the extracted set of one or more anthropometric features may classify the identified person into a corresponding set of one or more body type classes. Put another way, the extracted set of one or more anthropometric features may assign one or more body type labels to the person.
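A toy example of bucketing a person into such body type classes is sketched below; the thresholds, the choice of measurements and the inclusion of one relative measurement (of the kind described in the next paragraph) are illustrative assumptions only, and a real system would estimate the measurements from the pose rather than receive them directly.

```python
def anthropometric_features(height_m: float, shoulder_width_m: float) -> dict:
    """Bucket a person into coarse body-type classes from two estimated measurements."""
    if height_m < 1.60:
        height_class = "short"
    elif height_m < 1.80:
        height_class = "medium"
    else:
        height_class = "tall"

    build_ratio = shoulder_width_m / height_m      # one simple relative measurement
    if build_ratio < 0.23:
        build_class = "small"
    elif build_ratio < 0.26:
        build_class = "medium"
    else:
        build_class = "large"

    return {"height": height_class,
            "build": build_class,
            "shoulder_to_height": round(build_ratio, 3)}

print(anthropometric_features(1.76, 0.44))
```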
Rather than classify, or bucket, the identified person into a body type class, an anthropometric feature may characterise an anthropometric aspect of the identified person. An anthropometric feature may for example be a relative measurement of two parts of the person's body, such as: (i) a ratio of limb, or body part, lengths of the person (e.g., a ratio of arm length to leg length, or vice versa); (ii) the ratio of torso length to leg length, or vice versa; (iii) the ratio of arm length to height, or vice versa; (iv) the ratio of leg length to height, etc.
In other examples, the features extracted for each body part may be low-level features. In general, a low-level feature is one that does not correspond to a real-world, human-recognisable object. Example low-level features include, for example, one or more of pixel colour; pixel intensity; intensity gradient; colour gradient; texture etc.
An example of how the feature extraction unit 204 may extract features for the body parts of the person depicted in the image is illustrated in figure 5. For the purposes of this illustration, the body of person 'A' is modelled as five body parts. The regions of the image containing each body part, as determined by the image analysis unit 202, are denoted R1, R2, R3, R4 and R5.
The feature extraction unit 204 analyses the identified regions R1-R5 using a computational neural network illustrated generally at 502. In this example, the neural network includes a plurality of branches, each implementing a respective extraction network, denoted EN1, EN2, EN3, EN4 and EN5. Each extraction network may comprise a plurality of stages including convolution and/or pooling stages.
Each extraction network EN1-EN5 analyses a respective image region R1-R5 to generate a respective feature label, denoted 504₁ to 504₅. Each feature label identifies a feature for the corresponding image region (and thus for the corresponding body part of person 'A'). Each feature label may be in the form of a classification vector. Each classification vector may classify a corresponding image region as containing a feature from a set of features, or classifiers. That is, the classification vectors may identify for each image region a feature from a set of features. The number of features in the set may be equal to the number of elements of the classification vector. That is, an m-element classification vector may identify up to m different features.
The features identified by the plurality of feature labels 504₁ to 504₅ form a set of extracted features for the person 'A' depicted in the image 214 being analysed. Each extracted feature corresponds to a body part of person 'A'. The set of extracted features for the person (e.g. the set of feature labels) may form a descriptor of the person. The descriptor may characterise the appearance of the person.
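As an illustrative sketch only, the fragment below shows how an m-element classification vector output by one extraction network could be turned into a feature label, and how the labels across regions form the person's descriptor. The label set and scores are invented for the example.

```python
import numpy as np

# Hypothetical label set: m = 6 possible semantic features per body part.
LABELS = ["blue jeans", "black trousers", "red coat",
          "white top", "green jacket", "grey shorts"]

def feature_label(classification_vector: np.ndarray) -> str:
    """Turn one extraction network's m-element classification vector into a label.

    Each element scores one candidate feature; the label is the highest-scoring
    entry (a real system might also require a minimum score).
    """
    return LABELS[int(np.argmax(classification_vector))]

# One classification vector per identified region R1, R2, ... (scores are made up).
vectors = np.array([
    [0.10, 0.00, 0.80, 0.05, 0.03, 0.02],   # region R1 -> "red coat"
    [0.70, 0.10, 0.00, 0.10, 0.05, 0.05],   # region R2 -> "blue jeans"
])
descriptor = [feature_label(v) for v in vectors]
print(descriptor)   # the set of feature labels forms the person's descriptor
```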
At step 306, the storage unit associates the extracted set of features with an identification tag, or ID, for person 'A' depicted in image 214. The association between the identification tag and the extracted set of features is stored in database 208.
In some implementations, the ID for person 'A' may be stored with the set of extracted features determined by the feature extraction unit 204 in the database 208. For example, the database may store an identification for person 'A' and the set of feature labels identified for the body parts of that person.
Alternatively, a feature characterising ID may be stored in the database with the person ID. The feature characterising ID may characterise the features extracted for the body parts of the person. However, the feature characterising ID may not explicitly identify each extracted feature. The feature characterising ID may be generated in dependence on the set of extracted features. The value of the feature characterising ID may be uniquely dependent on the set of extracted features. For example, the feature characterising ID may take the form of a hash value or seed generated from the set of extracted features. The hash value or seed could for example be generated from the set of feature labels identified for the image. The use of a feature characterising ID conveniently enables the features characterising person 'A' to be stored in the database 208 without requiring each feature label to be stored, which may reduce storage requirements.
The feature characterising ID may be generated by the storage unit 206 from the set of extracted features determined by the feature extraction unit 204. The storage unit 206 may then store the feature characterising ID along with an associated person ID in the database 208.
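A minimal sketch of such a feature characterising ID, assuming the feature labels are strings and using SHA-256 as one possible hashing choice (the disclosure does not specify an algorithm), might look as follows.

```python
import hashlib

def feature_characterising_id(feature_labels) -> str:
    """Derive a compact ID that is uniquely dependent on the set of extracted features.

    Sorting makes the ID independent of the order in which body parts were
    processed; SHA-256 is just one convenient choice of hash.
    """
    canonical = "|".join(sorted(feature_labels))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

descriptor = ["red coat", "blue jeans", "white top"]
person_record = {"person_id": "A", "feature_id": feature_characterising_id(descriptor)}
print(person_record)
```

Note that an exact hash only matches when exactly the same set of labels is re-extracted; where approximate matching is wanted, the feature labels themselves can be compared instead.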
Database 208 therefore stores information characterising each identified person (i.e. each person identified within a set of images captured by one of the image capturing devices 106). Each identified person can be uniquely identified within the database by their person ID. Information stored in the database 208 can be accessed by the apparatus 102 to re-identify person 'A' within a further set of images captured by the image capturing devices 106.
In this example, a further image 216 (shown in figure 2) is captured by one of the image capturing devices 106A-C. It is not known a priori whether person 'A' is depicted within this image 216 or not. This further image may for example be captured some time after image 214. That is, image 214 and the further image 216 may be separated by a time period. Image 216 may be captured by a different image capturing device than the image capturing device 106A used to capture image 214. The image capturing device used to capture image 216 may have a different field of view to the image capturing device used to capture image 214. The fields of view may partially overlap, or not overlap.
Referring back to figure 3, at step 308 the apparatus 102 analyses the further image 216 to extract features associated with body parts of a person depicted within the image.
The image 216 is analysed by a second image analysis unit 210.
The second image analysis unit 210 may operate in a similar fashion to the image analysis unit 202 and feature extraction unit 204. That is, the unit 210 may operate to: i) identify regions of the image containing body parts of the person depicted in the image; and ii) analyse those identified regions to extract a set of features for the body parts of the person (that is, extract a feature for each respective body part). The unit 210 may extract the features for the person depicted in image 216 using any of the techniques described above with reference to steps 302 and 304.
Alternatively, the image analysis unit 210 may analyse the image 216 globally to identify features within the image. For example, the image analysis unit 210 may be pre-trained to identify certain features from received images. For example, the image analysis unit 210 may be pre-trained to identify the same set of features that can be identified by the classification vectors. That is, the image analysis unit 210 may be able to identify the m features that can be identified by the feature extraction unit 204.
The image analysis unit 210 may scan the image 216 for body part features in accordance with some scanning order. The image analysis unit may for instance scan certain regions of the image as a priority (referred to as priority regions), and scan the remaining regions of the image if no features are identified within the priority regions. The priority regions may be peripheral regions of the image. They may for example be regions located at the edge of the image. These regions may be scanned as a priority because if a person were to appear within an image, it is likely they would appear in the peripheries of the image (as the person entered the field of view of the image capturing device used to capture the image). Scanning peripheral regions of the image for body part features as a priority may reduce the amount of processing required to analyse the image 216 to extract body part features.
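One way such a priority scan could be organised is sketched below; the border fraction and the stand-in `detect` callable are assumptions made for the example rather than details from the disclosure.

```python
import numpy as np

def scan_for_body_parts(image: np.ndarray, detect, border_fraction: float = 0.2):
    """Scan peripheral (priority) regions of the image first; only scan the
    remaining centre if nothing is found there.

    `detect` is any callable that takes an image crop and returns a (possibly
    empty) list of body-part features; it stands in for the feature extractor.
    """
    h, w = image.shape[:2]
    bh, bw = int(h * border_fraction), int(w * border_fraction)
    priority_regions = [
        image[:bh, :], image[-bh:, :],          # top and bottom strips
        image[:, :bw], image[:, -bw:],          # left and right strips
    ]
    for region in priority_regions:
        features = detect(region)
        if features:
            return features                      # found someone entering the view
    return detect(image[bh:-bh, bw:-bw])         # fall back to the central region

# Usage with a stand-in detector that never finds anything:
result = scan_for_body_parts(np.zeros((480, 640, 3)), detect=lambda crop: [])
```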
At step 310, the person identification unit 212 determines if the extracted set of features for the person depicted in image 216 match those features associated with the person ID for person 'A' stored in the database 208. If the extracted set of features for the person in image 216 match the extracted set of features associated with the person ID to within a specified confidence level, the person identification unit determines that the person depicted in image 216 is person 'A'. If the set of features extracted for the person depicted in image 216 do not match those associated with the person ID stored in the database 208 to within a specified confidence level, the person identification unit 212 determines that the person depicted in image 216 is not person 'A'.
In order for the person identification unit 212 to use the information stored in database 208 to determine whether the person depicted in image 216 is person 'A', the information indicative of the body part features extracted from image 216 should be placed by the image analysis unit 210 into the same format as the feature information associated with the person IDs in the database 208. Thus, for example, if each person ID stored in database 208 is associated with a set of feature labels, the image analysis unit 210 may generate a set of feature labels indicating the body part features extracted from image 216. The person identification unit 212 can then access the database 208 to determine if the feature labels extracted from image 216 match those associated with the person ID for person 'A'. Alternatively, if each person ID stored in database 208 is associated with a feature characterising ID, the image analysis unit 210 may generate a feature characterising ID from the body part features extracted from image 216. The image analysis unit 210 may for example generate a hash or seed value from the extracted body part features using the same hash or seeding algorithm used by the storage unit 206. The person identification unit 212 can then access the database 208 to determine if the feature characterising ID generated from the image 216 matches the feature characterising ID associated with the person ID for person 'A'.
Thus, the person identification unit 212 uses the information indicative of the extracted features determined by the image analysis unit 210 to search database 208 to determine if there is a match with a stored person ID. If there is a match, the person identification unit determines that the person having that person ID is present in the image.
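The following sketch shows one possible (assumed) realisation of this search when the database stores sets of feature labels; the matching score used here, the fraction of stored labels re-observed, is just one way a 'specified confidence level' could be implemented.

```python
def find_matching_person(extracted_labels, database, confidence_threshold=0.8):
    """Search the database for a person ID whose stored feature labels match
    the labels extracted from the further image.

    `database` maps person IDs to the set of feature labels stored for them.
    The confidence is simply the fraction of stored labels that were
    re-observed; any suitable matching score could be substituted.
    """
    best_id, best_score = None, 0.0
    extracted = set(extracted_labels)
    for person_id, stored_labels in database.items():
        stored = set(stored_labels)
        score = len(stored & extracted) / max(len(stored), 1)
        if score > best_score:
            best_id, best_score = person_id, score
    return best_id if best_score >= confidence_threshold else None

db = {"A": ["red coat", "blue jeans", "white top"],
      "B": ["green jacket", "grey shorts"]}
print(find_matching_person(["blue jeans", "red coat", "white top"], db))   # -> "A"
```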
The above-described process enables person re-identification to be performed across images captured by multiple image capturing devices and/or spanning different time periods. The above-described method performs person re-identification using extracted features associated with body parts (e.g. semantic features). This enables person re-identification to be performed without the need to perform facial recognition, which is advantageous due to the technical difficulties and shortcomings associated with that technique. Moreover, by characterising a depicted person using multiple extracted features associated with body parts, the re-identification can be made more robust compared to the approach of using facial recognition. For example, the technique described herein may be less susceptible to changes in person orientation within the image, lighting, occlusion etc. This is because multiple features can be used to characterise the person.
In the examples described above, step 304 is performed by analysing each image region to extract features associated with the person's body parts, where the features include semantic features and/or anthropometric features. In these examples, step 304 may involve the use of a classifier (e.g. a neural network) to classify the features associated with each body part. Step 308 may similarly involve the use of a classifier to extract features associated with the body parts of a person depicted in a further image. An alternative approach for implementing steps 304 to 310 will now be described with reference to figure 6.
In this example, the image analysis unit 202 analyses an initial set of multiple images depicting person 'A'. For each image in the initial set, the image analysis unit 202 identifies regions of that image depicting body parts of person 'A'. The two images depicting person 'A' are shown in figure 6 at 602 and 604. The regions of each image identified by the image analysis unit 202 as depicting body parts of person 'A' are shown by the hashed lines, and denoted 606 and 608. It is noted that in this example, the two images 602 and 604 depict different views of person 'A'.
It is recalled that at step 304, the feature extraction unit 204 analyses each image region identified by the image analysis unit 202 to extract a set of features associated with the body parts of a person depicted in those images.
In accordance with this example, the feature extraction unit 204 analyses the image regions 606 and 608 by identifying a transform of the image data within those image regions to a feature space 610. More specifically, the feature extraction unit 204 identifies a transform of the image data within each of image regions 606 and 608 to a common region 612 of the feature space. That is, the feature extraction unit identifies a single transform that, when applied to the image data of identified image regions 606 and 608 within each of the initial set of images 602 and 604, transforms those image regions to locations of the feature space 610 located within the common region 612. The image data could be, for example, pixel data (e.g. pixel values) for the pixels within the image regions. The location of the transformed image data for region 606 is shown at 614, and the location of the transformed image data for the region 608 is shown at 616.
The feature space 610 may be formed of multiple feature bases, or feature axes. In general, the feature space 610 may be an m-dimensional space, where m ≥ 1. A position within the feature space may define a set of one or more features associated with the body parts of person 'A'. In some examples, the position within the feature space may define semantic features. In other examples, the features are non-semantic; that is to say, they may characterise the depicted person in some way but may not be human-recognisable. The number of features defined by the position within the feature space may depend on the number of orthogonal feature bases defining the space. In some examples, each basis, or axis, may define a corresponding feature; alternatively, multiple bases may define a single feature.
As indicated above, the transform is such that, when applied to the image data of each of the image regions 606, 608, the set of image data for each of those image regions 606, 608 is transformed to a common region of the feature space. In other words, the image regions depicting a person in multiple different views and/or angles are transformed to a common region of the feature space. This is because the image regions, though potentially depicting different views of the person 'A', nevertheless depict the same person and thus the same set of features is associated with the body parts of that person.
The transform is further adapted so that, when applied to the image data of image regions depicting different people (e.g. person 'A' and a person 'C'), the image data for those regions are transformed to different regions of the feature space (i.e. they are not transformed to a common region). This is representative of the fact that different people are associated with different sets of features. An illustration of this is shown by the cluster of points 618, which illustrate the exemplary position in feature space of the image data for image regions depicting a different person 'C' when subjected to the transform.
The feature extraction unit 204 may identify the transform by supervised learning. The feature extraction unit 204 may implement a computational neural network to learn the transform. The unit 204 may for example implement the neural network to iteratively refine the transform until the transform, when applied to the image data of different image regions 606 and 608 (depicting body parts of the same person), results in locations within the feature space within a specified distance of each other.
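A compact sketch of how such a transform could be learned is given below, using PyTorch and a triplet margin loss as assumed choices; the disclosure only says the transform may be learned with a neural network so that same-person regions land close together and different-person regions land apart, and the network shape here is invented for the example.

```python
import torch
import torch.nn as nn

class RegionEmbedder(nn.Module):
    """Transforms pixel data from identified image regions to a point in an
    m-dimensional feature space."""
    def __init__(self, m: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8),
            nn.Flatten(), nn.Linear(32 * 8 * 8, m),
        )

    def forward(self, region):
        return self.net(region)

embedder = RegionEmbedder()
loss_fn = nn.TripletMarginLoss(margin=1.0)   # pulls same-person pairs together, pushes others apart
optimiser = torch.optim.Adam(embedder.parameters(), lr=1e-3)

# One training step on dummy data: anchor and positive are regions depicting
# the same person from different views; negative depicts a different person.
anchor, positive, negative = (torch.randn(4, 3, 64, 64) for _ in range(3))
loss = loss_fn(embedder(anchor), embedder(positive), embedder(negative))
optimiser.zero_grad()
loss.backward()
optimiser.step()
```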
The common region 612 may be of a specified, or predetermined size. The size of the common region may be implementation specific. A smaller common region may more accurately, or uniquely define the set of features associated with the body parts of person 'A', but may require more computational effort to learn/identify.
An equivalent interpretation of the common region 612 is that the transformed locations 614, 616 in feature space of the image data for the identified image regions 606, 608 within each image in the initial subset of images depicting person 'A' are within a specified distance of each other. That is, the distance between any pair of feature space locations resulting from applying the transform to identified image regions 606, 608 depicting person 'A' within different images 602, 604 is less than a specified, or predetermined, distance. That distance may be a Euclidean distance, a geodesic distance, or some other distance metric.
Once the feature extraction unit 204 has identified the transform (e.g. by self-learning), it applies the transform to the identified image regions within at least one of the initial set of images depicting the person 'A' (i.e. at least one of the initial set of images depicting the person analysed by the image analysis unit 202). For example, if there are multiple images depicting the person 'A' within the initial set of images analysed by the image analysis unit 202 (e.g. as in this example, where there are two images 602 and 604), the feature extraction unit 204 may apply the identified transform to the identified image regions within each of those images (e.g., to both image regions 606 and 608). Alternatively, the feature extraction unit may apply the identified transform to the identified image regions within only a single one of the initial set of images depicting the person (e.g. to one of image regions 606 or 608). In general, if the initial set of images depicting person 'A' contains multiple images, the feature extraction unit may apply the identified transform to the identified image regions within at least one of those images.
The feature extraction unit 204 then determines the extracted set of features from the location(s) within the common region of the feature space resulting from the application of the identified transform. For example, if the identified transform is applied to image regions within multiple images 602, 604 depicting the person (thus generating multiple locations 614, 616 within the feature space), the extracted set of features may be determined as the average of those locations. Alternatively, the extracted set of features may be determined to be the set of those generated locations.
At step 306, the storage unit 206 stores in the database an ID for person 'A' and an associated set of extracted features for that person. Thus, the storage unit 206 may store an ID for person 'A' and one or more locations within the feature space 610 associated with that person.
At step 308, the second image analysis unit 210 analyses a further image to extract features associated with body parts of a person depicted in that image. The processing performed by the image analysis unit 210 in accordance with this example is illustrated in figure 7.
Figure 7 shows a further image 700, which in this example is taken to also depict person 'A' (though this is not known at the time the image is received at the image analysis unit 210). The image analysis unit 210 analyses the image 700 to identify regions of the image depicting body parts of the person within the image. The image regions depicting body parts of the person within image 700 are shown at 702.
The image analysis unit 210 then applies the transform (identified by the feature extraction unit 204) to the image data within the identified region 702. The application of the identified transform transforms the image data to a location 704 within the feature space 610. The application of the identified transform is shown schematically at 706.
The extracted features for the person depicted in image 700 are then determined from the location 704 within the feature space generated by applying the transform to the image data of image region 702.
At step 310, the person identification unit 212 determines if the extracted set of features associated with the person depicted in further image 700 match those associated with the ID for person 'A' stored in the database.
To do this, the person identification unit 212 determines if the location 704 within the feature space 610 matches a location stored in the database for person 'A' (i.e., the location associated with the person 'A' in the database) to within a specified amount. For example, the person identification unit 212 may determine if the location 704 is within a specified distance of stored locations 614 and/or 616 associated with person 'A' in the database. If an average location is stored for person 'A' in the database, the person identification unit may determine whether the location 704 is within a specified distance of that average location.
If the person identification unit 212 determines that the location 704 does match a location associated with person 'A' in the database, the person identification unit determines that person 'A' is also depicted in further image 700. Alternatively, if the identification unit 212 determines that the location 704 does not match any location associated with person 'A' in the database, the unit 212 determines that person 'A' is not depicted in image 700.
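Numerically, this matching test can be as simple as a distance threshold in the feature space; the sketch below uses Euclidean distance, and the coordinates standing in for locations 614, 616 and 704 are invented purely for illustration.

```python
import numpy as np

def matches_stored_person(location: np.ndarray, stored_locations,
                          max_distance: float = 0.5) -> bool:
    """Return True if the further image's feature-space location is within the
    specified (here Euclidean) distance of any location stored for the person."""
    return any(np.linalg.norm(location - stored) <= max_distance
               for stored in stored_locations)

stored_for_A = [np.array([0.20, 0.91, -0.33]),    # e.g. location 614
                np.array([0.18, 0.95, -0.30])]    # e.g. location 616
location_704 = np.array([0.21, 0.92, -0.31])
print(matches_stored_person(location_704, stored_for_A))   # -> True: person 'A' re-identified
```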
The above-described approach of transforming the image data of identified regions to a feature space may conveniently enable the extracted features of depicted people to be expressive enough to support robust matches within the database. This is because the use of a feature space may enable extracted features to be expressed with greater granularity than features expressed through classifiers.
In some of the above examples, a single image 214 was analysed to generate a characterising feature set for the person 'A' associated with the person ID in the database. In other examples, multiple images may be analysed to generate a characterising feature set for the person, where the features in the feature set correspond to a body part of the person. For example, a set of images may be analysed by image analysis unit 202 and feature extraction unit 204. In this case, the image analysis unit may analyse each image in turn to identify, for each analysed image, regions of the image depicting body parts of the person. The feature extraction unit 204 may then analyse each region for each image. Thus, the feature extraction unit 204 may extract a set of features associated with the body parts of the person, where the set of features includes multiple feature labels for each body part (e.g., a feature label per analysed image for each body part). The multiple feature labels for a given body part may then be used to generate a characterising feature label for the body part. The set of such characterising feature labels may then be associated with the person ID in the database. The use of multiple images to generate the characterising feature labels may make the feature labels more robust, for example to changes in lighting, viewing angles etc.
In the examples described above, the further image 216 tested for person 'A' contained only a single person. It will be appreciated that in practice, an image may depict any number of people, for example zero people or more than one person. In the event the image depicts more than one person, the image analysis unit 210 and person identification unit 212 may perform the steps 308 and 310 described herein for each person within the image. That is, the unit 210 may analyse the image 216 to extract, for each person depicted in the image, features associated with body parts of that person. The identification unit 212 may then determine if the extracted set of features for each person depicted in the image 216 match those associated with the ID for person 'A' stored in the database. If the extracted set of features for one of the depicted persons matches those associated with the ID for person 'A' to within a specified confidence level, then it is determined that the image depicts person 'A'.
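Tying the pieces together for the multi-person case, a per-person loop over steps 308 and 310 could be structured as below; all four callables are placeholders standing in for the components described above, not real APIs.

```python
def identify_people_in_image(image, detect_people, extract_features,
                             find_matching_person, database):
    """Run the matching step for every person depicted in a further image.

    `detect_people` returns one set of identified body-part regions per person;
    `extract_features` and `find_matching_person` stand in for steps 308 and 310.
    """
    matches = []
    for person_regions in detect_people(image):
        features = extract_features(person_regions)
        person_id = find_matching_person(features, database)
        if person_id is not None:
            matches.append(person_id)
    return matches      # e.g. ['A'] if person 'A' is depicted, [] if no-one matched
```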
In some implementations, the apparatus 102 may be used to identify a set of frames depicting person 'A'. In this case, the person identification unit 212 may read the information characterising the extracted set of features for person 'A' from the database and store it locally within the apparatus 102. For each received image to be tested, the image analysis unit 210 may extract the set of features for the body parts of the person within the image, and the person identification unit 212 can then compare that extracted information with the locally stored information characterising the extracted set of features for person 'A'. This may save the apparatus having to repeatedly read the information characterising the extracted features from the database 208.
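A possible shape for this local-caching behaviour is sketched below: person 'A''s characterising information is read once and reused for every tested frame. The class and method names, the dictionary standing in for database 208, and the cosine-similarity comparison are illustrative assumptions only.

```python
import numpy as np

class FrameMatcher:
    """Caches one person's characterising features so that a stream of
    frames can be tested without re-reading the database each time."""

    def __init__(self, database, person_id, threshold=0.9):
        # Single database read; the result is kept locally in the apparatus.
        self._reference = np.asarray(database[person_id], dtype=float)
        self._threshold = threshold

    def frame_matches(self, frame_features):
        """Compare a frame's extracted features against the cached reference
        using cosine similarity (one plausible confidence measure)."""
        query = np.asarray(frame_features, dtype=float)
        similarity = np.dot(query, self._reference) / (
            np.linalg.norm(query) * np.linalg.norm(self._reference)
        )
        return similarity >= self._threshold

# The 'database' here is just a dict standing in for database 208.
database = {"person_A": [0.2, 0.9, 0.4]}
matcher = FrameMatcher(database, "person_A")
frames = [[0.21, 0.88, 0.41], [0.9, 0.1, 0.05]]
matching_frames = [i for i, f in enumerate(frames) if matcher.frame_matches(f)]
print(matching_frames)  # -> [0]
```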
The above examples have been described in the context of searching a further image to identify whether that image contains a specific person (person 'A'). It will be appreciated that, in practice, the apparatus 102 may analyse a set of further images to determine whether those images depict anyone having an ID stored in the database. For example, at step 310, the person identification unit may search the database 208 to determine whether the set of one or more features associated with the body parts of a person depicted within a further image matches any of the sets of features stored within the database to within a specified confidence level. In response to determining that there is a match, the person identification unit may conclude that the person having the ID associated with the matching set of features in the database is depicted in that further image.
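The database-wide search described in this paragraph might look something like the following sketch, which returns the best-matching stored ID (if any) for the features extracted from a further image. The distance metric, threshold and data layout are illustrative assumptions rather than details taken from the patent.

```python
import numpy as np

def identify_person(extracted_features, database, max_distance=0.5):
    """Search every stored feature set and return the ID whose features lie
    closest to `extracted_features`, provided that distance is within
    `max_distance`; otherwise return None (no confident match)."""
    query = np.asarray(extracted_features, dtype=float)
    best_id, best_distance = None, max_distance
    for person_id, stored_features in database.items():
        distance = np.linalg.norm(np.asarray(stored_features, dtype=float) - query)
        if distance <= best_distance:
            best_id, best_distance = person_id, distance
    return best_id

database = {
    "person_A": [0.2, 0.9, 0.4],
    "person_B": [0.7, 0.1, 0.6],
}
print(identify_person([0.22, 0.87, 0.42], database))  # -> 'person_A'
print(identify_person([0.0, 0.0, 0.0], database))     # -> None (no match close enough)
```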
In some examples, the apparatus may contain a single apparatus unit that performs both steps 302 and 308, and/or step 306. A single apparatus unit may perform all of steps 302 to 310.
The apparatus 102 is shown as comprising a number of functional blocks. This is schematic only and is not intended to define a strict division between different logic elements of such entities. Each functional block may be provided in any suitable manner. It is to be understood that intermediate values described herein as being formed by a particular module/unit etc. need not be physically generated by the module at any point and may merely represent logical values which conveniently describe the processing performed by the module between its input and output.
The modules/units described herein may be configured to perform any of the methods described herein. Generally, any of the functions, methods, techniques or components described above can be implemented in software, firmware, hardware (e.g., fixed logic circuitry), or any combination thereof. The term "unit" may be used herein to generally represent software, firmware, hardware, or any combination thereof. In the case of a software implementation, the unit represents program code that performs the specified tasks when executed on a processor. The methods described herein could be performed by one or more processors executing code that causes the processor(s) to perform the methods. The code may be stored on a non-transitory computer-readable storage medium. Examples of a non-transitory computer-readable storage medium include a random-access memory (RAM), read-only memory (ROM), an optical disc, flash memory, hard disk memory, and other memory devices that may use magnetic, optical, and other techniques to store instructions or other data and that can be accessed by a machine.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
ANNEX TO DESCRIPTION
The present disclosure may be further understood with reference to the following clauses:
1. A method of identifying a person across multiple images captured by one or more image-capturing devices, comprising: analysing an initial set of one or more images depicting the person using a first computational neural network that identifies for each analysed image regions of the image each depicting a corresponding body part of the person; analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; associating in a database the extracted set of features with an ID for the person; analysing a further image to extract a set of one or more features associated with body parts of a person within the further image; and determining if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image.
2. A method according to 1, wherein each region of the image identified from the first computational neural network is analysed using a second computational neural network to extract the set of features associated with the body parts of the person.
3. A method according to 1 or 2, wherein the initial set of one or more images depicting the person are captured by a first set of one or more image capturing devices, and the further image is captured from a further image capturing device not forming part of the first set.
4. A method according to 3, wherein the further image capturing device has a non-overlapping field of view with the field of view of the image capturing devices of the first set.
5. A method according to any preceding clause, wherein the initial set of one or more images comprises multiple images.
6. A method according to any preceding clause, wherein the first set of one or more image capturing devices comprises multiple image capturing devices.
7. A method according to any preceding clause, wherein the first computational neural network identifies for each analysed image a set of vector fields each encoding the location and orientation of a respective body part of the person within the image.
8. A method according to 7, wherein each vector field identifies a region of the image depicting the respective body part of the person.
9. A method according to any preceding clause, wherein the method comprises analysing each image region identified from the first computational neural network to extract a set of features each associated with a respective body part of the person.
10. A method according to any preceding clause, wherein the extracted set of features form a descriptor of the person.
11. A method according to any preceding clause, wherein the method comprises analysing each image region identified from the first computational neural network using a second computational neural network to generate for each body part a classification dataset identifying a feature for the body part.
12. A method according to any preceding clause, wherein the method further comprises generating a feature characterising ID characterising the extracted features of the person from the extracted set of features, and storing the feature characterising ID in the database with the ID for the person.
13. A method according to any preceding clause, wherein the step of analysing the further image comprises using a third computational neural network pre-trained to identify a set of features comprising the extracted features.
14. A method according to 12 or 13 when dependent on 12, wherein the method further comprises generating a feature characterising ID of the person within the further image from the extracted features associated with body parts of that person, and comparing that feature characterising ID with the feature characterising ID stored in the database associated with the person ID.
15. A method according to any of 1 to 8, wherein the step of analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person comprises: identifying a transform of image data for the identified image regions within each of the initial set of images to a common region of a feature space; and for each of at least one image of the initial set of images, applying the identified transform to the identified image regions within the image to transform the identified image regions to a location within the common region of the feature space; and determining the extracted set of one or more features from the location(s) within the common region of the feature space.
16. A method according to 15, wherein the common region of feature space is of a specified size.
17. A method according to 15 or 16, wherein the set of features are determined from an average of the locations within the common region of the feature space.
18. A method according to 15 or 16, wherein the set of features are determined from each of the location(s) within the common region of feature space.
19. A method according to any of 15 to 18, wherein the locations within the feature space identify semantic features associated with the body parts of the person.
20. A method according to any of 15 to 19, wherein the step of analysing the further image comprises: identifying for the further image regions of the image depicting body parts of the person within the further image; applying the identified transform to the identified image regions within the further image to transform those identified regions to a location within the feature space.
21. A method according to any of 15 to 20, wherein the determining step comprises: determining if the location within the feature space for the image regions within the further image matches a location associated with the ID for the person stored in the database to within a specified amount.
22. An apparatus for identifying a person across multiple images captured by one or more image-capturing devices, comprising: a first image analysis unit configured to analyse an initial set of one or more images depicting the person using a first computational neural network that identifies for each analysed image regions of the image each depicting a corresponding body part of the person; a feature extraction unit configured to analyse the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; a storage unit configured to store in a database an ID for the person associated with the extracted set of features; a second image analysis unit configured to analyse a further image to extract a set of one or more features associated with body parts of a person within the further image; and a person identification unit configured to determine if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image.
23. An apparatus according to 22, wherein the feature extraction unit is configured to use a second computational neural network to analyse each image region identified from the first computational neural network to extract a set of features associated with the body parts of the person.
24. An apparatus according to 22 or 23, wherein the initial set of one or more images depicting the person are captured by a first set of one or more image capturing devices, and the further image is captured from a further image capturing device not forming part of the first set.
25. An apparatus according to 24, wherein the further image capturing device has a non-overlapping field of view with the field of view of the image capturing devices of the first set.
26. An apparatus according to any of 22 to 25, wherein the initial set of one or more images comprises multiple images.
27. An apparatus according to any of 22 to 26, wherein the first set of one or more image capturing devices comprises multiple image capturing devices.
28. An apparatus according to any of 22 to 27, wherein the first image analysis unit is configured to implement the first computational neural network to identify for each analysed image a set of vector fields each encoding the location and orientation of a respective body part of the person within the image.
29. An apparatus according to 28, wherein each vector field identifies a region of the image depicting the respective body part of the person.
30. An apparatus according to any of 22 to 29, wherein the feature extraction unit is configured to analyse each image region identified from the first computational neural network to extract a set of features each associated with a respective body part of the person.
31. An apparatus according to any of 22 to 30, wherein the feature extraction unit is configured to use a second computational neural network to analyse each image region identified from the first computational neural network to generate for each body part a classification dataset identifying a feature for the body part.
32. An apparatus according to any of 22 to 31, wherein the storage unit is further configured to generate a feature characterising ID of the person from the extracted set of features, and store the feature characterising ID in the database with the ID for the person.
33. An apparatus according to any of 22 to 32, wherein the second image analysis unit is configured to use a third computational neural network pre-trained to identify a set of features comprising the extracted features.
34. An apparatus according to 32 or 33, wherein the second image analysis unit is configured to generate a feature characterising ID of the person within the further image from the extracted features associated with body parts of that person, and the person identification unit is configured to compare that feature characterising ID with the feature characterising ID stored in the database associated with the person ID to determine if the extracted set of features match those associated with the ID for the person stored in the database to within a specified confidence level.
35. An apparatus according to any of 22 to 30, wherein the feature extraction unit is configured to: identify a transform of image data for the identified image regions within each of the initial set of images to a common region of a feature space; for each of at least one image of the initial set of images, apply the identified transform to the identified image regions within the image to transform the identified image regions to a location within the common region of the feature space; and determine the extracted set of one or more features from the location(s) within the common region of the feature space.
36. An apparatus according to 35, wherein the common region of feature space is of a specified size.
37. An apparatus according to 35 or 36, wherein the feature extraction unit is configured to determine the set of features from an average of the locations within the common region of the feature space.
38. An apparatus according to 35 or 36, wherein the feature extraction unit is configured to determine the set of features from each of the location(s) within the common region of feature space.
39. An apparatus according to any of 35 to 38, wherein the locations within the feature space identify semantic features associated with the body parts of the person.
40. An apparatus according to any of 35 to 39, wherein the second image analysis unit is configured to: identify for the further image regions of the image depicting body parts of the person within the further image; apply the identified transform to the identified image regions within the further image to transform those identified regions to a location within the feature space.
41. An apparatus according to any of 35 to 40, wherein the person identification unit is configured to: determine if the location within the feature space for the image regions within the further image matches a location associated with the ID for the person stored in the database to within a specified amount.
42. Computer program code that, when executed, causes an apparatus to perform the method of any of 1 to 21.
43. A non-transitory computer-readable storage medium having stored thereon instructions for performing the method of any of 1 to 21.

Claims (27)

  1. 1. A method of identifying a person across multiple images captured by a plurality of image-capturing devices, comprising: analysing an initial set of one or more images depicting the person using a first computational neural network that identifies, for each analysed image, regions of the image each depicting a corresponding body part of the person; analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; associating, in a database, the extracted set of features with an ID for the person; analysing a further image to extract a set of one or more features associated with body parts of a person within the further image; and determining if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image, wherein the initial set of one or more images and the further image are captured by a plurality of image capturing devices that are spatially distributed within an environment including a town or city.
  2. 2. A method as claimed in claim 1, wherein the first computational neural network identifies, for each analysed image, vectors encoded from image elements of the analysed image.
  3. 3. A method as claimed in claim 2, wherein image elements are based on pixel data.
  4. 4. A method as claimed in claim 2 or 3, wherein the vectors encode a location and orientation of a respective body part of the person within the image.
  5. 5. A method as claimed in any preceding claim, wherein the plurality of image capturing devices have non-overlapping fields of view.
  6. 6. A method as claimed in claim 1, further comprising tracking the person, depicted in the initial set of one or more images captured by the plurality of image-capturing devices, across multiple images including the initial set of one or more images and the further image.
  7. 7. The method as claimed in claim 6, wherein the person is further tracked over multiple temporal periods.
  8. 8. A method as claimed in claim 1, wherein the initial set of one or more images depicting the person are captured by a first set of one or more image capturing devices, and the further image is captured from a further image capturing device not forming part of the first set.
  9. 9. A method as claimed in claim 8, wherein the further image capturing device has a non-overlapping field of view with the field of view of the image capturing devices of the first set.
  10. 10. A method as claimed in any preceding claim, wherein the initial set of one or more images comprises multiple images.
  11. 11. A method as claimed in any of claims 2-5 and 10, when dependent on claim 2, wherein the vectors form a vector field, and wherein each vector field identifies a region of the image depicting the respective body part of the person.
  12. 12. A method as claimed in any preceding claim, wherein the method comprises analysing each image region identified from the first computational neural network to extract a set of features each associated with a respective body part of the person.
  13. 13. A method as claimed in any preceding claim, wherein the extracted set of one or more features forms a descriptor of the person.
  14. 14. A method as claimed in any preceding claim, wherein the method comprises analysing each image region identified from the first computational neural network using a second computational neural network to generate, for each body part, a classification dataset identifying a feature for the body part.
  15. 15. A method as claimed in any preceding claim, wherein the method further comprises generating a feature characterising ID which characterises the extracted features of the person from the extracted set of features, and storing the feature characterising ID in the database with the ID for the person.
  16. 16. A method as claimed in any of claims 1-13 and 15, wherein the step of analysing the further image comprises using a second computational neural network pre-trained to identify a set of features comprising the extracted features.
  17. 17. A method as claimed in claim 15, or claim 16 when dependent on claim 15, wherein the method further comprises generating a feature characterising ID of the person within the further image from the extracted features associated with body parts of that person, and comparing that feature characterising ID with the feature characterising ID stored in the database associated with the ID for the person.
  18. 18. A method as claimed in any of claims 1 to 11, wherein the step of analysing the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person comprises: identifying a transform of image data for the identified image regions within each of the initial set of images to a common region of a feature space; for each of at least one image of the initial set of images, applying the identified transform to the identified image regions within the image to transform the identified image regions to a location within the common region of the feature space; and determining the extracted set of one or more features from the location(s) within the common region of the feature space.
  19. 19. A method as claimed in claim 18, wherein the common region of feature space is of a specified size.
  20. 20. A method as claimed in claim 18 or 19, wherein the set of features are determined from an average of the locations within the common region of the feature space.
  21. 21. A method as claimed in claim 18 or 19, wherein the set of features are determined from each of the location(s) within the common region of feature space.
  22. 22. A method as claimed in any of claims 18 to 21 wherein the locations within the feature space identify semantic features associated with the body parts of the person.
  23. 23. A method as claimed in any of claims 18 to 22, wherein the step of analysing the further image comprises: identifying, for the further image, regions of the image depicting body parts of the person within the further image; and applying the identified transform to the identified image regions within the further image to transform those identified regions to a location within the feature space.
  24. 24. A method as claimed in any of claims 18 to 23, wherein the determining step comprises: determining if the location within the feature space for the image regions within the further image matches a location associated with the ID for the person stored in the database to within a specified amount.
  25. 25. An apparatus for identifying a person across multiple images captured by a plurality of image-capturing devices, comprising: a first image analysis unit configured to analyse an initial set of one or more images depicting the person using a first computational neural network that identifies, for each analysed image, regions of the image each depicting a corresponding body part of the person; a feature extraction unit configured to analyse the image regions identified from the first computational neural network to extract a set of one or more features associated with the body parts of the person; a storage unit configured to store, in a database, an ID for the person associated with the extracted set of features; a second image analysis unit configured to analyse a further image to extract a set of one or more features associated with body parts of a person within the further image; and a person identification unit configured to determine if the extracted set of features from the further image match those associated with the ID for the person stored in the database to within a specified confidence level and, if so, determining that the person with that ID is depicted in the further image, wherein the initial set of one or more images and the further image are captured by a plurality of image capturing devices that are spatially distributed within an environment including a town or city.
  26. 26. Computer program code that, when executed, causes an apparatus to perform the method of any of claims 1 to 24.
  27. 27. A non-transitory computer-readable storage medium having stored thereon instructions for performing the method of any of claims 1 to 24.
GB2216206.9A 2017-09-25 2017-11-23 Person identification across multiple captured images Active GB2608975B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1715509.4A GB201715509D0 (en) 2017-09-25 2017-09-25 Person indentification across multiple captured images
GB1719460.6A GB2566762B (en) 2017-09-25 2017-11-23 Personal identification across multiple captured images

Publications (3)

Publication Number Publication Date
GB202216206D0 GB202216206D0 (en) 2022-12-14
GB2608975A 2023-01-18
GB2608975B GB2608975B (en) 2023-07-12

Family

ID=60244346

Family Applications (3)

Application Number Title Priority Date Filing Date
GBGB1715509.4A Ceased GB201715509D0 (en) 2017-09-25 2017-09-25 Person indentification across multiple captured images
GB1719460.6A Active GB2566762B (en) 2017-09-25 2017-11-23 Personal identification across multiple captured images
GB2216206.9A Active GB2608975B (en) 2017-09-25 2017-11-23 Person identification across multiple captured images

Family Applications Before (2)

Application Number Title Priority Date Filing Date
GBGB1715509.4A Ceased GB201715509D0 (en) 2017-09-25 2017-09-25 Person indentification across multiple captured images
GB1719460.6A Active GB2566762B (en) 2017-09-25 2017-11-23 Personal identification across multiple captured images

Country Status (1)

Country Link
GB (3) GB201715509D0 (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10846996B2 (en) 2016-04-25 2020-11-24 Standard Cognition Corp. Registry verification for a mechanized store using radio frequency tags
US11798064B1 (en) 2017-01-12 2023-10-24 Digimarc Corporation Sensor-based maximum-likelihood estimation of item assignments
US10853965B2 (en) 2017-08-07 2020-12-01 Standard Cognition, Corp Directional impression analysis using deep learning
US10474991B2 (en) 2017-08-07 2019-11-12 Standard Cognition, Corp. Deep learning-based store realograms
US11250376B2 (en) 2017-08-07 2022-02-15 Standard Cognition, Corp Product correlation analysis using deep learning
US10650545B2 (en) 2017-08-07 2020-05-12 Standard Cognition, Corp. Systems and methods to check-in shoppers in a cashier-less store
US11200692B2 (en) 2017-08-07 2021-12-14 Standard Cognition, Corp Systems and methods to check-in shoppers in a cashier-less store
US10474988B2 (en) 2017-08-07 2019-11-12 Standard Cognition, Corp. Predicting inventory events using foreground/background processing
US11232687B2 (en) 2017-08-07 2022-01-25 Standard Cognition, Corp Deep learning-based shopper statuses in a cashier-less store
US11232575B2 (en) 2019-04-18 2022-01-25 Standard Cognition, Corp Systems and methods for deep learning-based subject persistence
CN110765850A (en) * 2019-09-09 2020-02-07 深圳壹账通智能科技有限公司 Alarm method and device based on residual error network, computer equipment and storage medium
US11361468B2 (en) 2020-06-26 2022-06-14 Standard Cognition, Corp. Systems and methods for automated recalibration of sensors for autonomous checkout
US11303853B2 (en) 2020-06-26 2022-04-12 Standard Cognition, Corp. Systems and methods for automated design of camera placement and cameras arrangements for autonomous checkout
CN113033350B (en) * 2021-03-11 2023-11-14 北京文安智能技术股份有限公司 Pedestrian re-identification method based on overlook image, storage medium and electronic equipment
CN113221770A (en) * 2021-05-18 2021-08-06 青岛根尖智能科技有限公司 Cross-domain pedestrian re-identification method and system based on multi-feature hybrid learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106778604A (en) * 2015-12-15 2017-05-31 西安电子科技大学 Pedestrian's recognition methods again based on matching convolutional neural networks
WO2017177371A1 (en) * 2016-04-12 2017-10-19 Xiaogang Wang Method and system for object re-identification
CN106250870A (en) * 2016-08-16 2016-12-21 电子科技大学 A kind of pedestrian's recognition methods again combining local and overall situation similarity measurement study

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Salvagnini, Pietro, et al. "Person re-identification with a ptz camera: an introductory study." Image Processing (ICIP), 2013 20th IEEE International Conference on. IEEE, 2013 *

Also Published As

Publication number Publication date
GB202216206D0 (en) 2022-12-14
GB2566762B (en) 2023-01-25
GB201715509D0 (en) 2017-11-08
GB2608975B (en) 2023-07-12
GB2566762A (en) 2019-03-27
GB201719460D0 (en) 2018-01-10
