WO2018109048A1 - A clothing compliance detection apparatus, and associated clothing standard enforcement apparatus


Info

Publication number
WO2018109048A1
Authority
WO
WIPO (PCT)
Prior art keywords
clothing
classifier
body region
individual
beard
Application number
PCT/EP2017/082720
Other languages
French (fr)
Inventor
Benjamin BIGGS
Patrick Robert HYETT
Original Assignee
Glaxosmithkline Intellectual Property Development Limited
Application filed by Glaxosmithkline Intellectual Property Development Limited
Publication of WO2018109048A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Image Analysis (AREA)

Abstract

A clothing compliance detection apparatus comprises an image processor configured to determine whether or not a body region of an individual is clothed by an item of clothing, and further configured to determine whether or not the body region is bearded.

Description

A CLOTHING COMPLIANCE DETECTION APPARATUS, AND ASSOCIATED
CLOTHING STANDARD ENFORCEMENT APPARATUS
BACKGROUND TO THE INVENTION
Field of the Application
The present invention relates to a clothing compliance detection apparatus, a clothing standard enforcement apparatus, and a method of training an image classifier for use in such apparatus.
Description of Prior Art
WO2013/178819 teaches a method and apparatus for protective clothing compliance. The method acquires an image of a person attired in protective clothing and then eliminates certain colours before determining whether the person is correctly attired.
The present invention provides an improved apparatus for the detection, and in particular, the classification of clothing, including items of clothing which have little if any colour, such as glasses, or protective goggles. The present invention further provides apparatus which is capable of classifying a human body region as being bearded or not in order to determine whether or not that body region requires clothing, and optionally, then checking the clothing has been worn.
The present invention also provides a method of training an image classifier which provides improved accuracy. The classifier trained by the method is especially useful for classifying a body region of an individual as clothed or unclothed and, separately, as being hirsute, or not. By hirsute, it is meant that a predetermined density of hair is present over the body region, which requires that personal protective equipment (PPE) should be used in order to prevent shedding of the hair into a controlled zone, such as a clean room for the manufacture of pharmaceuticals.
SUMMARY OF THE INVENTION
Acronyms and Abbreviations:
The following acronyms and abbreviations have meanings ascribed below, unless context would indicate otherwise:
API: Application Programming Interface
BoW: Bag of Visual Words
CCTV: Closed Circuit Television
GPD: Gowning procedure file
HOG: Histogram of Orientated Gradients
HSV: Hue, Saturation, Value
LAB: "lightness", alpha and beta (colour dimensions)
LCFE: Local Colour Feature Extractor
MFBS: Multi-Feature Bag of Visual Words with SVM Classifier
PPE: Personal Protective Equipment
RFID: Radio Frequency Identification
RGB: Red, Green, Blue (additive colour model)
SDK: Software Developer Kit
SIFT: Scale Invariant Feature Transform
SURF: Speeded-Up Robust Features
SVM: Support Vector Machine
UI: User Interface
According to a first aspect of the present invention, there is provided a clothing compliance detection apparatus comprising:
an image processor, configured to determine from at least one video image of an individual whether or not a predetermined body region of the individual is clothed by an associated item of clothing, and to provide a recognition output indicative of the determination,
a video camera, for providing at least one video image of an individual to the image processor,
an output module, configured to receive the recognition output and to generate a user perceptible feedback based upon the recognition output,
wherein the image processor is configured to:
extract from at least one video image a cropped body region which corresponds with the body region,
classify the cropped body region using a clothing classifier for the associated item of clothing, which clothing classifier can predict at least two output classes, including a clothed class and an unclothed class,
wherein the image processor is further configured to determine whether or not a body region is bearded, and to provide an output indicative of that determination, by,
extracting from at least one frame of live video frame data a cropped body region which corresponds with the body region,
classifying the cropped body region using a beard classifier associated with the body region, which beard classifier can predict at least two output classes, including an unbearded class and a bearded class.
In one embodiment, the clothing classifier comprises a multi-descriptor trained image classifier.
In one embodiment, the multi-descriptor trained image classifier is trained with a set of labelled vectors representing a training image set for the associated clothing, wherein each labelled vector incorporates features extracted from an image of the training image set via at least two different feature extractors.
In some embodiments, the at least two different feature extractors comprise a shape feature extractor, such as Speeded-Up Robust Features (SURF), and a colour feature extractor, such as a local colour feature extractor.
Optionally, the at least two different feature extractors include a texture feature extractor, such as Histogram of Orientated Gradients (HOG).
Optionally, the at least two different feature extractors include a shape feature extractor, a colour feature extractor, and a texture feature extractor.
Optionally, the at least two different feature extractors further include a reflectivity feature extractor.
In certain embodiments, the image processor is adapted to receive plural video images from the video camera and to generate the recognition output based upon classification of a plurality of frames of this live video frame data. Optionally, at least 30 video frame images are classified and the recognition output is based upon the classification made for at least 70% of the plurality of frames i.e. about 21 frames of the 30 frames. This improves the accuracy of the classification made.
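For illustration only, the following is a minimal sketch of the frame-voting logic described above (a recognition output is reported only when at least 30 frames have been classified and at least 70% of them agree). The function and parameter names are assumptions for this sketch and are not taken from the patent.

```python
from collections import Counter

def vote_on_frames(frame_labels, min_frames=30, agreement=0.70):
    """Aggregate per-frame classifier outputs into a single recognition output.

    frame_labels: list of class labels (e.g. "clothed"/"unclothed"), one per classified frame.
    Returns the majority label only if enough frames were classified and the majority
    meets the agreement threshold; otherwise returns None (keep classifying frames).
    """
    if len(frame_labels) < min_frames:
        return None
    label, count = Counter(frame_labels).most_common(1)[0]
    if count / len(frame_labels) >= agreement:
        return label
    return None

# Example: 23 of 30 frames classified as "clothed" -> recognition output "clothed"
print(vote_on_frames(["clothed"] * 23 + ["unclothed"] * 7))
```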
Optionally, the classifier may be trained to predict at least three output classes, including a correctly clothed class, an incorrectly clothed class, and an unclothed class.
In certain embodiments, the clothing compliance detection apparatus is further adapted to classify a predetermined list of plural body regions of the individual, each of said plural body regions having at least one associated item of clothing, and wherein each associated item of clothing has an associated clothing classifier for classification of that body region as clothed, or unclothed, and wherein at least one body region has an associated beard classifier.
In further embodiments, the predetermined list of body regions comprises body regions selected from a list including, a hat region, a goggle region, a snood region, a beard region, a torso region, a left hand region, a right hand region, a left foot region, a right foot region.
In still further embodiments, the clothing compliance detection apparatus is further adapted to generate user perceptible feedback depending on the value of all the classifications made such that the user is informed if:
a) none of the listed body regions are clothed by clothing associated with the clothing classifiers, or
b) at least one of the listed body regions are clothed by clothing associated with the clothing classifiers, or
c) all of the listed body regions are clothed by clothing associated with the clothing classifiers.
In one aspect, the beard classifier comprises a multi-descriptor trained image classifier.
Such a multi-descriptor trained image classifier is trained with a set of labelled vectors representing a training image set for the body region, with beard (bearded) and without beard (unbearded), wherein each labelled vector incorporates features extracted from an image of the training image set via at least two different feature extractors.
In some embodiments, the at least two different feature extractors comprise a shape feature extractor, such as Speeded-Up Robust Features (SURF), and a colour feature extractor, such as a local colour feature extractor.
Optionally, the at least two different feature extractors include a texture feature extractor, such as Histogram of Orientated Gradients (HOG).
Optionally, the at least two different feature extractors include a shape feature extractor, a colour feature extractor, and a texture feature extractor.
In one embodiment, the image processor first classifies the body region using the beard classifier and, only if the body region is classified as bearded by the beard classifier, subsequently classifies the same body region with the associated clothing classifier. If the body region is classified as unbearded by the beard classifier, a subsequent classification by the clothing classifier is not made.
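For illustration only, a minimal sketch of this conditional ordering, assuming the two classifiers are available as callables that return class labels for a cropped image; the function and label names are illustrative assumptions, not part of the patent.

```python
def check_beard_region(beard_classifier, snood_classifier, cropped_region):
    """Classify the beard region first; only check the beard snood if a beard is found.

    beard_classifier / snood_classifier: assumed callables returning a class label
    ("bearded"/"unbearded", "clothed"/"unclothed") for a cropped body-region image.
    """
    if beard_classifier(cropped_region) == "unbearded":
        return "no snood required"            # no further classification is made
    if snood_classifier(cropped_region) == "clothed":
        return "snood worn"
    return "prompt user to don beard snood"   # user perceptible feedback
```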
Optionally, the image processor is configured such that, upon classification of a body region as a bearded body region, user perceptible feedback is provided to prompt the individual to don a suitable item of clothing, for example a beard snood over that region.
Optionally, the body region is the beard region of a face.
In various embodiments, the image processor is configured to classify plural beard region components, which together make up the beard region, in order to improve classification accuracy. In certain embodiments, the beard region components comprise sub-regions; for example, the sub-regions comprise a moustache region, a goatee region, a left mass region and a right mass region.
In a further embodiment, the clothing compliance detection apparatus is configured to provide the user perceptible feedback to prompt donning a suitable item of clothing, for example a beard snood over the plurality of regions, if at least one of those regions is classified as bearded. In still other embodiments, the image processor is further configured to classify the bearded body region, using a clothing classifier for the suitable item of clothing, for example the beard snood, as clothed or unclothed, and to provide further user perceptible feedback to the individual accordingly.
Optionally, the video camera, for providing at least one video image of an individual to the image processor, is a multi-sensor video capture device selected to provide information in the visible spectrum and also information from the infrared spectrum, in particular the near infrared (i.e. with a wavelength of λ = 0.75-1.4 μm), so that the at least one video image provided to the image processor comprises information from the visible spectrum and infrared spectrum.
Optionally, the multi-sensor video camera device is a Microsoft® Kinect® or an Intel® RealSense.
Optionally such a multi-sensor video camera device can be used to incorporate depth information in the at least one video image provided to the image processor.
Where the at least one video image provided to the image processor comprises information from the visible and infrared spectrum, especially the near infrared, the beard and/or clothing classifiers can be trained with an image training set comprising images which incorporate information from the infrared spectrum, especially the near infrared, in addition to the information from the visible spectrum. These images would be provided by the same type of multi-sensor video camera. Optionally depth information can also be used to train the classifiers where this is incorporated in the at least one video image provided to the imaging processor.
Optionally, the video camera further comprises a thermal imaging camera capable of providing images to the image processor which comprise information from the long-wavelength infrared spectrum (i.e. with a wavelength of λ = 8-15 μm). Where the at least one image provided to the image processor comprises such information, the beard and/or clothing classifiers are trained with an image training set including images which comprise information from the long-wavelength infrared spectrum, recorded by the same type of thermal imaging camera.
According to a further aspect of the invention, there is provided a clothing standard enforcement apparatus for controlling access to a controlled zone comprising the clothing compliance detection apparatus described herein, and further comprising:
a lockable portal to control entry to the controlled zone, said lockable portal adapted to be controlled by the recognition output produced by the clothing compliance detection apparatus such that the portal permits entry by the individual to the controlled zone only when the individual is classified as being clothed to a predetermined clothing standard by the clothing compliance detection apparatus, which clothing standard comprises a requirement for at least one predetermined body region of the individual to be clothed by a predetermined item of clothing prior to entry into the controlled zone, and wherein the image processor is configured to classify the at least one predetermined body region using a clothing classifier for the predetermined item of clothing.
In certain embodiments of such aspect, the clothing standard enforcement apparatus is further provided with an input recognition module adapted to recognise at least one input from a user, and to carry out a predetermined sequence of steps in response to the at least one input from the user, including:
1) upon recognising a first input from the user, classifying the at least one predetermined body region with the clothing classifier for the predetermined item of clothing,
2) providing first user perceptible feedback indicating the classification made of the predetermined body region and, if the classification is unclothed, prompting the individual to don the predetermined item of clothing,
3) upon recognising a second input from the user, classifying the at least one predetermined body region using the clothing classifier for the predetermined item of clothing, and
4) providing a second user perceptible feedback indicating the classification made of the predetermined body region and, if the classification confirms that the clothing is worn, causing the portal to be unlocked to permit entry of the individual to the controlled zone.
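For illustration only, a minimal sketch of the four-step sequence above, with the input recognition, classification, feedback and portal control abstracted as assumed callables supplied by the host application; this is not the patent's implementation.

```python
def enforcement_sequence(wait_for_input, classify_region, give_feedback, unlock_portal):
    """Sketch of steps 1)-4): classify on the first user input, prompt if unclothed,
    re-classify on the second input, and unlock the portal only when the clothing
    is confirmed as worn. All four arguments are assumed callables."""
    wait_for_input("first input")                      # step 1: first user input recognised
    result = classify_region()
    give_feedback(result)                              # step 2: first user perceptible feedback
    if result == "unclothed":
        give_feedback("please don the required item of clothing")
    wait_for_input("second input")                     # step 3: second user input recognised
    result = classify_region()
    give_feedback(result)                              # step 4: second user perceptible feedback
    if result == "clothed":
        unlock_portal()                                # permit entry to the controlled zone
```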
In certain embodiments, the user perceptible feedback is provided visually via a video display.
In certain embodiments, the user perceptible feedback is provided to a video display with a live video showing the mirror image of the individual. The user perceptible feedback is provided as a graphic overlay to the live video showing the mirror image of the individual.
In certain embodiments, the clothing standard enforcement apparatus comprises a clothing compliance detection apparatus adapted to classify a predetermined list of plural body regions of the individual, wherein each body region of the list of plural body regions has an associated clothing classifier for classification of that body region, and upon receiving the first input from the user, each body region of the list of plural body regions of the individual is classified using the associated classifier, and the first user perceptible feedback is generated based upon the totality of the classifications made, and upon receipt of the second input from the user, each body region of the list of plural body regions of the individual is classified using the associated classifier, and the second user perceptible feedback is generated based upon the totality of the classifications made.
Alternatively, the clothing standard enforcement apparatus comprises a clothing compliance detection apparatus adapted to classify a predetermined list of plural body regions of the individual, wherein each body region of the list of plural body regions has an associated clothing classifier for classification of that body region, wherein the predetermined list of plural body regions of the individual is arranged as a sequential list, and wherein, upon receiving the first input from the user, the first user perceptible feedback is generated based upon the totality of the classifications made, and upon receipt of the second input from the user, each body region is classified in a stepwise fashion, starting with first body region and associated classifier, and moving sequentially through the list upon confirmation that each associated item of clothing is worn, and providing the second user perceptible feedback upon completion of the list.
In certain embodiments, the clothing compliance detection apparatus, or clothing standard enforcement apparatus, is used to detect personal protective equipment.
In certain embodiments, the controlled zone is a clean room which requires a mandated level of personal protective equipment, which mandated level of personal protection is the predetermined clothing standard.
In certain embodiments, the individual is required to don these items in a specific order, (the gowning procedure) to comply with the mandated level of personal protection.
In some embodiments, the user is the individual.
In some embodiments, the input recognition module comprises a gesture detection device which is able to interpret gestures from the live video frame data and so recognise a pose made by the individual.
In some embodiments the first input comprises a pose made by the individual. Optionally this first pose is a presentation gesture.
In some embodiments, the second input comprises a pose made by the individual. Optionally this second pose is a presentation gesture.
There is provided a method of creating a classifier for use with the clothing compliance detection apparatus in any of the embodiments described herein, and the associated clothing standard enforcement apparatus of the embodiments described herein, comprising one or more of the steps of:
1) creating a set of training images and a set of testing images,
2) extracting shape features and colour features from the training images using an associated feature descriptor and colour descriptor,
3) constructing independent bags of visual words for the shape features and colour features of the training images,
4) encoding training set images with each bag, generating a shape feature vector and a colour feature vector of a set length for each,
5) row-normalising these vectors with respect to each other, to ensure each carries equal weight,
6) concatenating these normalised vectors to form a single multi-feature vector for each training image, and
7) training a Support Vector Machine classifier using these labelled multi-feature vectors, where the labels indicate the correct class labels.
In some embodiments, in addition to the shape features and colour features, texture features are also extracted from the training images using an associated descriptor, and used in the construction of the classifier by constructing an independent bag of visual words for the texture features to generate texture feature vectors for the training set images, which are then row- normalised and concatenated with the row-normalised shape vectors and row-normalised colour vectors to form the single multi-features vector for each training image.
In some such embodiments, the method of creating a classifier is used for constructing a clothing classifier and uses training images which consist of images of body regions which are clothed by the clothing associated with the clothing classifier, and a further set of images of the body regions in which the clothing associated with the clothing classifier is not worn, and the associated single vectors for each image are labelled "clothed", or "unclothed", accordingly.
Optionally, the method of creating a classifier is used for constructing a clothing classifier and uses training images which consist of images of body regions which are clothed by the clothing associated with the clothing classifier, a further set of images of the body regions in which the clothing associated with the clothing classifier is not worn, and a still further set of images of the body regions in which the clothing associated with the clothing classifier is worn incorrectly, and the associated single vectors for each image are accordingly labelled "clothed", or "unclothed", or "clothed incorrectly".
In some embodiments, the method for constructing a beard classifier uses a test image set consisting of bearded body regions and unbearded body regions. Bearded body regions, e.g. beard component regions, are body regions covered by at least the amount of hair which requires gowning, e.g. wearing of a beard snood, to prevent shedding of hair. Unbearded body regions, e.g. beard component regions, are body regions which are covered by less than the density of hair which corresponds with a requirement for gowning to prevent shedding.
A particular advantage of the present invention is the provision of apparatus which can recognise specific pieces of PPE and the order in which a user puts them on to validate a full gowning procedure without requiring modification to existing PPE. The application further features a training module that allows an unspecialised user to define new gowning procedures by introducing new pieces of PPE and setting new orderings on them. Where gowning procedures require some users to wear a beard snood, dependent upon the amount of beard present, an integrated beard detection module determines if the operator has sufficiently dense facial hair to warrant its necessity. Using a third party API, the application is able to restrict access to a high-risk facility, such as a pharmaceutical manufacturing plant, until such time as adherence to the gowning procedure has been completely verified.
Because the application, and the associated apparatus, is inside a 'dirty zone', an area siloed to reduce the risk of staff and external equipment introducing contaminants, no contact can be allowed between human operators and hardware. Consequently, the application preferably employs human-computer interaction (HCI) methods other than the standard mouse and keyboard set-up, to prevent possible cross-contamination. In addition, the application preferably provides a large, high-definition and portrait-oriented display that will remain wall-mounted in the dirty zone. The application can also provide the user with continuous feedback that indicates which item is next to be gowned, and display an error if the specified gowning procedure is violated in any way.
According to a further aspect of the invention, there is provided a computer program comprising instructions to cause the clothing compliance detection apparatus to execute the steps described herein, or to cause the clothing standard enforcement apparatus to execute the steps set out herein.
DESCRIPTION OF DRAWINGS
FIG. 1 shows, schematically, a clothing standard enforcement apparatus and an associated controlled zone and changing room.
FIG. 2 depicts a 25-point skeletal mapping approach accomplished using a high-fidelity depth sensor, as mapped by a Kinect®.
FIG. 3 depicts a five-point basic facial mapping approach as mapped by a Kinect®.
FIG. 4 depicts an over 1000-point facial mapping approach as mapped by a Kinect®.
FIG. 5 depicts an "entrance gesture" which involves a user being stood in an upright position with their right arm raised above their head.
FIG. 6 depicts a "presentation gesture" which involves a user being stood in an upright position with their arms extended to their sides at approximately waist height (e.g., at approximately 45 degrees).
FIG. 7a and 7b show various body regions alongside the Kinect® frame type used to plot vertices of the desired extraction region.
FIG. 8 depicts beard sub-regions.
FIG. 9 depicts various body regions, which have been separated into two classes, circular or rectangular, depending on which shape the region most resembles, together with the names of the relevant body joints and the encompassing rectangles.
FIG. 10 depicts connected annotated face points for hat, glasses and beard snood (aka net for covering facial hair) regions.
FIG. 11 depicts beard regions, constructed by combining HighDetailFacePoints, and then connected in selected point patterns.
FIG. 12 depicts an example of a pre-trained object, identified by a set of scale and rotation-invariant key points selected by a SURF detector, that can be detected by matching algorithms.
FIG. 13 depicts how an image's local colour information is represented by a described algorithm, which is adapted from an existing MATLAB® implementation.
FIG. 14 depicts Histogram of Orientated Gradients (HOG) features for a sample glove image using three different cell sizes.
FIG. 15 depicts three material types (bark, wood and brick) taken from a textured surface dataset, which were used to train an SVM.
FIG. 16 depicts an algorithm/process flow chart used to generate a multi-feature bag of visual words model to produce an SVM classifier.
FIG. 17 is a bar chart depicting the overall accuracy of the multi-descriptor trained classifier scheme when compared with single-descriptor trained classifier schemes.
FIG. 18 depicts an individual in the "presentation gesture" in an "ungowned" state.
FIG. 19 depicts an individual in the "presentation gesture" in a "gowned" state.
FIG. 20 depicts, schematically, a training scheme used to create a Multi-Descriptor Trained Classifier.
FIG. 21 depicts, schematically, a training scheme used to create a Gowning Procedure File.
DETAILED DESCRIPTION OF THE INVENTION
Referring now to FIG. 1, there is shown, schematically, a clothing standard enforcement apparatus 2 for controlling access to a controlled zone 4, such as a clean facility 4 used for the manufacture of pharmaceutical products, or for microfabrication of electronic components. A changing room 6 adjoins the controlled zone via a door 8 which separates the changing room from the controlled zone. This door 8 is lockable by a computer controlled lock 10 to form a lockable portal 12 into the controlled zone 4.
The changing room 6 is equipped with a video camera 14, in the present example a Microsoft® Kinect® for Xbox One® with a Kinect® Adaptor for Windows®. The camera may be co-mounted on a wall 16 adjacent to, and above, a large display monitor 18 comprising, in the present example, a Samsung DM82E-BR 82, which is oriented in portrait form. Uniform lighting is provided by a strip of light emitting diodes 20 which border the mirror (not all shown).
The camera and monitor are co-located such that an individual 22 using the changing room is within the field of view of the camera when the individual is stood in front of the monitor. A live video feed 24, taken from the camera 14, is fed to a computer 26, in the present example, an Intel® NUC NUC5i7RYH (Core i7 5557U / 3.1 GHz) Mini PC.
The computer 26 is configured to run a gowning application 28. The gowning application comprises an input module 30, an image processor module 32, a recognition module 34, and an output module 36. The gowning application also comprises at least one configurable gowning procedure file (GPD) 38 comprising a list of body regions 40, a list of clothing classifiers 42, and beard classifiers 44. The GPD file is stored on non-volatile memory such as a hard disk or solid state disk (not shown).
The live video feed to the computer is processed by the gowning app to be 'flipped' about its vertical axis and then fed back as a mirrored live feed 46 to the monitor via the output module. In this way, the display monitor 18 provides a 'virtual mirror' which presents a virtual reflection 48 of the individual standing in front of it, which comprises the live feed received from the camera. A graphic overlay user interface (UI) 50 is presented in conjunction with the virtual reflection to provide feedback to the individual.
The changing room is provided with storage 52 for clothing shown for illustrative purposes as 54a, 54b, 54c. Such clothing typically comprises personal protective equipment (PPE) which must be worn in order to work in the controlled zone. Such clothing typically comprises, but is not limited to, hats, goggles, beard snoods, coveralls, gloves and overshoes.
Other equipment may be provided in the changing room which must be used before entering the controlled zone, for example a hand washing station 56 which is capable of monitoring compliance with a cleaning procedure by the individual, for example by monitoring the proximity of an RFID tag worn by the individual during operation of the hand washing station. This equipment 56 may be configured to provide data 58 to the computer, in particular to indicate that a required cleaning process has been carried out.
The video camera 14 comprises a Microsoft® Kinect® which itself comprises a video camera and on-board image processor 60. Although primarily used for motion-controlled video gaming, the multi-sensor Microsoft® Kinect® provides a mechanism for mapping human body and facial points to perform a pre-processing step, i.e. cropping images to only contain the relevant human body region. Alongside its 1080p video camera and infrared sensor, the Kinect® features a high-fidelity depth sensor that provides accurate 25-point skeletal mapping as shown in FIG. 2, five-point basic facial mapping as shown in FIG. 3, and over 1000-point facial mapping as shown in FIG. 4. FIGs. 2, 3 and 4 show the position of these skeletal and face points mapped by the Kinect®.
By tracking the human skeleton, the Kinect® can interpret a user's body movements, referred to as gestures, to provide contactless motion control. As well as providing a mechanism for performing the pre-processing step, this gesture capability can be harnessed to determine if the user is stood in a suitable pose before recognition takes place. By doing this, it is not necessary to perform a computationally-expensive prediction process on every frame, or even just those containing the user. Instead, the process need only be run on frames in which the relevant body region is clearly visible to the Kinect® Sensor. After several key enhancements from the initial version, Microsoft® have released a
Windows®-compatible version of the new device alongside the Kinect® for Windows® SDK that allows the development of Kinect®-enabled software. Due to the relatively low cost and its rich API collection, the Kinect® has lent itself well to a substantial amount of previous academic research, for example by authors pursuing projects that involve human body tracking. Gesture detection is implemented in the application to recognise two poses that a user should adopt during the running of the recognition module. The first pose, referred to as the entrance gesture, should be adopted by a user to indicate their readiness to begin the gowning process. The user should be asked to adopt the second position, referred to as the presentation gesture, when there is a need to ensure that every item of PPE is clearly visible to the sensor. This enables improved classification of the body regions associated with the PPE.
As part of each frame's processing cycle, any human body that is in a tracked state - i.e. the Kinect® is able to map their skeleton - is continuously tested to see if they are stood in either the entrance gesture or the presentation gesture.
Entrance Gesture
In the present embodiment, the entrance gesture involves a user being stood in an upright position with their right arm raised above their head. With reference to FIG. 5, the application will 'recognise' this gesture if α > 40°, β > 45° and x > 0.
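The precise definitions of α, β and x are given in FIG. 5, which is not reproduced in this text. For illustration only, the sketch below assumes α is measured at the right shoulder, β at the right elbow, and x is the height of the right hand above the head (y increasing upwards); these interpretations, the 2D joint coordinates and the function names are assumptions.

```python
import math

def angle_at(vertex, p1, p2):
    """Angle in degrees at `vertex` between rays vertex->p1 and vertex->p2; points are (x, y)."""
    a = (p1[0] - vertex[0], p1[1] - vertex[1])
    b = (p2[0] - vertex[0], p2[1] - vertex[1])
    cos_t = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def is_entrance_gesture(shoulder, elbow, hand, hip, head):
    """Assumed reading of FIG. 5: alpha at the shoulder, beta at the elbow,
    x the vertical offset of the right hand above the head."""
    alpha = angle_at(shoulder, elbow, hip)
    beta = angle_at(elbow, shoulder, hand)
    x = hand[1] - head[1]
    return alpha > 40 and beta > 45 and x > 0
```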
Presentation Gesture
A user can adopt the presentation gesture by standing in the position shown in FIG. 6. The presentation gesture recognition is represented via a database file which was generated with the Microsoft® Visual Gesture Builder software (Microsoft Corporation, 2015c). The database file completely describes the required skeletal joint positioning, and the gowning app uses this file to determine whether the user is stood in the presentation gesture.
The Microsoft® Visual Gesture Builder software generates the database file by applying a machine learning approach to pre-recorded Kinect® skeletal frames from three volunteers. After specifying parameters that specify the type of gesture - for example, which parts of the body it involves, whether it is symmetrical, etc. - the software analyses each volunteer's skeletal frames, which are manually labelled to indicate those which show the subject stood in the gesture pose. The presentation gesture was trained on a training set partition of this data before using the testing set partition to confirm the accuracy and tolerance.
The computer 26 and video camera 14 depicted in FIG. 1 are configured to provide a recognition module which is able to receive live video frame data from the camera 14 and to extract at least one cropped body region which corresponds with a predetermined body region of the individual. Two techniques have been used to produce the best possible images for evaluation by
PPE item classifiers. This has been achieved by careful manipulation of the Gesture, Body Index, Face and HD Face APIs found in the Kinect® SDK (Software Developer Kit).
A pre-processing technique has been developed to extract cropped body and facial regions from the live Kinect® video feed before the image processing algorithm is applied. Gesture detection has also been developed to reject frames in which the user is either not present or not stood in a readable pose, i.e. those in which the body region(s) are not clearly visible to the Sensor. A similar technique has also been developed to identify the aforementioned entrance gesture, which a user should adopt to indicate their readiness to begin gowning items.
Extracting Body Components
Before developing the described pre-processing procedure, the relevant body regions were first defined. Owing to the fact that some PPE items cover multiple separate body regions (such as nitrile gloves, required to be worn on both the left and right hands), clothing types have been defined that indicate these body region sets. After a new PPE item's clothing type has been specified, a single gowning procedure can include the item multiple times, provided a different body region is selected for each occurrence.
Although enough body regions have been included to support most commonly-used PPE, the list is not completely exhaustive and other body regions could be readily incorporated. The tables of FIGs. 7a and 7b show each body region alongside the Kinect® frame type used to plot vertices of the desired extraction region. To cater for PPE items that cover multiple body regions and to ensure an item is not introduced that extends beyond the defined body regions, the clothing types in the following Table 1 have been defined.
Table 1: Clothing Types
Clothing Type Allowed on the following region(s)
Hat Hat
Goggles Goggles
Beard Beard
Torso Torso
Glove Left Hand, Right Hand
Boots Left Foot, Right Foot
Although the beard region describes an appropriate facial area for the beard snood, it is not suitable for use by the beard detection module. Due to the fact that some people choose to exhibit facial hair in subdivisions of this large region, smaller areas have been constructed to allow beard detection on individuals who have non-uniform facial hair.
These smaller regions are the beard regions in the table of FIG. 8, which are defined for use by the beard detection module. By breaking the beard region of the individual's face into these four regions (moustache, goatee, left mass and right mass), each can be analysed independently when seeking the presence of facial hair for improved accuracy.
As previously discussed, the latest version of the Kinect® SDK supports 25 skeletal joints, 5 face points and over 1000 high detail face points that are dealt with in the BodyIndex, FaceFrame and HighDefinitionFaceFrame sections of the Kinect® for Windows® SDK. By harnessing these APIs, polygons are mapped to the original scene image by connecting points around each body region's boundary. Since these points are captured using the Kinect®'s depth sensor, their coordinates are first converted to the colour-space frame using the inbuilt CoordinateMapper.
After the vertices of each polygon are obtained, encompassing rectangles are constructed that surround the body region. These rectangles are slightly enlarged about their centre points to account for any small error in Kinect® point mapping. The portion of the image contained within the encompassing rectangle is then extracted from the colour frame and written to a new body component image.
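For illustration only, a minimal sketch of the enlarge-and-crop step described above, assuming NumPy image arrays and an illustrative 10% enlargement factor (the patent does not specify the exact enlargement); the function name is an assumption.

```python
import numpy as np

def crop_body_region(colour_frame, rect, scale=1.1):
    """Enlarge an encompassing rectangle about its centre (to absorb small Kinect
    point-mapping errors) and crop that region from the colour frame.

    colour_frame: H x W x 3 image array; rect: (x, y, width, height) in pixels.
    """
    x, y, w, h = rect
    cx, cy = x + w / 2, y + h / 2
    w2, h2 = w * scale, h * scale
    x0 = max(int(cx - w2 / 2), 0)
    y0 = max(int(cy - h2 / 2), 0)
    x1 = min(int(cx + w2 / 2), colour_frame.shape[1])
    y1 = min(int(cy + h2 / 2), colour_frame.shape[0])
    return colour_frame[y0:y1, x0:x1]

# Example with a dummy 1080p colour frame
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
patch = crop_body_region(frame, (100, 200, 50, 80))
```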
Extracting Skeletal Components
The BodyIndex API from the Kinect® SDK is used to map the 25 skeletal joints as previously shown in FIG. 2. Body component regions are formed by connecting sets of these points and are categorised into two classes, circular or rectangular, depending on which shape the region most resembles. The table of FIG. 9 shows these regions, the names of the relevant body joints and the encompassing rectangles.
Circular Body Components
Circular body components are formed of two joints - the body region's centre point and one that lies on the boundary of the region. To account for region articulation whereby the body component region is rotated away from the camera, multiple extremity points can be defined allowing the algorithm to select the one that is furthest from the centre point. By carefully selecting these points, the application can construct a tightly-fitting polygon around the body region.
Let c = (x_c, y_c) represent the centre point of a specific body region and p_i = (x_i, y_i), i = 1, ..., n, represent the region's extremity points. The polygon is then formed by calculating:

p_MAX = (x_MAX, y_MAX) = argmax over i ∈ {1, ..., n} of √((x_i - x_c)² + (y_i - y_c)²)

A circle is then constructed with centre point c and radius r given by:

r = √((x_MAX - x_c)² + (y_MAX - y_c)²)

An encompassing rectangle is then formed, using an offset ε for safety, with top-left point (x_c - r - ε, y_c - r - ε), width 2(r + ε) and height 2(r + ε).
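For illustration only, a minimal sketch of the calculation above; the function name, default offset value and example coordinates are illustrative assumptions.

```python
import math

def circular_component_rect(centre, extremities, epsilon=5):
    """Find the extremity point furthest from the centre, take that distance as the
    circle radius r, and return the encompassing rectangle enlarged by the safety
    offset epsilon: top-left (xc - r - eps, yc - r - eps), width and height 2*(r + eps)."""
    xc, yc = centre
    r = max(math.hypot(x - xc, y - yc) for x, y in extremities)
    top_left = (xc - r - epsilon, yc - r - epsilon)
    side = 2 * (r + epsilon)
    return top_left, side, side  # (top-left point, width, height)

# Example: a wrist centre point with hand-tip and thumb as extremity points
print(circular_component_rect((320, 240), [(350, 260), (300, 270)]))
```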
Rectangular Body Components
Body component rectangles are constructed by connecting four independent vertices to form a simple polygon of maximal area before performing the described enlargement step.
Extracting Facial Components
The FaceFrame API has been used to map a user's face points to allow body component regions to be similarly extracted from the colour frame image. HighDefinitionFaceFrames have not been used in the main sequential gowning process owing to their increased processing time and requirement for the user to stand closer to the Kinect® than otherwise required. However, HighDefinitionFaceFrames will be used to identify the region subdivisions during the beard detection process, as the increased number of mapped points provides the necessary accuracy in constructed polygons.
The points in FIG. 3 are used for the FaceFrame facial mapping.
The hat, glasses and beard snood (aka net for covering facial hair) regions are formed by connecting the annotated face points as shown in the table of FIG. 10.
Extracting Beard Region Components
Beard regions are constructed by combining HighDetailFacePoints. The regions are then formed by connecting the points as in the table of FIG. 11.
Table 7 shows an application independently extracting each of these beard regions. Note that for beard region images, pixels that are not contained within the original encompassing rectangle are set to black to reduce noise that may otherwise affect the accuracy of produced item classifiers.
The gowning app 28 described in FIG. 1 is configured to classify the at least one cropped body region using a classifier constructed as set out hereinafter.
Construction of the Classifier
This classifier comprises a Multi-Feature Bag of Visual Words with SVM Classifier (MFBS). The MFBS scheme for differentiating between clothed and unclothed images for arbitrary PPE items solves previous problems of overreliance on individual features by simultaneously incorporating representations of shape, colour and texture. It is noted that the present invention relies on a classifier which classifies a body region as clothed or unclothed by a specific item of clothing. This specific item may be worn over bare skin, or over other clothing. Hence the classification of clothed or unclothed relates specifically to the presence, or otherwise, of a predetermined piece of clothing, not the presence of any item of clothing.
A shape descriptor in the form of a SURF extractor was used to encode images in terms of their local interest points, which generally correspond to edges and corners. A colour descriptor, in the form of local colour extractor was developed to represent the position of particular colours and a texture descriptor in the form of a HOG extractor used to encode the local gradient information which was thought to indicate a material's texture. In order to use this information to classify unseen images, a Bag of Visual Words framework was used to standardise the interest point information across both classes. By associating each training image's standardised interest point data with its correct classification label, a commonly- used classification model known as Support Vector Machine (SVM) was then trained on the entire set.
The MFBS approach provides an improved classifier scheme that constructs PPE item classifiers using multiple image features, which in this case represent shape, colour and texture information.
Shape Descriptor: Speeded-Up Robust Features (SURF)
Speeded-Up Robust Features (SURF) is a technique used to extract local interest points from a given greyscale input image. A SURF detector produces a set of scale and rotation-invariant key points that can be used by matching algorithms to detect pre-trained objects (see, e.g., FIG. 12). Each SURF point is represented by a 64-dimensional vector that encodes the point's location and local neighbourhood information.
Even prior to designing and testing the algorithm, this technique was expected to face the converse problem to the use of a classifier based solely on colour, such as an HSV Colour Thresholding scheme. Owing to SURF's restriction to single-channel (or greyscale) images, the technique has a sole reliance on shape information and was therefore likely to systematically misclassify items that cause little deformation to the body region when worn.
Colour Descriptor: Local Colour Feature Extractor
A local colour descriptor was constructed to produce a set of vectors that indicate an image's average colour across a set of extracted sub-blocks of a fixed size. By virtue of the pre-processing step, the target body region in both clothed and unclothed images appears in approximately the same place, meaning that the location of each colour point is likely to be useful. To capitalise on this, each five-dimensional feature vector encodes the sub-block's centroid alongside its average colour. Due to its ability to provide a quantifiable measure of the visual differences between colours, images are first converted to the three-channel LAB colour space, where L represents 'lightness' and a and b represent the colour dimensions. Although LAB provides the most accurate colour representation, it is often difficult to implement, given the lack of simple conversion formulas from RGB or HSV. This is due to the fact that LAB is device-independent, unlike RGB and HSV, which therefore must first be transformed to an absolute colour space before conversion to LAB.
The following algorithm, as shown in Table 2, adapted from an existing MATLAB® implementation, describes how an image's local colour information is represented, and is shown in FIG. 13.
Table 2: Image's Local Colour Information Algorithm
Local Colour Feature Extractor
1) Convert RGB image to LAB colour space using library.
2) Split image into x, y independent cells of size 16x16.
3) Construct a colour feature vector (L, a, b, X, Y) for each cell where (L, a, b) is the average LAB value over the cell and (X, Y) are the cell's centre-point coordinates, normalised to a range [-0.5, 0.5], allowing feature vectors to be compared to images of varying dimensions.
4) Return the set of colour feature vectors.
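For illustration only, a minimal sketch of the four steps above. The original was adapted from a MATLAB® implementation; here OpenCV and NumPy are used as stand-ins (an assumption), and OpenCV's BGR channel order is assumed for the input image.

```python
import numpy as np
import cv2  # OpenCV used here as an illustrative stand-in for the MATLAB implementation

def local_colour_features(bgr_image, cell=16):
    """Return one (L, a, b, X, Y) vector per 16x16 cell: the cell's average LAB colour
    plus its centre coordinates normalised to [-0.5, 0.5] (independent of image size)."""
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB).astype(np.float32)
    h, w = lab.shape[:2]
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            block = lab[y:y + cell, x:x + cell]
            L, a, b = block.reshape(-1, 3).mean(axis=0)   # average LAB value over the cell
            X = (x + cell / 2) / w - 0.5                  # normalised cell centre
            Y = (y + cell / 2) / h - 0.5
            feats.append([L, a, b, X, Y])
    return np.array(feats, dtype=np.float32)

# Example on a dummy 64x64 image -> 16 five-dimensional colour feature vectors
print(local_colour_features(np.zeros((64, 64, 3), dtype=np.uint8)).shape)
```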
This feature extraction technique was initially evaluated using a custom-built application that trains an SVM using images from three categories: boats, flowers and horses. As each category expresses a distinctive colour (boat images are predominantly blue, flowers yellow and horses brown), it is easier to judge the method's effectiveness over the original dataset. Extracted features from the three classes were passed to the SVM trainer after being encoded as a bag of visual words.
The test showed a relatively high classification accuracy and the technique has therefore been used to represent colour information in the multi-descriptor trained classifier.
Texture Descriptor: Texture Feature Extractor (HOG)
The histogram of orientated gradients (HOG) is a well-known technique used to describe an image's local texture information. The method first splits the input image into cells of a predefined size and produces a feature vector that indicates the gradient information in each cell. The method has been used extensively for detecting pedestrians in CCTV images and for problems in optical character recognition, but has here been used to classify an object's material.
A suitable balance between information loss and feature length has been identified by optimising the descriptor's cell-size parameter. FIG. 14 shows HOG features for a sample glove image using three different cell sizes. It can be observed that cells of 64x64 size appear to lose information, whereas 16x16 cells produce features of extremely high dimensionality. To best satisfy these two concerns, the HOG descriptor appears to work best on cells of size 32x32, although 64x64 had to be used to prevent the occurrence of memory overload exceptions. The formal testing section evidences the impact this had on SVMs built using HOG features.
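For illustration only, a brief sketch of extracting HOG features at different cell sizes, using scikit-image's hog function as a stand-in (an assumption; the original used a different implementation). The random placeholder image and parameter values are not taken from the patent; it simply shows how larger cells yield shorter feature vectors.

```python
import numpy as np
from skimage.feature import hog  # scikit-image used as an illustrative stand-in

# Dummy greyscale image; in practice this would be the cropped glove region.
image = np.random.rand(128, 128)

# Larger cells give shorter feature vectors (less detail); smaller cells give
# very high-dimensional features, mirroring the 64x64 vs 16x16 trade-off above.
for cell in (16, 32, 64):
    features = hog(image, orientations=9, pixels_per_cell=(cell, cell), cells_per_block=(2, 2))
    print(cell, features.shape)
```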
An application has been written to evaluate the effectiveness of HOG features for texture classification by training an SVM on three material types (bark, wood and brick) taken from a textured surface dataset shown in FIG. 15. Extracted HOG features were encoded with a bag of visual words before being passed to the SVM training algorithm.
The high accuracy results obtained, shown in the following table, demonstrate the effectiveness of using HOG information to classify different material types. HOG features have therefore been used to represent each class's texture information in the MFBS combined classifier scheme.
Table 3: Image Classification Accuracy
Bark Wood Brick
Bark 0.75 0.16 0.09
Wood 0.07 0.91 0.02
Brick 0.02 0.04 0.95
Accuracy 0.87
Bag of Visual Words (BoW)
By design of the SURF and local colour feature extraction algorithms, it is unlikely that the result of processing two separate images - even those taken from the same class - would yield the same number of feature vectors. To prevent this disparity resulting in some images being over-represented during the training process, it was necessary to standardise this information over each dataset image. This could have been trivially achieved by retaining only the 'strongest' k points for each image, where k is given as the fewest number of key points returned over the entire training set. However, owing to its proficiency for representing relationships between pre-categorised images, a bag of visual words approach has been used to standardise the information. Bag of Visual Words is an approach used to represent individual images in terms of the entire input set. The process begins by constructing a vocabulary of k-many features, which are chosen to be the most significant across all input images.
Although the features are generally more abstract, the following example may aid with explanation. Consider the process of forming a vocabulary over an input set that contains multiple images of humans wearing goggles and multiple images of humans not wearing goggles. By analysing the SURF points over all of these images - irrespective of clothed or unclothed class - significant features may be returned that, for example, represent a logo marking, human nose or plastic frame.
By constructing this vocabulary of k-many significant features, feature vectors are then constructed by encoding each training image as a histogram of its significant-feature occurrences. In other words, each training image is assigned a k-dimensional vector where the i-th element describes the number of occurrences of the i-th significant feature in the image.
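For illustration only, a minimal sketch of vocabulary construction and histogram encoding, using k-means clustering from scikit-learn as a stand-in for the bag-of-visual-words machinery (an assumption); the vocabulary size, descriptor dimensionality and random placeholder data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=200):
    """Cluster the pooled descriptors from every training image into k 'visual words'
    (cluster centres); k is an illustrative choice, not the patent's value."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

def encode_image(vocabulary, descriptors):
    """Encode one image as a k-dimensional histogram: element i counts how many of the
    image's descriptors fall closest to visual word i."""
    words = vocabulary.predict(descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)  # normalise so images with different point counts are comparable

# Example with random 64-dimensional descriptors (the length of a SURF descriptor)
pooled = np.random.rand(1000, 64)
vocab = build_vocabulary(pooled, k=50)
print(encode_image(vocab, np.random.rand(120, 64)).shape)  # -> (50,)
```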
With this technique, it was hoped that feature vectors across the clothed and unclothed classes differ in some recognisable way. To refer back to the example, it should be that the class of images that show humans wearing goggles contain more logo and frame features than the class of images that do not contain goggles. Since both classes depict the face region, they are equally likely to represent the nose feature. By virtue of this technique, it is considerably more likely to obtain highly informative feature vectors.
Support Vector Machine (SVM)
Once the feature vectors for each class have been calculated, they are then provided to a supervised classification model, which uses each input set's known class labels to develop a prediction function that is capable of categorising similar but previously-unseen images. Owing to its frequent and successful use in image classification, this project will use a support vector machine as the prediction model.
A support vector machine is a commonly-used supervised model that calculates a classification boundary that best partitions a training set's feature vectors in a high-dimensional feature space. For linear SVMs that rely on linear kernels, this boundary is a straight line, so only its gradient and intercept need be determined. However, it is possible to define more complex SVMs that allow their classification boundaries to have extra degrees of freedom by increasing the feature space's dimensionality, using a technique known as the 'kernel trick'.
An SVM must be trained on a labelled set of vector features, where each feature's label indicates its correct class (in this case, either clothed or unclothed). The SVM's internal 'compute' function then executes an algorithm that fits the best classification boundary line according to the mapped data points. To classify an unseen feature vector, the SVM first represents the vector in feature space coordinates and returns the class label that corresponds to the side of the classification boundary it inhabits. As in the previous proposal, confusion matrices can be constructed by running the trained SVM on the testing set partition.
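For illustration only, a sketch of training a linear SVM on labelled feature vectors and producing a confusion matrix from the testing set partition, using scikit-learn as a stand-in (the described implementation was in MATLAB®); the data below is random placeholder data, not from the patent.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix

# Toy stand-in for encoded multi-feature vectors and their "clothed"/"unclothed" labels.
X_train = np.random.rand(60, 100)
y_train = np.array(["clothed"] * 30 + ["unclothed"] * 30)
X_test = np.random.rand(20, 100)
y_test = np.array(["clothed"] * 10 + ["unclothed"] * 10)

svm = SVC(kernel="linear")          # linear kernel, as discussed above
svm.fit(X_train, y_train)           # fit the classification boundary to labelled vectors
predictions = svm.predict(X_test)   # classify previously-unseen feature vectors
print(confusion_matrix(y_test, predictions, labels=["clothed", "unclothed"]))
```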
Constructing the Multi-Feature Model
Research was conducted to identify a suitable method for combining extracted SURF, Colour and HOG features in a single, improved classifier. The three feature extractors produce the following sets:
• s SURF feature vectors of length 64
• c colour feature vectors of length 5
• t = 1 HOG feature vector of length 93960
Even by employing a technique to ensure each extraction method produces an equal number of feature vectors, an image's three descriptor types cannot simply be concatenated due to the varying dimensionalities. Moreover, as each feature vector type encodes the interest point's location, these would be in direct conflict if combined into a single vector.
This issue has been resolved by employing a similar bag of visual words scheme, although this time, an individual bag is required for each feature type. Each training set image is encoded into three normalised vectors of identical length - a SURF feature vector, a colour feature vector and a texture feature vector. As each image's three feature vectors are now of standard length and only contain general (rather than local) feature information, they can be concatenated to form a multi-feature vector. The matrix formed by appending each multi-feature vector is then row-normalised to ensure that each constituent feature is equally represented (i.e. carries an equal weight) before it is passed as training data to construct a multi-feature SVM classifier.
Algorithm
To generate a multi-feature bag of visual words model to produce an SVM classifier, the following algorithm is employed, as shown in FIG. 16.
Multi-Feature Bag of Words with SVM Scheme
1) Partition into training and testing sets.
2) Extract SURF, colour and texture features using the described algorithms.
3) Construct three independent bags of visual words - SURF, colour and texture bag.
4) Encode training set images with each bag, generating a SURF feature vector, colour feature vector and texture feature vector of a set length for each.
5) Row-normalise these vectors with respect to each other, to ensure each carries equal weight.
6) Concatenate these normalised vectors to form a single multi-feature vector for each training image.
7) Train a SVM classifier using these labelled multi-feature vectors, where the labels indicate the correct class labels.
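For illustration only, a compact sketch of steps 3) to 7) above, assuming per-image SURF, local-colour and HOG descriptor arrays have already been extracted (e.g. with the extractors sketched earlier). scikit-learn is used as a stand-in for the MATLAB® implementation referred to in the description, and the vocabulary size k and function names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

def bag_encode(vocab, descriptors):
    """Histogram of visual-word occurrences for one image (see the BoW sketch above)."""
    hist = np.bincount(vocab.predict(descriptors), minlength=vocab.n_clusters)
    return hist.astype(np.float32)

def train_mfbs(shape_feats, colour_feats, texture_feats, labels, k=100):
    """shape_feats / colour_feats / texture_feats: one descriptor array per training image
    (e.g. SURF, local-colour and HOG descriptors). Builds three independent bags,
    encodes each image, normalises the three parts so each feature type carries equal
    weight, concatenates them into one multi-feature vector, and trains the SVM."""
    vocabs = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(f))
              for f in (shape_feats, colour_feats, texture_feats)]
    multi_vectors = []
    for s, c, t in zip(shape_feats, colour_feats, texture_feats):
        parts = [bag_encode(v, d) for v, d in zip(vocabs, (s, c, t))]
        parts = [normalize(p.reshape(1, -1))[0] for p in parts]  # equal weight per feature type
        multi_vectors.append(np.concatenate(parts))              # single multi-feature vector
    svm = SVC(kernel="linear").fit(np.array(multi_vectors), labels)
    return vocabs, svm
```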
Formal Testing
To demonstrate and evaluate the effectiveness of this technique, an algorithm was implemented in MATLAB® and then compiled into a C program using the MATLAB® Compiler with results reported to disk. The application also serialises the constructed bag of words and feature vectors, allowing them to be loaded at a later time.
Preliminary testing and evaluation has shown the multi-descriptor trained classifier to provide a stronger classifier for both goggles and gloves than previous single-descriptor trained classifiers, demonstrating an improved ability to cater for items that have no distinguishable colour or cause little shape change when worn. In comparison to three single-descriptor trained classifiers (SURF-only, Colour-only, HOG-only), the multi-feature classification approach showed a significant accuracy improvement on three of four PPE items in tests.
Testing and Evaluation
The MFBS multi-descriptor trained classifier was tested against four single-descriptor trained classifiers as set out below in Table 4. All schemes were run on each PPE item set out in Table 5 below, trained using a random 30% training set partition and tested using the remaining 70% testing set partition.
Table 4
(Table 4 is provided as an image in the original publication.)
Data Capture
A group of sixteen volunteers was identified to take part in this project. Images were captured of the body regions in Table 5, in states indicated by their class name. Justification for the selection of these items and an example for each class can be seen in the table.
Table 5
(Table 5, setting out the body regions, class names, justification for each item and an example of each class, is reproduced as an image in the original publication.)
Comparison of Techniques
Figure 17 shows the average performance statistics obtained by training and testing each classifier on each PPE item. As indicated by the final series (labelled 'MFBS'), a remarkably high classification accuracy was obtained when the multi-descriptor trained classifier was employed. The scheme yielded a 91.8% accuracy and a 90.3% precision rating over all PPE items, and a 5.6% false negative rate (occurring when gowned images are misclassified). The 10.7% false positive rate is the lowest among the tested schemes, and this is expected to be reduced still further by classifying multiple live frames before reporting the overall result. The more frames the application considers, the higher the accuracy reading and, critically, the lower the false positive rate. The Local Colour Feature Extractor (LCFE) used for training of the classifiers is shown to be superior to the HSV scheme - i.e. the scheme that uses only colour information.
While the simultaneous incorporation of a texture descriptor, alongside the colour descriptor and shape descriptor, further improves accuracy, it is believed that the most significant benefit is provided by the simultaneous incorporation of the shape descriptor and the colour descriptor into a single classifier. Nevertheless, the technique described above can be used to incorporate a number of descriptors into a classifier simultaneously, e.g. a texture descriptor, a shape descriptor and a colour descriptor, or still further descriptors, e.g. a texture descriptor, a shape descriptor, a colour descriptor and a reflectivity descriptor. This may be especially valuable where the PPE comprises protective eyewear, such as goggles or a visor, which can have little to no colouring, so that a further descriptor for reflectivity may be particularly useful for good prediction accuracy.
Other types of descriptor for feature extraction
Shape
While the classifiers described herein are trained using shape features extracted via a SURF shape descriptor, other shape descriptors could be used, as alternatives or in addition to the SURF shape descriptor, for feature extraction. The obvious alternative to a SURF shape descriptor is SIFT, which extracts 'interest points' corresponding to edges and corners in an image, and which may enable training of a more accurate classifier, although this requires increased processing time when compared to SURF. Other descriptors which could be used include DAISY, which uses dense matching; GIST, which yields a global representation; and BRIEF, which uses binary strings as a feature point descriptor and is claimed to outperform SURF and SIFT in certain instances.
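As a brief illustration (not part of the disclosed implementation), swapping the shape descriptor is largely a matter of changing the extractor; the sketch below assumes OpenCV, in which SIFT is available in the main module while SURF requires the contrib build.

```python
# Illustrative only: SIFT substituted for SURF as the shape descriptor (OpenCV >= 4.4).
import cv2

def sift_shape_descriptors(img_gray):
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img_gray, None)
    # 128-dimensional descriptors at detected interest points (edges/corners);
    # typically slower to compute than SURF but often more discriminative.
    return descriptors
```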
Colour
Similarly, while the classifiers described herein are trained using colour features extracted via the Local Colour Feature Extractor, other colour descriptors can be used as alternatives to, or in addition to, this colour descriptor.
To be specific, a valid colour descriptor is formed by selecting a colour space and using either a global or a local measure. The RGB colour space represents a particular colour by its red, green and blue components. Another alternative is HSV. While HSV is believed to be less accurate than the LAB space, which is highly sensitive to minor colour differences, HSV benefits from a simple conversion formula from the RGB space and so requires fewer system resources.
Nevertheless, after experimenting with both the HSV colour descriptor and the local colour feature extractor, the block-based local colour feature extractor showed better accuracy, particularly for identifying PPE. This is believed to be because the ability to locate colour patches, i.e. to encode where in the image they occur, gives an advantage when classifying more complicated items, such as goggles and beards.
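For contrast, a global HSV descriptor of the kind referred to above could be sketched as follows; the bin counts and the use of OpenCV are assumptions for illustration. Because the histogram is computed over the whole image, it discards where a colour occurs, which is precisely the information the block-based extractor retains.

```python
# Illustrative global HSV colour descriptor (the "HSV scheme" referred to above).
import cv2

def global_hsv_histogram(img_bgr, bins=(8, 8, 8)):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()  # fixed-length 8*8*8 = 512 vector
```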
Another potential colour descriptor is a colour SIFT/SURF descriptor which runs the well-known key point detection algorithm over each channel of the colour space.
Texture
Finally, while the classifiers described herein are trained using texture features extracted via a HOG texture descriptor, these can be omitted, substituted with, or used in addition to, another type of texture descriptor, such as a block-based SURF algorithm, or a wavelet method in which vertical and horizontal wavelet coefficients are first extracted, allowing magnitudes and directions to be calculated for specific points.
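A wavelet-based texture descriptor along these lines might be sketched as below; the use of PyWavelets, the Haar wavelet and the histogram summarisation are illustrative assumptions only.

```python
# Illustrative wavelet texture descriptor of the kind mentioned above.
import numpy as np
import pywt

def wavelet_texture_descriptor(img_gray):
    # Single-level 2-D DWT: horizontal (cH) and vertical (cV) detail coefficients.
    _, (cH, cV, _) = pywt.dwt2(img_gray.astype(float), 'haar')
    magnitude = np.sqrt(cH ** 2 + cV ** 2)   # local edge strength
    direction = np.arctan2(cV, cH)           # local edge orientation
    # Summarise as coarse histograms so every image yields a fixed-length vector.
    m_hist, _ = np.histogram(magnitude, bins=16)
    d_hist, _ = np.histogram(direction, bins=16, range=(-np.pi, np.pi))
    return np.concatenate([m_hist, d_hist]).astype(float)
```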
Operation of the Clothing Compliance Detection Apparatus
In use, the recognition module is first targeted at the gowning procedure file (GPD) which defines a predetermined gowning process for the clean zone. This gowning process stores a list of the body regions which must be 'gowned' in protective clothing and, for each said body region, a clothing classifier associated with the protective clothing for that body region. Where a beard snood is required to be worn over any beard present, the list of body regions includes a beard region, divided into beard region components, with an associated beard classifier for each beard region component.
In the current embodiment, the body regions are stored in a sequential order corresponding to the order in which the protective clothing must be donned.
When the recognition module is initialised, the gowning procedure file is first deserialised and interpreted to construct the process that will be followed.
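The on-disk format of the GPD file is not prescribed here. Purely as an illustration of the information it carries (an ordered list of body regions, each with an associated item and classifier), a hypothetical JSON-backed representation might look as follows; the layout and every field name are assumptions made for this sketch.

```python
# Hypothetical GPD representation for illustration only.
import json
from dataclasses import dataclass

@dataclass
class GowningStep:
    body_region: str       # e.g. "left_hand", "beard_region_component_1"
    item_name: str         # e.g. "nitrile glove", "beard snood"
    classifier_path: str   # serialised clothing (or beard) classifier for this region

def load_gpd(path):
    # Deserialise the ordered list of body regions and associated classifiers,
    # in the sequence in which the protective clothing must be donned.
    with open(path) as f:
        data = json.load(f)
    return [GowningStep(**step) for step in data["steps"]]
```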
The image processor constantly monitors the live feed received from the camera, displays the live 'virtual' reflection of this feed in real time via the monitor, and monitors the feed for the presence of an individual.
STEP 1:
Detection of individual
When an individual is present before the video camera, the image processor detects the individual, and indicates, via the user interface displayed with the virtual reflection on the monitor, that the individual should adopt the entrance gesture of FIG. 5.
STEP 2:
Display and detection of entrance gesture
The individual then adopts the entrance gesture.
When this entrance gesture is adopted by the individual, and recognized by the application via the gesture detection algorithm described above, the UI is updated to indicate that the individual should hold the entrance gesture for a predetermined time. In the present example, a white circle is superimposed over the virtual reflection to indicate to the user that the entrance gesture should be maintained for a period of three seconds in order to move to the next event.
After the entrance gesture has been held for the three seconds, the image processor works through the gowning procedure in a stepwise fashion. In the event that the entrance gesture is not detected for this predetermined time, the process reverts to step 1.
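A minimal sketch of this hold-and-revert logic is given below; the detect_entrance_gesture and get_frame helpers are assumed placeholders for the gesture detection algorithm and camera feed described above, and the overall timeout is an assumption.

```python
# Minimal sketch of the entrance-gesture hold timer.
import time

HOLD_SECONDS = 3.0

def wait_for_entrance_gesture(detect_entrance_gesture, get_frame, timeout=30.0):
    start, held_since = time.monotonic(), None
    while time.monotonic() - start < timeout:
        if detect_entrance_gesture(get_frame()):
            held_since = held_since or time.monotonic()
            if time.monotonic() - held_since >= HOLD_SECONDS:
                return True        # gesture held for the predetermined time: advance
        else:
            held_since = None      # gesture lost: restart the hold timer
    return False                   # not held in time: revert to step 1
```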
STEP 3:
Beard determination
If the GPD file contains a beard snood item, the image processor then determines whether or not sufficient facial hair is present to make this a required item for the user. Optionally, the UI may be updated to request that the individual move closer to the camera for better resolution imaging. The image processor receives plural video frames of the individual and pre-processes them to extract, from each frame of live video frame data, cropped body regions which correspond with the beard region components of the individual's face. These beard region components are constructed by combining HighDetailFacePoints as set forth previously. Each beard region component is classified with the associated beard classifier for that beard region component. If any one or more beard region components is predicted to be bearded, the UI is updated to indicate that a beard snood should be worn. If none of the beard region components is predicted to be bearded, then there is no requirement for a beard snood.
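A sketch of this decision rule follows; crop_beard_components and the per-component classifier objects are assumed placeholders for the pre-processing and beard classifiers described above, not disclosed interfaces.

```python
# Sketch of the beard-determination rule: a snood is required if any beard
# region component of any captured frame is predicted to be bearded.
def beard_snood_required(frames, beard_classifiers, crop_beard_components):
    for frame in frames:
        components = crop_beard_components(frame)   # e.g. {"chin": image, ...}
        for name, image in components.items():
            if beard_classifiers[name].predict(image) == "bearded":
                return True      # any bearded component: a beard snood must be worn
    return False                 # no component bearded in any frame: no snood needed
```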
STEP 4:
Display and detection of presentation gesture, and confirmation individual is ungowned
First, the UI is updated to direct the individual to ensure they are ungowned and to adopt the presentation gesture of FIG. 6, also shown in FIG. 18 and FIG. 19. The individual then adopts the presentation gesture.
When this presentation gesture is recognized by the application, via the gesture detection algorithm described above, the image processor pre-processes a predetermined number of frames of live video frame data to extract cropped body regions corresponding to each of the body regions listed in the targeted GPD file. Where a beard snood is required to be worn over facial hair of a predetermined density, this clothing classifier list includes a beard snood classifier which predicts for the entire beard region shown in FIG. 7.
The recognition module then classifies each of these cropped body regions with the associated clothing classifier for each body region. In certain embodiments, the result is calculated by checking whether the number of matching classifications made over 30 frames exceeds a set threshold, preferably 70% of the frames being classified the same (e.g. at least 21 frames classified as "clothed" or "unclothed").
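A sketch of this frame-voting rule, using the 30-frame and 70% figures given above, might read as follows.

```python
# Sketch of the per-region frame-voting rule.
from collections import Counter

def vote_over_frames(per_frame_labels, threshold=0.7):
    # per_frame_labels: e.g. 30 strings, each "clothed" or "unclothed".
    label, count = Counter(per_frame_labels).most_common(1)[0]
    if count / len(per_frame_labels) >= threshold:
        return label     # e.g. at least 21 of 30 frames agree: report that class
    return None          # no sufficiently consistent result yet; keep classifying
```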
Confirmation that the individual is fully ungowned, or not
In the event that any one or more body regions is predicted to be clothed, the UI is updated to instruct the individual to remove all protective equipment and begin again, by adopting the presentation gesture, at which point the image processor returns to the beginning of this step 4.
In the event that all body regions are predicted to be unclothed, the image processor moves on to Step 5.
STEP 5:
Start gowning procedure
The UI is then updated to instruct the individual to fully gown up, in accordance with the predetermined gowning procedure and, once gowned, to again adopt the presentation gesture of FIG. 6. Preferably, the UI is updated to superimpose coloured boxes over those body regions of the virtual reflection which require gowning, e.g. a red box is superimposed over the hands where latex gloves are required to be worn and, similarly, a red box is superimposed over the beard region of the individual's face where a beard snood is required. These boxes correspond with the body component regions (cropped body regions) calculated as described previously and as shown in FIGs. 7a and 7b. This is shown in FIG. 18 and FIG. 19. The individual dons the correct protective clothing stored in the changing room, in accordance with the predetermined gowning procedure, and once fully gowned, i.e. wearing all of the specified protective clothing, adopts the presentation gesture, as shown in FIG. 18.
STEP 6:
End gowning procedure
When this presentation gesture is recognized by the application, via the gesture detection algorithm described above, the image processor pre-processes a predetermined number of frames of live video frame data to extract cropped body regions corresponding to each of the body regions listed in the GPD file and used previously in step 5.
STEP 7:
Confirmation that the individual is fully gowned, or not fully gowned
In the event that any one or more body regions is predicted to be unclothed, the UI is updated to instruct the individual to complete the gowning procedure and to again adopt the presentation gesture to indicate they are ready. Preferably, the UI is updated to superimpose red boxes over those predetermined body regions which are not correctly gowned and, optionally, to also superimpose green boxes over those predetermined body regions which are correctly gowned. The image processor then starts again at step 7.
In the event that all body regions are predicted to be correctly clothed, the image processor moves on to step 8.
STEP 8:
Validation of gowning process.
The UI is updated to inform the individual that they are correctly gowned, preferably by the superimposition of green coloured boxes over all of the body regions which require gowning. Again, these boxes correspond with the body component regions (i.e. the cropped body regions) calculated as described previously and as shown in FIGs. 7a and 7b. An example of such an updated UI is shown in FIG. 19. Optionally, the application stores a photograph of the validated individual to serve as evidence of their conformance to the procedure. The computer then instructs the computer controlled lock 10 to unlock the door, permitting the individual entry into the controlled zone. Preferably, the computer controlled lock is configured to provide feedback to the computer indicating that the individual has passed through the door so that, upon receiving this feedback, the computer restarts the process set forth above. Alternatively, the image processor can be configured to restart the gowning procedure upon detection of the presentation gesture at any time.
Optional steps
Where other steps are required before entry into the controlled zone, such as a hand washing step, then specially adapted equipment, which can monitor the individual's compliance with the further step, can be included in the changing room. One example is a hand washing station which monitors an RFID tag, worn by the individual, to track the proximity of the individual while a predetermined hand washing procedure takes place, and so infer that a hand washing step has taken place, before providing data to the computer confirming that the hand washing requirement has been complied with. In this case, the operation of the clothing compliance detection apparatus comprises a further step, 7a, which requires that the computer receive confirmation that the individual has washed their hands at the hand washing station before moving to the validation step 8.
Alternatives
In an alternative operation of the clothing compliance detection apparatus, the computer works through the gowning process in an item-by-item way, which is to say that steps 6 and 7 are repeated for each item of clothing. For example, where a beard snood, two gloves and overshoes were required to be worn for entry into the controlled zone, and they were to be donned in that sequence, the clothing compliance detection apparatus would:
First, work through steps 1 to 4 to confirm that the individual was ungowned. Second, work through steps 3 and 6 as though the beard snood were the only item on the gowning procedure file (assuming that step 3 mandated the use of a beard snood).
Third, upon confirmation of the correct wearing of the beard snood, work through steps 5 and 6 as though the gloves were the only items on the gowning procedure file; alternatively, this could be done separately for each glove. Fourth, upon confirmation that the gloves are correctly worn, work through steps 5 and 6 as though the overshoes were the only items on the gowning procedure file. Alternatively, this could be done separately for each overshoe.
Finally, upon confirmation that the overshoes are correctly worn, moving to step 8 for a final validation that all of the clothing (the snood, the gloves and the overshoes) is correctly worn by the individual before moving to step 9.
Training Module
Optionally, a training module is provided for the creation of a multi-descriptor trained classifier, and also for the creation of the gowning procedure file (GPD).
The training module allows a user to create new multi-descriptor trained classifiers and new gowning procedures via two schemes, a Classifier Training Scheme and a Gowning Procedure (GPD) Training Scheme. The Classifier Training Scheme allows a user to create new classifiers for PPE items, and the GPD Training Scheme allows a user to create new gowning procedures by specifying a new ordering on pre-trained items.
Classifier Training Scheme
As shown in FIG. 20, the Classifier Training Scheme is designed to run on two sets of body region images, captured with a standard RGB camera. The first set is created by the user and contains images that depict the relevant body region wearing a new PPE item, referred to as gowned images. The second set should be pre-stored and depicts the relevant body region not wearing the item, referred to as ungowned images. The user is then asked to provide a unique name for the new item and to select what kind of item is being trained from a predefined clothing list. The options given in this list are used by the application to determine which body regions the item covers when worn, i.e. the items are pre-associated with a predetermined list of body regions.
The Training Scheme is then run over these gowned and ungowned images to construct a multi-descriptor trained item classifier that accurately determines the presence of the PPE item when shown a previously unseen image of the relevant body region. The algorithm can also be used to generate a beard classifier to form the basis of the beard detection routine.
The multi-descriptor trained classifier and its user-supplied metadata are then combined and serialised to disk in a suitable format for interpretation by the GPD training scheme, which is then used to incorporate the new item and its associated classifier into a gowning procedure.
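Purely as an illustration of this serialisation step, and assuming a Python environment with joblib (the disclosed implementation serialises from MATLAB®), the classifier and its metadata might be written out as follows; the file layout is an assumption made for this sketch only.

```python
# Hypothetical serialisation of a trained item classifier plus its metadata.
import json
from joblib import dump

def save_trained_item(classifier, item_name, covered_body_regions, out_stem):
    dump(classifier, out_stem + ".clf")   # the multi-descriptor trained classifier
    with open(out_stem + ".json", "w") as f:
        json.dump({"item_name": item_name,
                   "body_regions": covered_body_regions}, f)  # metadata for the GPD scheme
```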
Gowning Procedure Training Scheme
FIG. 21 shows, schematically, the GPD training scheme. Owing to the fact that some PPE items can be required on multiple body regions - for example, nitrile gloves are often required on both the left and right hands - the gowning procedure specifies an order on body regions, where each is associated with the item classifier that the clothing compliance detection apparatus should use. The GPD training scheme allows the user to construct a list of PPE items (possibly containing duplicates), although each should be targeted to a unique compatible body region, where each multi-descriptor trained classifier is associated with a predetermined body region.
The set of ordered multi-descriptor trained classifiers and relevant metadata, including a unique name for the procedure, is then combined and serialised to disk in a suitable format for interpretation by the recognition module.
Other alternative embodiments
The present embodiment of the invention uses a Microsoft® Kinect® to provide the video camera and also some aspects of the pre-processing of the live video frame data. It will be understood, however, that other aspects of the Microsoft® Kinect® could be used, including the depth measurement and the infrared camera, which operates in the near infrared and which could augment the classification abilities of the current invention, particularly in regard to clothing such as glasses, which are transparent to visible light but opaque to infra-red light. Other, alternative, multi-sensor video capture devices could be used, such as the Intel® RealSense.
The present embodiment is run on a local PC; however, much of the image processing could be carried out remotely, e.g. in the cloud, which would reduce the local processing requirement as well as simplify updates to the process used. The present embodiment shows a single 'virtual' mirror which is used for the entire gowning procedure. It may be preferable, however, to divide the gowning procedure into separate parts which must be carried out in separate parts of the changing room, separated, for example, by a step-over.
In this case, it is desirable to provide at least two cameras and displays, providing virtual reflections, so that one is present in each part of the changing room, while using a single computer to monitor a single gowning procedure carried out by an individual moving between the two displays. While this could be done by only allowing one individual into the changing room at a time, greater efficiency would be facilitated by tracking individuals as they move between the displays by using individual identification means such as image recognition, or RFID tagging, to ensure consistent tracking of the gowning procedure.
Optionally, the Kinect® sensor could be adapted to track the individual after verification of the gowning procedure and, simultaneously, to monitor the portal to ensure that only that tracked individual is present at the portal when it is unlocked by the application. This ensures that the lockable portal does not give access to anyone other than the validated user when in the 'open' state and so prevents 'tailgating'.
Finally, while the present embodiment requires the user to be proactive in establishing that they have conformed to the gowning procedure, it would also be possible to apply the described classification algorithms to input frames coming from CCTV cameras positioned in a high-risk facility. In this approach, users could be continually monitored for PPE adherence.

Claims

1. A clothing compliance detection apparatus comprising;
an image processor configured to determine, from at least one video image of an individual, whether or not a predetermined body region of the individual is clothed by an associated item of clothing, and to provide a recognition output indicative of the determination,
a video camera, for providing at least one video image of an individual to the image processor,
an output module, configured to receive the recognition output and to generate a user perceptible feedback based upon the recognition output,
wherein the image processor is configured to;
extract from the at least one video image a cropped body region which corresponds with the body region,
classify the cropped body region using a clothing classifier for the associated item of clothing, which clothing classifier can predict at least two output classes, including a clothed class and an unclothed class,
characterized in that the image processor is further configured to determine whether or not a body region is bearded, and to provide an output indicative of that determination, by,
extracting from at least one frame of live video frame data a cropped body region which corresponds with the body region,
classifying the cropped body region using a beard classifier associated with the body region, which beard classifier can predict at least two output classes, including an unbearded class and a bearded class.
2. The clothing compliance detection apparatus of claim 1, wherein the clothing classifier comprises a multi-descriptor trained image classifier.
3. The clothing compliance detection apparatus of claim 2 wherein the multi-descriptor trained image classifier is trained with a set of labelled vectors representing a training image set for the associated clothing, wherein each labelled vector incorporates features extracted from an image of the training image set via at least two different feature extractors.
4. The clothing compliance detection apparatus of any preceding claim, wherein the beard classifier comprises a multi-descriptor trained image classifier.
5. The clothing compliance detection apparatus of any preceding claim, wherein the image processor first classifies the body region using the beard classifier and, only if the body region is classified as bearded by the beard classifier, subsequently classifies the same body region with the associated clothing classifier.
6. The clothing compliance detection apparatus of any preceding claim further adapted to classify plural body regions of the individual, wherein each body region of the plural body regions has an associated clothing classifier for classification of that body region as clothed or unclothed, and wherein at least one body region has an associated beard classifier.
7. The clothing compliance detection apparatus of claim 6, wherein the at least one body region comprises plural beard region components, each of which has an associated beard classifier, and which together comprise the beard region of the individual.
8. The clothing compliance detection apparatus of any of claims 5 to 7, further adapted to generate user perceptible feedback depending on the value of all the classifications made such that the user is informed if;
a) none of the listed body regions are gowned by clothing associated with the clothing classifiers, or
b) at least one of the listed body regions are gowned by clothing associated with the clothing classifiers, or
c) all of the listed body regions are clothed by clothing associated with the clothing classifiers.
9. A clothing standard enforcement apparatus for controlling access to a controlled zone comprising the clothing compliance detection apparatus of any of claims 1 through 8, and further comprising;
a lockable portal to control entry to the controlled zone, said lockable portal adapted to be controlled by the clothing compliance detection apparatus such that the portal permits entry by the individual to the controlled zone only when the individual is classified as being clothed to the clothing standard by the clothing compliance detection apparatus,
which clothing standard comprises a requirement for at least one predetermined body region of an individual to be clothed by a predetermined item of clothing prior to entry into the controlled zone, and wherein the clothing compliance detection apparatus is configured to classify the at least one predetermined body region using a clothing classifier for the predetermined item of clothing.
10. A clothing standard enforcement apparatus as claimed in claim 9 further provided with an input recognition module adapted to receive at least one input from a user, and to carry out a predetermined sequence of steps in response to the at least one input from the user, including
1) upon receiving a first input from the user, classifying the at least one predetermined body region with the clothing classifier for the predetermined item of clothing,
2) providing a first user perceptible feedback indicating the classification made of the predetermined body region and, if the classification is ungowned, prompting the individual to don the predetermined item of clothing,
3) upon receiving a second input from the user, classifying the at least one predetermined body region using the clothing classifier for the predetermined item of clothing,
4) providing a second user perceptible feedback indicating the classification made of the predetermined body region and, if the classification confirms that the clothing is worn, causing the portal to be unlocked to permit entry of the individual to the controlled zone.
11. A clothing standard enforcement apparatus as claimed in claim 10 when dependent upon at least claim 6 and claim 9 wherein, upon receiving the first input from the user, each body region of the list of plural body regions of the individual is classified using the associated classifier, and the first user perceptible feedback is generated based upon the totality of the classifications made, and upon receipt of the second input from the user, each body region of the list of plural body regions of the individual is classified using the associated classifier, and the second user perceptible feedback is generated based upon the totality of the classifications made.
12. A clothing standard enforcement apparatus as claimed in claim 10 when dependent upon at least claim 6 and claim 9 wherein the predetermined list of plural body regions of the individual is a sequential list, and wherein, upon receiving the first input from the user, the first user perceptible feedback is generated based upon the totality of the classifications made, and upon receipt of the second input from the user, each body region is classified in a stepwise fashion, starting with the first body region and associated clothing classifier, and moving sequentially through the list upon confirmation that each associated item of clothing is worn, and providing the second user perceptible feedback upon completion of the list.
13. A clothing compliance detection apparatus, or clothing standard enforcement apparatus of any of claims 1 through 12 wherein the clothing associated with the individual comprises personal protective equipment.
14. A computer program comprising instructions to cause the clothing compliance detection apparatus of any preceding claim to execute the steps set out in claim 1, claim 5 or claim 8, or to cause the clothing standard enforcement apparatus of claim 9, claim 10, claim 11 or claim 12 to execute the steps set out in claim 10, claim 11, or claim 12.
15. A computer-readable medium having stored thereon the computer program of claim 14.
PCT/EP2017/082720 2016-12-16 2017-12-13 A clothing compliance detection apparatus, and associated clothing standard enforcement apparatus WO2018109048A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB1621522.0A GB201621522D0 (en) 2016-12-16 2016-12-16 Automatic systems for monitoring clothing compliance
GB1621522.0 2016-12-16

Publications (1)

Publication Number Publication Date
WO2018109048A1 true WO2018109048A1 (en) 2018-06-21

Family

ID=58284412

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2017/082720 WO2018109048A1 (en) 2016-12-16 2017-12-13 A clothing compliance detection apparatus, and associated clothing standard enforcement apparatus

Country Status (2)

Country Link
GB (1) GB201621522D0 (en)
WO (1) WO2018109048A1 (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110007950A1 (en) * 2009-07-11 2011-01-13 Richard Deutsch System and method for monitoring protective garments
US20110274314A1 (en) * 2010-05-05 2011-11-10 Nec Laboratories America, Inc. Real-time clothing recognition in surveillance videos
JP2012237569A (en) * 2011-05-10 2012-12-06 Renesas Electronics Corp Person management device and person management method
WO2013178819A1 (en) * 2012-06-01 2013-12-05 Dublin Institute Of Technology A method and apparatus for protective clothing compliance
US20150023552A1 (en) * 2013-07-18 2015-01-22 GumGum, Inc. Systems and methods for determining image safety
US20140307076A1 (en) * 2013-10-03 2014-10-16 Richard Deutsch Systems and methods for monitoring personal protection equipment and promoting worker safety
EP2878875A1 (en) * 2013-11-14 2015-06-03 Omron Corporation Monitoring device and monitoring method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOANG NGAN LE T ET AL: "Beard and mustache segmentation using sparse classifiers on self-quotient images", IMAGE PROCESSING (ICIP), 2012 19TH IEEE INTERNATIONAL CONFERENCE ON, IEEE, 30 September 2012 (2012-09-30), pages 165 - 168, XP032333139, ISBN: 978-1-4673-2534-9, DOI: 10.1109/ICIP.2012.6466821 *
WANG JIAN-GANG ET AL: "Real-time beard detection by combining image decolorization and texture detection with applications to facial gender recognition", 2013 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOMETRICS AND IDENTITY MANAGEMENT (CIBIM), IEEE, 16 April 2013 (2013-04-16), pages 58 - 65, XP032486896, ISSN: 2325-4300, [retrieved on 20130924], DOI: 10.1109/CIBIM.2013.6607915 *
YANG SHUO ET AL: "From Facial Parts Responses to Face Detection: A Deep Learning Approach", 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), IEEE, 7 December 2015 (2015-12-07), pages 3676 - 3684, XP032866724, DOI: 10.1109/ICCV.2015.419 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3094533A1 (en) * 2019-03-25 2020-10-02 Dooz Method for verifying clean room equipment
CN110096983A (en) * 2019-04-22 2019-08-06 苏州海赛人工智能有限公司 The safe dress ornament detection method of construction worker in a kind of image neural network based
IT202000012706A1 (en) * 2020-05-28 2021-11-28 Gianluca Moro SYSTEM AND PROCESS FOR DETERMINING WHETHER A MULTIPLE PEOPLE IN AN AREA TO BE MONITORED WEAR TARGET CLOTHING AND/OR HAVE CARRIED OUT TARGET ACTIVITIES TO FULFILL HYGIENE AND/OR SAFETY REQUIREMENTS

Also Published As

Publication number Publication date
GB201621522D0 (en) 2017-02-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17825173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17825173

Country of ref document: EP

Kind code of ref document: A1