GB2620560A - Obstruction detection in image processing - Google Patents

Obstruction detection in image processing

Info

Publication number
GB2620560A
GB2620560A GB2209909.7A GB202209909A
Authority
GB
United Kingdom
Prior art keywords
image
mask
operable
physical space
obstruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2209909.7A
Other versions
GB202209909D0 (en)
Inventor
Smith Andrew
Pallister Michael
Ruiz-Garcia Ariel
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seechange Technologies Ltd
Original Assignee
Seechange Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Seechange Technologies Ltd filed Critical Seechange Technologies Ltd
Priority to GB2209909.7A priority Critical patent/GB2620560A/en
Publication of GB202209909D0 publication Critical patent/GB202209909D0/en
Publication of GB2620560A publication Critical patent/GB2620560A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/174Segmentation; Edge detection involving the use of two or more images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method 100 of operating an image processing apparatus, an apparatus and a computer program product to monitor a space 110 for obstructions comprising capturing a first image of a physical space 104. Applying a trained model to the first image to delimit a portion of the physical space as a normal portion image 106. Deriving a mask M comprising a plurality of pixel values at defined positions of the normal portion image 108 and monitoring the physical space with the image processing apparatus 110. Capturing at least one further image of the physical space 112 and analysing image data of the further image in comparison with the mask. Responsive to non-detection in the further image of at least one of the plurality of pixel values at a defined position of the mask, emitting a control signal 124 to an obstruction awareness system. A machine learning technique may be used to detect characteristics in the further images.

Description

OBSTRUCTION DETECTION IN IMAGE PROCESSING
The present technology relates to accurately detecting and handling obstructions in physical spaces under monitoring by imaging systems.
Both common sense and legal requirements in many jurisdictions require that certain physical spaces remain free from obstructions, so that people can pass freely through those spaces. Typically, emergency evacuation routes, such as fire escape routes, must remain clear for the passage of people in case of emergency. Similarly, access to objects such as fire extinguishers, hoses, electrical supply control cabinets and the like, must remain unobstructed. It is also desirable (though not essential) for, for example, service routes between kitchens and tables in restaurants to remain clear. It is both costly in resources, such as time, to have people carry out sufficiently frequent visual checks on these spaces, and such checks, being tedious, are frequently of poor quality and subject to human error. It is therefore desirable to use some kind of automated monitoring to detect the presence of obstructions and to raise alarms or initiate actions to clear the obstructions.
In a first approach to the many difficulties encountered in accurately detecting and handling obstructions in physical spaces, the present technology provides a method of operating an image processing apparatus to monitor a space for obstructions comprising capturing a first image of a physical space; applying a trained model to the first image to delimit a normal portion of interest of the physical space as a normal portion image; deriving a mask comprising a plurality of pixel values at defined positions of the normal portion image; monitoring the physical space with the image processing apparatus; capturing at least one further image of the physical space; analysing image data of the further image in comparison with the mask; responsive to non-detection in the further image of at least one of the plurality of pixel values at a defined position of the mask, emitting a control signal to an obstruction awareness system. The obstruction awareness system may initiate action to clear the obstruction by, for example, a mechanical object removal apparatus.
The method may be computer-implemented, for example in the form of a computer program product to, when loaded into a computer system and executed thereon, cause an apparatus to perform the steps of the method. There is further provided an apparatus comprising a processor, a memory and logic operable to perform the steps of the method.
It will be clear to one of ordinary skill in the art that, where the term "pixel" is used, any subunit of digitally-processable image data may equally be used.
Implementations of the disclosed technology will now be described, by way of example only, with reference to the accompanying drawings, in which: Figure 1 shows a simplified view of a method of operating an image processing apparatus according to implementations of the presently described technology; and Figure 2 shows a simplified representation of one possible arrangement of electronic circuit components in an image processing apparatus operable according to an implementation of the presently described technology.
As described above, it is desirable to use some kind of automated monitoring to detect the presence of obstructions and to raise alarms or initiate actions to clear the obstructions. However, monitoring systems that could carry out such checks may be subject to rather frequent false positives in their detections - for example, a person moving into a space, or standing in the space, may trigger an unnecessary alarm and instigate time-wasting further investigation or attempts to clear an obstruction that does not exist in reality. Similarly, a false positive may be caused when a camera is knocked out of true alignment by a passer-by, changing the image and causing the system to falsely raise an obstruction alarm when there is no obstruction present or to waste the time and power of a mechanical obstruction removal apparatus by instructing it to operate when there is no requirement for it to do so.
It is thus desirable to have a system and method for monitoring a physical space using an image capture apparatus and an imaging system in such a way that a modelled region in a captured image can be identified and delineated as a normal image portion. The imaging system should then be able to perform comparisons between the modelled region as it should be and a captured image of a current state to detect any deviations. Deviations from the modelled region (as it should be) detected in the current state may be the result of a genuine obstruction, or they may arise because of the presence of a passing person or because of a disturbance of the image capture apparatus. It is desirable to be able to detect and discount these factors when deciding whether or not an alarm is to be emitted or clearance action by a mechanical apparatus is to be initiated.
The present technology thus provides apparatus, computer-implemented techniques and logic circuitry for correctly identifying obstructions while avoiding false positives that cause waste of time and resources.
According to implementations of the present technology, a trained model can be used to identify the normal portion of interest of an image and to produce a 'mask' of the area of the image which corresponds to the physical normal area of interest. The mask is typically a set of pixel-level data values representing the characteristics of the physical area of interest. If an item of a detectable size is placed so as to obstruct the portion of interest, the image contains a gap with respect to the mask values where the item is placed. By detecting the absence of at least one expected pixel value (as expected according to the mask, that is) at a location, the present technology detects the gap, and this allows the detection of entities that are obscuring the object of interest in the image.
Typically, the implementation of the present technology relies on a probabilistic technique for operating the model in deciding that there is an obstruction - that is, the system is intelligent enough to detect when a difference between the image as it should be and the fresh image is substantial enough to warrant action, thereby allowing for minor or transient disturbances in the image caused by, for example, transient misreading of data or minor noise interference in a communications channel.
The present technology is, as described above, subject to two possible sources of error. The first is that the image capture apparatus, such as a monitoring camera, may have been knocked or otherwise subjected to some movement, such that the current image of the object of interest is displaced with respect to the stored mask. To address this issue, the bounding polygon of the normal portion of interest in the image is processed as follows. The identifying characteristics of the pixel arrangement at each vertex of the image portion boundary are registered - these arrangements can be used as a unique identifier of the normal image of the object of interest when compared to other images. Using known image comparison and matching algorithms, each vertex of the image portion boundary can be identified in an image produced by a camera of a similar scene. This allows a map of points between the images to be created, after filtering and adjustments to remove bad matches. If, when comparing a fresh image of the area with the mask, a difference is detected in the relative alignment of the vertices, this indicates that the image capture apparatus may have been knocked out of position. In one implementation of the present technology, on thus detecting an apparent displacement of the fresh image compared with the mask, the system adjusts the image to compensate, and then derives a fresh mask before repeating the comparison. This allows for the detection of obstructions even when the image capture apparatus has been displaced.
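By way of illustration only, the vertex-matching and re-alignment just described might be sketched in Python as below. OpenCV, the ORB feature detector, the RANSAC homography estimate and all numeric parameters are assumptions made for the sketch; the present description does not prescribe any particular matching algorithm.

    import cv2
    import numpy as np

    def compensate_displacement(reference_image, fresh_image, min_matches=10):
        """Warp fresh_image back into the frame of reference_image if the camera
        appears to have been knocked out of position (sketch; assumes BGR images)."""
        orb = cv2.ORB_create()
        k1, d1 = orb.detectAndCompute(cv2.cvtColor(reference_image, cv2.COLOR_BGR2GRAY), None)
        k2, d2 = orb.detectAndCompute(cv2.cvtColor(fresh_image, cv2.COLOR_BGR2GRAY), None)
        if d1 is None or d2 is None:
            return fresh_image
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:50]
        if len(matches) < min_matches:
            return fresh_image                       # too little evidence of displacement
        src = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        # RANSAC filters out bad matches, as described above, before estimating the map.
        homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        height, width = reference_image.shape[:2]
        return cv2.warpPerspective(fresh_image, homography, (width, height))

The warped result can then be re-compared with the stored mask, mirroring the "adjust, derive a fresh mask, repeat the comparison" behaviour described above.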
The second potential source of error is the false positive caused when a person strays into the monitored zone, such that the person appears to be an obstruction. Clearly, activating an obstruction removal system or raising an alarm whenever a person incidentally appears in the fresh image would be a nuisance, and thus it is desirable to modify the system slightly to accommodate this source of false positives. This can be achieved by implementing a further aspect of the machine-learning model that is capable of identifying a portion of an image as a person. Any of the known machine-learning techniques for detecting persons (such as those implemented in sensor systems for self-driving vehicles, for example) may be used. In one implementation, a semantic segmentation neural network may be used to detect that an entity that appears in the fresh image but not in the mask is most likely to be a person, and can therefore be disregarded for the purposes of obstruction detection.
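As a hedged sketch of such person filtering (one possible technique, not the one mandated here), a pretrained semantic segmentation network can be queried for a "person" class and its output intersected with the region that differs from the mask. The use of torchvision's DeepLabV3 model, the Pascal VOC person class index 15 and the 0.8 overlap ratio are all assumptions of the sketch.

    import torch
    from torchvision import transforms
    from torchvision.models.segmentation import deeplabv3_resnet50

    model = deeplabv3_resnet50(weights="DEFAULT").eval()   # pretrained, person-aware segmenter
    PERSON_CLASS = 15                                      # "person" in the Pascal VOC label set

    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    def difference_is_person(fresh_image_rgb, differing_region, overlap=0.8):
        """Return True if the pixels that differ from the mask mostly lie on a person.
        fresh_image_rgb: H x W x 3 array; differing_region: H x W boolean array."""
        with torch.no_grad():
            out = model(preprocess(fresh_image_rgb).unsqueeze(0))["out"][0]
        person_pixels = (out.argmax(0).numpy() == PERSON_CLASS)
        differing = differing_region.sum()
        return differing > 0 and (person_pixels & differing_region).sum() / differing >= overlap

If the difference is attributed to a person in this way, it can simply be disregarded and monitoring resumed, as described above.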
Turning now to Figure 1, there is shown a simplified view of a method of operating an image processing apparatus according to implementations of the presently described technology. The method may be implemented using an arrangement of electronic circuit components in an image processing apparatus such as that shown in Figure 2.
The method of Figure 1 starts at START 102, and at 104 a first image S1 is captured, the image being of a physical space that is to be monitored. The first image comprises image data for a plurality of pixels, and comprises one or more pixel values for each pixel of the first image. For example, in the case of a grey-scale image, the first image data may comprise an intensity value for each pixel of the first image. As another example, in the case of a colour image, the first image data may comprise a value for each of Red, Green, and Blue channels associated with each pixel of the first image. The first image data may also comprise information on the location of each pixel within the first image. In any case, the first image data represents the first image, for example in such a way that the first image captured by the image sensor may be reproduced on the basis of the first image data.
At step 106, an appropriately-trained model is applied to the image data of the first image S1. In one example, a trained semantic segmentation neural network is applied to the first image data to determine, for each pixel, a classification value associated with the pixel. In other words, the trained segmentation neural network is configured to take as input the first image data, and provide as output a classification value for each pixel of the first image S1.
The determined classification value for a given pixel indicates the extent to which the trained semantic segmentation neural network estimates, based on its training, that the given pixel illustrates a portion of the image portion of interest. For example, the classification value for a given pixel may be or otherwise indicate the probability as estimated by the trained semantic segmentation neural network, or the confidence with which the trained semantic segmentation neural network predicts, that the given pixel illustrates a portion of interest.
Detecting a portion of interest in an image of a space based on pixel classification values determined by a trained semantic segmentation neural network may allow for reliable identification of an object of interest. This is because semantic segmentation models differ from (object) detection models. Object detection models attempt to identify instances of a given class of object in an input image. Semantic segmentation models, by contrast, output a classification value or class for each pixel of an input image. Because, by their nature, floor surfaces or other objects of interest do not necessarily have a well-defined generic form or shape, it is difficult to train an object detection machine learning model to reliably identify instances of objects of interest in images. However, due to the per-pixel nature of the semantic segmentation output, pixels of the input image can be classified as depicting an object of interest or not, independent of the form of the object itself.
Any semantic segmentation neural network may, in principle, be used. In one example the semantic segmentation neural network may comprise an encoder and a decoder. The encoder uses operations, such as convolutions and/or pooling operations, to encode or downsample an input image into progressively smaller but denser feature representations. As a result of the encoding a compressed representation of the input image is produced. The decoder uses operations, such as transpose convolutions, to decode or upsample the compressed representation into progressively larger but more sparse feature representations. Specifically, the decoder decodes the compressed representation so that the final representation has the same size as the input image. The final representation may consist of classification values (e.g. probabilities or prediction confidences for a given classification) or classifications for each pixel of the input, for each class that the semantic segmentation neural network has been trained on - in the present instance, values of pixels that represent portions of objects of interest. The output of the semantic segmentation neural network may be a bitmap consisting of the classification values or classes for each pixel of the input image.
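Purely to make the encoder/decoder structure concrete, a minimal sketch in PyTorch is given below; the framework, the layer sizes and the single-class sigmoid output are assumptions for illustration and do not represent the particular network used.

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        """Minimal encoder-decoder producing one classification value per pixel."""
        def __init__(self):
            super().__init__()
            # Encoder: convolutions downsample into smaller but denser feature maps.
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            )
            # Decoder: transpose convolutions upsample back to the input resolution.
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2), nn.ReLU(),
                nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))    # logits with the same H x W as the input

    net = TinySegNet().eval()
    dummy_image = torch.rand(1, 3, 128, 128)                 # stand-in for image S1
    with torch.no_grad():
        bitmap = torch.sigmoid(net(dummy_image))[0, 0]       # per-pixel classification values in [0, 1]

The resulting bitmap has one value per input pixel, which is exactly the form of output described above.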
In some examples, the trained semantic segmentation neural network may be provided by training a semantic segmentation neural network based on a training data set. For example, the training data set may comprise a plurality of training images, each training image having been captured by an image sensor, each training image being of a space including an object of interest, each training image comprising pixels. In each training image, each pixel that illustrates an object of interest is annotated to indicate that an object of interest is illustrated by the pixel. The semantic segmentation neural network may be trained based on this training data, for example using the annotations as a supervisory signal for supervised learning. For example, the encoder and decoder may be iteratively adjusted so as to correctly classify pixels in each of the training images, as judged against the class annotations. In this way, step 106 is enabled to delimit the normal portion image F1, both as a representation of all the pixels as positioned over the entire object of interest and as a set of delimiting vertices that define the alignment of the object of interest with respect to the surrounding elements of the image S1 of the space. Thus, by use of a plurality of annotated examples, the model learns to recognise the characteristics of an object of interest, by analysing a plurality of factors such as shape, conformation, texture, position and sizing relative to other image elements, and the like.
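A supervised training loop over such annotated images could be sketched as follows; the inline stand-in model, the binary cross-entropy loss, the optimiser settings and the dummy data are illustrative assumptions rather than the training regime actually employed.

    import torch
    import torch.nn as nn

    # Stand-in for the segmentation network (same encoder-decoder shape as sketched above).
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
        nn.ConvTranspose2d(16, 1, 2, stride=2),
    )
    criterion = nn.BCEWithLogitsLoss()        # per-pixel annotations act as the supervisory signal
    optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Each training pair: a 3 x H x W image and a 1 x H x W annotation that is 1.0
    # wherever the pixel depicts the object of interest (dummy tensors stand in here).
    training_set = [(torch.rand(3, 128, 128), (torch.rand(1, 128, 128) > 0.5).float())
                    for _ in range(8)]

    for epoch in range(5):
        for image, annotation in training_set:
            optimiser.zero_grad()
            logits = model(image.unsqueeze(0))
            loss = criterion(logits, annotation.unsqueeze(0))
            loss.backward()                   # iteratively adjust encoder and decoder weights
            optimiser.step()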
In some examples, the trained semantic segmentation neural network may be obtained from a storage. For example, a semantic segmentation neural network may have been pre-trained, for example in a manner similar to as described above, and the trained semantic segmentation neural network stored in data storage. The trained semantic segmentation neural network may then be retrieved from the storage and applied to the first image as in step 106 of Figure 1.
At step 108, a mask M representing the unobstructed normal surface is derived from the data of image S1. In the semantic segmentation implementation described above, this mask M may comprise, or be derived from, the bitmap consisting of the classification values or classes for each pixel of the input image.
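One hedged way such a mask M might be derived from the per-pixel bitmap is sketched below; the 0.5 threshold and the dictionary layout are assumptions of the sketch rather than a prescribed format.

    import numpy as np

    def derive_mask(first_image, classification_bitmap, threshold=0.5):
        """Derive mask M: the pixel values of image S1 at the positions classified
        as belonging to the unobstructed normal portion (sketch only)."""
        region = classification_bitmap >= threshold    # H x W booleans: the normal portion
        positions = np.argwhere(region)                # defined (row, column) positions
        values = first_image[region]                   # expected pixel values at those positions
        return {"region": region, "positions": positions, "values": values}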
The process continues by monitoring 110 the physical space with the image processing apparatus. At step 112, a fresh image S2 of the space is captured. As will be clear to one of ordinary skill in the art, the monitoring and capturing steps shown will form part of an iterative process as long as the image processing apparatus is active.
Image S2 is then processed at 114 to distinguish the normal portion image F2, which is rendered into a form that is compatible with the form of the mask M derived from the original image S1, so that any differences may be determined by analysis at the pixel level. The rendering of the current portion image F2 may be implemented using a similar technique to the semantic segmentation technique that has been described above with reference to the processing of original normal portion image F1 to produce mask M.
At 116, it is determined whether there are differences between the pixel values of mask M and the pixel values of current portion image F2. If no differences are detected, this iteration of the process completes at END 132, after which further iterations may proceed until the image processing apparatus ceases to be active.
If, at 116, it is determined that differences have been detected between mask M and current portion image F2, the differences are analysed to assess the cause. In the case in which the pixel-level analysis detects a displacement of the pixels in proximity to the vertices in current portion image F2 relative to their positions in mask M, this is taken as an indication that the most likely cause is that the image capture apparatus has been accidentally disturbed - for example, a monitoring camera has been knocked out of its original alignment. In implementations, the image processing apparatus is able to analyse the differences in the images and to determine the type and metrics of the displacement - for example, a linear displacement of the camera of C centimetres, or an angular displacement of the camera of R radians - and to compensate for the difference by, for example, re-rendering current portion image F2 to reverse the displacement and thereby compensate for the disturbance at 120. In these circumstances, the method takes a logical path that returns from the analysis 118 via the compensation actions at 120 to the distinguishing of the current portion image F2, but now with appropriate compensatory actions having been taken to enable the comparison at 116 to correctly determine any differences at the pixel level between the now-corrected current portion image F2 and mask M.
At 122, it is further determined whether the difference detected between current portion image F2 and mask M is the result of the presence of a person in current portion image F2. As will be clear to one of ordinary skill in the art, a continuous monitoring of the space will provide the image processing apparatus with a plurality of frames, and thus it is clear that in any physical space under monitoring, some frames may contain images comprising persons who are transiently in the physical space. As would also be clear to one of skill, activating any kind of action to deal with an obstruction when the perceived obstruction actually consists of a passing person would be a clear example of a false positive giving rise to wasted activity. It is thus desirable to eliminate such false positives by detecting when a difference in the images is caused by a person. Detecting persons in images is known in the art, and there are several possible techniques available for doing so. Common to all the known techniques is a combination of object detection and image classification, so that objects that are found in the image data can be correctly and efficiently individuated (that is, identified as potentially "of interest" and separated from the background and other objects) and then classified according to the probabilities of their various characteristics being those that identify them as persons. In implementations, semantic segmentation techniques performed by neural networks may be used in combination with known techniques of, for example, human pose estimation and gait analysis over plural frames, to provide the person detection required by the present technology.
If, at 122, a person is detected with a high probability of accuracy, the method proceeds to end this iteration at END 132, after which further iterations may proceed until the image processing apparatus ceases to be active. If no person is detected at 122, the model concludes that, as a difference was detected at 116 that cannot be accounted for by a displacement at 118 or a person at 122, there is a high probability that the difference between F2 and mask M is caused by an obstruction, and at 124 a control signal is emitted.
Responsive to the control signal at 124, the method continues by determining at 126 whether or not a coupled or integrated mechanical obstruction clearance system is available, and if so, instructing, at 130, the obstruction clearance system to remove the obstruction. As will be clear to one of skill in the art, many facilities, such as industrial factories, warehouses and the like, are equipped with automated systems for the moving of goods and materials from place to place. Given the availability of such technologies, in an implementation, there may be provided an automated or robotic obstruction clearance system that may be activated by one or more control signals emitted by the image processing system according to the present technology. The obstruction clearance system may require instruction as to the location, size and other characteristics of the obstruction, or it may be provided with machine intelligence to enable it to assess and autonomously handle any obstructions to the presence of which it has been alerted by the instruction at 130.
If, at 126, it is determined that there is no coupled or integrated mechanical obstruction clearance system available, the method proceeds by emitting an outbound signal at 128. The outbound signal may be provided with details of the location, size and other characteristics of the obstruction, as well as of the instance of the image processing apparatus that is emitting the signal, so that appropriate action may be taken by some external entity. In either case - that is, whether a coupled or integrated mechanical obstruction clearance system is activated or an outbound signal is emitted - the current iteration of the method completes at END 132, after which further iterations may proceed until the image processing apparatus ceases to be active.
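An outbound signal carrying such details could, for instance, be serialised as in the sketch below; the field names and the JSON format are purely hypothetical, since the description does not prescribe any particular signal format.

    import json
    import uuid
    from datetime import datetime, timezone

    def build_outbound_signal(apparatus_id, obstruction_bbox, obstruction_pixels):
        """Illustrative payload only; fields are assumptions, not a defined protocol."""
        return json.dumps({
            "event": "obstruction_detected",
            "apparatus_id": apparatus_id,                      # which instance raised the signal
            "event_id": str(uuid.uuid4()),
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "location_bbox": obstruction_bbox,                 # [x, y, width, height] in image coordinates
            "size_pixels": obstruction_pixels,                 # apparent size of the obstruction
        })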
The method of Figure 1 thus provides a method of operating an image processing apparatus to monitor a physical space for obstructions by capturing a first image of a physical space, applying a trained model to the first image to delimit a portion of the physical space as a normal portion image of an object of interest, and deriving a mask comprising a plurality of pixel values at defined positions of the normal portion image. The mask is then retained and the physical space is monitored with the image processing apparatus to capture further images of the physical space. The image data of at least one of the further images is compared on a pixel-by-pixel basis with the mask - the mask thus allows for analysis on a pixel-by-pixel level to determine whether there are any "holes" in the further image. Responsive to non-detection in the further image of at least one of the plurality of pixel values that should, according to the mask, be at a defined position, the present technology emits a control signal to an obstruction awareness system. The existence of any "holes" in the masked area shows that something is interfering between the image capture apparatus and the normal image portion of the physical space, and that the "something" is probably an obstruction. In implementations, a threshold size of "something" may be set as a measure of whether the "something" should be counted as an obstruction. This threshold setting would eliminate minor glitches of one or a few pixels in the captured image data, and thereby avoid false positives where, for example, specks of warehouse dust on a camera lens might otherwise cause the system to trigger obstruction clearance actions unnecessarily.
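A hedged sketch of this pixel-by-pixel check, including a minimum "hole" size to discount one-or-few-pixel glitches, is given below; the per-channel tolerance and the pixel-count threshold are illustrative values only, and the mask layout follows the derive_mask sketch above.

    import numpy as np

    def detect_obstruction(further_image, mask, tolerance=12, min_hole_pixels=200):
        """mask holds 'region' (H x W booleans) and 'values' (expected pixel values there).
        Assumes colour images, so each position carries a small vector of channel values."""
        expected = mask["values"].astype(np.int16)
        observed = further_image[mask["region"]].astype(np.int16)
        # A position counts as a "hole" when the expected pixel value is not detected there,
        # i.e. the observed value deviates beyond the tolerance in some channel.
        missing = np.abs(observed - expected).max(axis=-1) > tolerance
        hole_pixels = int(missing.sum())
        # Only holes above the size threshold are treated as obstructions, so that dust
        # specks or transient glitches do not trigger the control signal unnecessarily.
        return hole_pixels >= min_hole_pixels, hole_pixels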
The mask described above also allows the determination of a plurality of vertices representing the boundaries of the image portion of interest, so that any further image that should show the same image portion of interest may be checked for any distortion caused by, for example, a shift in the position of the apparatus that captured both the original image and the further image. It achieves this by analysing image data of the further image in comparison with the mask to detect any displacement of the image processing apparatus relative to the normal portion of the physical space, and adjusting the current portion image to compensate for any such displacement.
The method of the present technology enables further analysing the image data of the further image in comparison with the mask to detect any data characteristic of a person in the further image and resuming the monitoring without further action if the difference is found to be caused by a person entering into the physical space that is being monitored, and thus being "caught on camera." This act of detecting data characteristic of a person may use a machine-learning technique to individuate characteristics of a human form in the further image.
Turning now to Figure 2, there is shown a simplified representation of one possible arrangement of electronic circuit components in an image processing apparatus operable according to an implementation of the presently described technology. The components take the form of an apparatus 200 broadly comprising an image capture device 202, electronic logic circuitry 204, and at least one of a clearance device 220 or an outbound signaller 222. As will be clear to one of ordinary skill in the art, these components may be implemented as separate devices connected over a network, or some of the devices may be combined within a larger system. For example, if an image capture device 202 is provided with memory and processor functions of sufficient size and sophistication, electronic logic circuitry 204 (and possibly also outbound signaller 222) may be incorporated within image capture device 202 to form a single image capture, processing (and alert raising) system. In another scenario, a possible mechanical obstruction clearance device 220 in the form of a robot may be provided with image capture device 202 and electronic logic circuitry 204, to form an autonomous system for detection and clearance of obstructions. Electronic logic circuitry 204 may comprise a processor having permanently fixed processor logic embedded within it, or it may comprise a general purpose processor provided with a set of program instructions operable to perform the required processing.
Image capture device 202 may comprise a conventional analogue camera, in which case, electronic logic circuitry 204 will require the addition of an analogue-to-digital converter arranged as a pre-processor for converting both the first image and the further image prior to any further analysis steps. Typically, however, image capture device 202 will be a digital camera arranged to provide digital image data for the first and further images in a form in which it can be immediately analysed by electronic logic circuitry 204.
Electronic logic circuitry 204 according to the present technology comprises a data store 206, operable to store at least the first and further images captured by image capture device 202. In practice, data store 206 will need to be capable of storing a plurality of image data elements derived from the frames captured by image capture device 202. The data captured by image capture device 202 may comprise a quasi-continuous video camera feed, or it may comprise discrete frame data of images captured at intervals - in such a case, as would be clear to one of ordinary skill in the art, the intervals may be set at some suitable time lapse, so as to capture adequate data for the purposes of analysis, while minimising resource consumption for storage and processing of the data.
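A minimal monitoring loop along these lines might look as follows; cv2.VideoCapture, the five-second interval and the process_further_image hook are all assumptions made for the sketch, and the hook stands in for whichever comparison pipeline is actually attached.

    import time
    import cv2

    CAPTURE_INTERVAL_SECONDS = 5.0     # illustrative trade-off between coverage and resource use

    def monitor(process_further_image, camera_index=0):
        """Capture frames at intervals and hand each to the (hypothetical) comparison pipeline."""
        camera = cv2.VideoCapture(camera_index)
        try:
            while True:
                frame_ok, frame = camera.read()
                if frame_ok:
                    process_further_image(frame)
                time.sleep(CAPTURE_INTERVAL_SECONDS)
        finally:
            camera.release()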
Electronic logic circuitry 204 further comprises a delimiter 208, a masker 210, and a normal image modeler 209. Normal image modeler 209 may comprise a model-based machine learning system (for example, a neural network) that may be trained to detect the characteristics of objects of interest in image data - analysing a plurality of factors such as shape, conformation, texture, position and sizing relative to other image elements, and the like. Normal image modeler 209 may comprise a semantic segmentation neural network, as has been described above. Normal image modeler 209 may be implemented entirely within electronic logic circuitry 204, or it may be initially implemented and trained in another processing environment and then imported into electronic logic circuitry 204 - for example, normal image modeler 209 may be produced and trained centrally, and then deployed to a plurality of distributed instances of apparatus 200. Normal image modeler 209 is thus operable to identify the portion of the image that represents an object of interest, and this data may then be acquired and used by delimiter 208 and masker 210. Delimiter 208 establishes the vertices of the polygon representing the object of interest, along with the relations among the vertices, such that any distortion or translation of any further image of the same object of interest may be detected, as will be described below. Masker 210 is operable to use the image data provided by normal image modeler 209 to create a mask, typically a pixel-level mask comprising pixel characteristic data for each pixel of the image data for the object of interest.
As described above, image capture device 202 may be operable to monitor a physical space and thus provide further images of the same object of interest that was earlier used to create the mask. To process these further images, each is submitted by electronic logic circuitry 204 to comparator 212, which takes as its input the mask and the further image data. Comparator 212 is operable to make a pixel-by-pixel comparison of the mask and the further image data to detect any differences. In some instances of comparator 212, a threshold level of difference may be set, so as to ignore very minor deviations, such as might be caused by transient signal errors, network noise, dust-specks on a lens, and the like. Comparator 212 may, in implementations, operate on a percentage of coverage basis, or it may operate on a probabilistic basis, judging that an image difference that shows certain pixel-level characteristics most probably represents a genuine obstruction. Two special cases of difference may be compensated for in implementations of the present technology, but in the simple case the comparator 212 is operable, on detecting a difference between the further image and the mask representing the object of interest as originally captured, to activate a signaller 218. Signaller 218 is operable to emit a control signal, which may comprise a basic alert to inform a further entity that an obstruction has been detected by apparatus 200, or may comprise a more complex instruction and data flow, possibly including details of the detected obstruction, such as its size and position.
Electronic logic circuitry 204 may be, in a first implementation, operatively coupled to a clearance device 220, which may be a mechanical obstruction clearance system as described above. Clearance device 220 may be operable to receive and interpret (by means of some processor arrangement) the control signal emitted by signaller 218, and to act on any instructions and data comprised in the signal. In an alternative, electronic logic circuitry 204 may be connected to an outbound signaller 222 that is operable, responsive to receiving a control signal from signaller 218, to onwardly transmit a signal (which may be a duplicate of the original, or may comprise an interpretation of the original signal) outside the apparatus 200 to some external entity.
Turning to the first of the two special cases mentioned above, comparator 212 having detected a difference between the further image and the mask representing the object of interest as originally captured, a coupled displacement detector 214 may analyse the difference, using the data concerning the vertices of the polygon representing the object of interest, and the relations among those vertices, that were captured by delimiter 208 as described above. Displacement detector 214 is thus operable to detect when a difference is most likely to have been caused by a displacement of the image capture device 202 and to communicate this conclusion to masker 210, which, in collaboration with delimiter 208, is operable to adjust the further image data to compensate for the displacement such that any distortion or translation of the further image is eliminated as a source of false positive obstruction detections, and so that the adjusted image data may be used to construct a replacement image of the object of interest that is correctly aligned and proportioned for comparison with the mask.
Turning now to the second of the two special cases mentioned above, comparator 212 is operatively coupled to person detector 216, which is in turn operable to analyse the further image to determine whether or not a difference between the further image and the mask is caused by the presence of a person in the further image. Person detector 216 is operable using, for example, semantic segmentation techniques performed by neural networks in combination with known techniques of, for example, human pose estimation and gait analysis over plural frames, to provide the person detection required by the present technology.
Person detector 216 may be implemented entirely within electronic logic circuitry 204, or it may be initially implemented and trained in another processing environment and then imported into electronic logic circuitry 204 - for example, person detector 216 may be produced and trained centrally, and then deployed to a plurality of distributed instances of apparatus 200.
The two special cases of differences detected between the further image and the mask representing the object of interest as originally captured are thus, in implementations, handled so as to reduce the number of false positives that would otherwise cause waste of time and resources, such as storage and processor capacity in electronic logic circuitry 204 or the energy needed to activate and move any coupled mechanical obstruction clearance device 220 to no purpose.
The apparatus 200 of Figure 2 according to the present technology is thus operable to monitor a physical space for obstructions by operating image capture device 202 in collaboration with electronic logic circuitry 204 to control operatively coupled clearance device 220 and operatively coupled outbound signaller 222, while providing means for reducing false positives caused by accidental displacement of image capture device 202 or by the presence of a person in the field of view of image capture device 202.
The mask described above also allows the determination of a plurality of vertices representing the boundaries of the object of interest, so that any further image that should show the same object of interest may be checked for any distortion caused by, for example, a shift in the position of the apparatus that captured both the original image of the object of interest and the further image.
It achieves this by analysing image data of the further image in comparison with the mask to detect any displacement of the image processing apparatus relative to the normal image portion of the physical space, and adjusting the current portion image to compensate for any such displacement.
As will be appreciated by one skilled in the art, the present technique may be embodied as a system, method or computer program product. Accordingly, the present technique may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Where the word "component" is used, it will be understood by one of ordinary skill in the art to refer to any portion of any of the above embodiments.
Furthermore, the present technique may take the form of a computer program product embodied in a non-transitory computer readable medium having computer readable program code embodied thereon. The computer readable medium may be a computer readable storage medium. A computer readable medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present techniques may be written in any combination of one or more programming languages, including object-oriented programming languages and conventional procedural programming languages.
For example, program code for carrying out operations of the present techniques may comprise source, object or executable code in a conventional programming language (interpreted or compiled) such as C, or assembly code, code for setting up or controlling an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array), or code for a hardware description language such as Verilog™ or VHDL (Very High Speed Integrated Circuit Hardware Description Language).
The program code may execute entirely on the user's computer, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network. Code components may be embodied as procedures, methods or the like, and may comprise sub-components which may take the form of instructions or sequences of instructions at any of the levels of abstraction, from the direct machine instructions of a native instruction-set to high-level compiled or interpreted language constructs.
It will also be clear to one of skill in the art that all or part of a logical method according to embodiments of the present techniques may suitably be embodied in a logic apparatus comprising logic elements to perform the steps of the method, and that such logic elements may comprise components such as logic gates in, for example a programmable logic array or application-specific integrated circuit. Such a logic arrangement may further be embodied in enabling elements for temporarily or permanently establishing logic structures in such an array or circuit using, for example, a virtual hardware descriptor language, which may be stored using fixed carrier media.
In one alternative, an embodiment of the present techniques may be realized in the form of a computer implemented method of deploying a service comprising steps of deploying computer program code operable to, when deployed into a computer infrastructure or network and executed thereon, cause said computer system or network to perform all the steps of the method.
In a further alternative, an embodiment of the present technique may be realized in the form of a data carrier having functional data thereon, said functional data comprising functional computer data structures to, when loaded into a computer system or network and operated upon thereby, enable said computer system to perform all the steps of the method.
It will be clear to one skilled in the art that many improvements and modifications can be made to the foregoing exemplary embodiments without departing from the scope of the present technique.

Claims (13)

  1. A method of operating an image processing apparatus to monitor a space for obstructions comprising: capturing a first image of a physical space; applying a trained model to said first image to delimit a portion of said physical space as a normal portion image; deriving a mask comprising a plurality of pixel values at defined positions of said normal portion image; monitoring said physical space with said image processing apparatus; capturing at least one further image of said physical space; analysing image data of said further image in comparison with said mask; responsive to non-detection in said further image of at least one of said plurality of pixel values at a defined position of said mask, emitting a control signal to an obstruction awareness system.
  2. The method according to claim 1, said analysing image data of said further image in comparison with said mask further comprising: detecting a displacement of said image processing apparatus relative to said normal portion image of said physical space; and adjusting said further image to compensate for said displacement.
  3. The method according to claim 1 or claim 2, said analysing image data of said further image in comparison with said mask further comprising: detecting data characteristic of a person in said further image and resuming said monitoring.
  4. The method according to claim 3, said detecting data characteristic of a person comprising use of a machine-learning technique to individuate characteristics of a human form in said further image.
  5. The method according to any preceding claim, said emitting a control signal to an obstruction awareness system comprising sending an instruction to sound an audible alarm or to emit a visual indication.
  6. The method according to any preceding claim, said emitting a control signal to an obstruction awareness system comprising sending an instruction to an automatic obstruction clearance apparatus.
  7. An image processing apparatus operable to monitor a space for obstructions comprising: an image capture device operable to capture a first image of a physical space; a modeler operable to apply a trained model to said first image to delimit a portion of said physical space as a normal portion image; a pixel-level masker operable to derive a mask comprising a plurality of pixel values at defined positions of said normal portion image; said image processing apparatus further operable to monitor said physical space; said image capture device further operable to capture at least one further image of said physical space; a comparator operable to analyse image data of said further image in comparison with said mask; a signaller responsive to non-detection in said further image of at least one of said plurality of pixel values at a defined position of said mask, to emit a control signal to an obstruction awareness system.
  8. The apparatus according to claim 7, said comparator further operable to detect a displacement of said image processing apparatus relative to said normal portion of said physical space.
  9. The apparatus according to claim 8, further operable to adjust said further image to compensate for said displacement.
  10. The apparatus according to any of claims 7 to 9, said comparator further operable to detect data characteristic of a person in said further image and to cause resumption of said monitoring.
  11. The apparatus according to claim 10, said comparator being operable to detect data characteristic of a person by using a machine-learning technique to individuate characteristics of a human form in said further image.
  12. The apparatus according to any of claims 7 to 11, said signaller operable to emit a control signal to an obstruction awareness system comprising control circuitry to send an instruction to sound an audible alarm or to emit a visual indication.
  13. The apparatus according to any of claims 7 to 12, said signaller operable to emit a control signal to an obstruction awareness system comprising control circuitry to send an instruction to an automatic obstruction clearance apparatus.
GB2209909.7A 2022-07-06 2022-07-06 Obstruction detection in image processing Pending GB2620560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB2209909.7A GB2620560A (en) 2022-07-06 2022-07-06 Obstruction detection in image processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB2209909.7A GB2620560A (en) 2022-07-06 2022-07-06 Obstruction detection in image processing

Publications (2)

Publication Number Publication Date
GB202209909D0 GB202209909D0 (en) 2022-08-17
GB2620560A true GB2620560A (en) 2024-01-17

Family

ID=82802544

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2209909.7A Pending GB2620560A (en) 2022-07-06 2022-07-06 Obstruction detection in image processing

Country Status (1)

Country Link
GB (1) GB2620560A (en)


Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020012475A1 (en) * 2018-07-10 2020-01-16 Rail Vision Ltd Method and system for railway obstacle detection based on rail segmentation

Also Published As

Publication number Publication date
GB202209909D0 (en) 2022-08-17
