GB2568278A - Image replacement system - Google Patents

Image replacement system

Info

Publication number
GB2568278A
GB2568278A GB1718613.1A GB201718613A GB2568278A GB 2568278 A GB2568278 A GB 2568278A GB 201718613 A GB201718613 A GB 201718613A GB 2568278 A GB2568278 A GB 2568278A
Authority
GB
United Kingdom
Prior art keywords
image data
scene
data
replacement
objects
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1718613.1A
Other versions
GB201718613D0 (en)
Inventor
John Hudson Raymond
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to GB1718613.1A priority Critical patent/GB2568278A/en
Publication of GB201718613D0 publication Critical patent/GB201718613D0/en
Priority to PCT/GB2018/053265 priority patent/WO2019092445A1/en
Publication of GB2568278A publication Critical patent/GB2568278A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/77
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast

Abstract

The invention relates to the removal of objects from an image or sequence of images captured by cameras, replacing the deleted part of the image with constructed imagery to mask the removal of the objects. Image processing device comprising: a first input for receiving primary image data relating to a scene observed by a first camera 10; a second input for receiving secondary image data relating to the scene observed by a second camera 20; a reference database to store object 30, 40, 50 information for identification of objects in received primary or secondary image data, the reference database including replacement object information for identification of replacement objects which are to be removed; a processor arranged to: identify objects in the scene; determine if any identified objects correspond to the replacement objects; identify primary image object data in the primary image data corresponding to portions of the scene incorporating the respective replacement objects; obtain replacement data from the secondary image data corresponding to at least part of the portions of the scene obscured by the replacement objects in the scene as observed by the first camera; and modify the primary image data using the replacement data to replace the respective primary image object data; and an output for providing the modified primary image data.

Description

IMAGE REPLACEMENT SYSTEM
FIELD OF THE INVENTION
The invention relates to image processing and in particular to processing of images to selectively identify and replace portions of the image to remove objects visible in the image.
BACKGROUND OF THE INVENTION
Video feeds such as from web cams and other CCTV sources are becoming commonplace. Those images are often made publicly available or provided to customers wishing to have access to live video imagery. However, such images may inadvertently show objects such as people or vehicles that it may be preferable not to display. For example, it may be desirable to avoid showing people in public video feeds for reasons of privacy or security. Similarly, it may be desirable to avoid having undesirable elements such as cars in scenes which are being displayed for aesthetic purposes. A video feed may be of a landscape scene to be displayed for its aesthetic value and so it may be undesirable to have for example cars passing through the scene.
The removal of objects from images for reasons of security, aesthetics and privacy is known. For example, publicly available mapping services often provide images of public places but have to modify those images to remove recognisable features of people, cars and sometimes buildings or business establishments. However, the removal of such elements often leaves an unsatisfactory result in the modified image. Blurred faces or image elements covered by black blocks are very obvious and distract from the quality and value of the image. Preferably, the unwanted image would be removed and replaced by other imagery which is less disruptive to the flow of the image. Ideally, the real background image behind the object to be removed could be used to provide a seamless replacement for the object to be removed. However, this requires careful modification and information regarding the background imagery to accurately map the background image onto the replaced object image. This might be done with manual intervention by a human editor or using crude automated systems, but often with poor results. Moreover, the background imagery may often not be available at all, in which case it may be difficult or impossible to replace the removed image with satisfactory results.
Existing techniques may use adjacent surrounding imagery as a reference; at best this matches the missing scenery only if the scene is very simple and homogeneous. Otherwise it requires a skilled operator, with results that are a judgement of how the scene might really appear. The time taken makes this impossible in a live-feed scenario and prohibitively expensive in most applications even if a live feed is not essential.
There is therefore a need to overcome these shortcomings and to provide a system for modifying images without manual intervention and which is able to produce images with selected objects removed without creating areas of poor or distorted quality or with disjointed imagery. It is also preferable to do this in as close to real time as possible to enable fast real time replacement editing of live (or near-live) feeds.
SUMMARY OF THE INVENTION
The present invention therefore provides an image processing device having: a first input for receiving primary image data relating to a scene as observed by a first camera, a second input for receiving secondary image data relating to the scene as observed by a second camera, a reference database arranged to store object information for identification of objects in the received primary or secondary image data, the reference database including replacement object information for identification of one or more replacement objects which are to be removed, a processor arranged to: identify objects in the scene; determine if any of the identified objects correspond to the one or more replacement objects; identify primary image object data in the primary image data corresponding to one or more portions of the scene incorporating the respective one or more replacement objects; obtain replacement data from the secondary image data corresponding to at least part of the portions of the scene obscured by the one or more replacement objects in the scene as observed by the first camera; and modifying the primary image data using the replacement data to replace the respective primary image object data, and an output for providing the modified primary image data.
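Purely by way of illustration, the processing steps set out above can be sketched in Python using OpenCV and NumPy. The object detector, the camera-to-camera homography and the set of replacement classes standing in for the reference database are placeholders assumed for this sketch; the invention does not prescribe particular implementations for them.

import cv2
import numpy as np

def process_frame(primary, secondary, homography, detect_objects, replacement_classes):
    # primary, secondary: BGR frames from the first and second cameras.
    # homography: 3x3 matrix mapping the secondary view onto the primary view (assumed known).
    # detect_objects: callable returning a list of (class_name, mask) pairs (assumed).
    # replacement_classes: set of class names to be removed (stands in for the reference database).
    modified = primary.copy()
    # Warp the secondary view into the primary camera's perspective so that its
    # background pixels line up with the region they must replace.
    warped = cv2.warpPerspective(secondary, homography,
                                 (primary.shape[1], primary.shape[0]))
    for class_name, mask in detect_objects(primary):
        if class_name not in replacement_classes:
            continue
        # Grow the mask slightly to leave a margin around the replacement object.
        mask = cv2.dilate(mask, np.ones((15, 15), np.uint8))
        # Replace the masked pixels with the corresponding warped secondary pixels.
        modified[mask > 0] = warped[mask > 0]
    return modified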
By utilising the image data available from the second camera, unwanted objects in the view from the main camera can be removed and replaced. Using this imagery allows the scene to look normal and consistent by replacing the object with image data that reproduces, with a good degree of accuracy, the background imagery that would be visible if the object were not present. This allows a system incorporating the device to produce images which look like an accurate reflection of the scene as if the unwanted objects were simply not present.
The processor is preferably further adapted to analyse the replacement data to determine any deficient locations in the scene where the replacement data does not contain sufficient information to replace the corresponding primary image object data. This allows the system to identify areas where, if an object is removed, there is no or insufficient data to replace the part of the image occupied by the removed object, which can, for example, occur where the second camera’s view of the desired background is obscured. This allows the system to use alternative methods to complete the replacement data to completely replace the data representing the removed object.
The processor may comprise a data store for storing at least some of the primary image data, secondary image data and the modified image data as historical image data, wherein said processor is further adapted to: obtain historical replacement data from previously stored historical image data, the historical replacement data representing the historical image data corresponding to at least part of the deficient locations; and further modifying the primary image data using the historical replacement data.
This provides a method to derive data to replace the data representing the object to be removed. By using historical data, which may be only a fraction of a second old, some or all of a removed object can be replaced by suitable background imagery to provide a reliable substitute for the removed object. The above technique can be used to replace data where the second camera is unable to view the required background scenery, for example when the object obscures the view of some part of the background scene as viewed by the second camera as well as the first camera.
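A minimal sketch of this historical-data fallback, assuming frames arrive as NumPy arrays, is given below: a running background estimate is maintained from unobstructed pixels and used to fill any deficient pixels that neither camera can supply. The exponential update rule and the 0.95/0.05 blending factors are illustrative assumptions, not values from the patent.

import numpy as np

class HistoricalBackground:
    def __init__(self):
        self.background = None  # running float32 estimate of the unobstructed scene

    def update(self, frame, object_mask):
        # Blend unobstructed pixels of the current frame into the stored background.
        frame = frame.astype(np.float32)
        if self.background is None:
            self.background = frame.copy()
            return
        visible = object_mask == 0  # pixels not covered by any detected object
        self.background[visible] = (0.95 * self.background[visible]
                                    + 0.05 * frame[visible])

    def fill(self, modified_frame, deficient_mask):
        # Paint historical pixels into regions that no camera can currently see.
        if self.background is not None:
            modified_frame[deficient_mask > 0] = \
                self.background[deficient_mask > 0].astype(modified_frame.dtype)
        return modified_frame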
The processor may be further adapted to: replicate the image data corresponding to the portions of the scene surrounding the one or more portions of the scene incorporating the respective one or more replacement objects, to produce respective replicated image data corresponding to at least part of the one or more portions of the scene incorporating the respective one or more replacement objects; and further modify the primary image data using the replicated image data.
The above arrangement provides an alternative method of deriving data to replace the removed object data. By observing the image data around the object to be removed, the replacement data can be derived by replicating the image data around the object to produce consistent imagery to provide a background. Whilst this will not be an accurate reflection of the unknown background, by replicating the surrounding imagery it can provide a suitable substitute that, to a viewer, looks consistent with the rest of the image.
This can be useful where the required image data for the background is not available, perhaps because it is obscured, as described above. Historical data may not be able to provide suitable image data either, for example because the background image differs from how it previously appeared, or is changing. This can happen when multiple objects are moving in a scene and a foreground object is to be removed but the background object retained. Since the background object may not be recorded in the historical data, this technique provides a means for producing suitable replacement image data.
The processor is preferably further adapted to: modify said replacement data to at least partially conform the perspective, tone and brightness of the image represented by the replacement data to that of the corresponding portion of the primary image data. The different angle of the second camera, or the use of historical data, may mean that the scene is observed differently, e.g. from a different perspective, with different lighting and so on. In order to reflect how the replacement imagery would be observed by the first camera, this modification allows the possibly different appearance of the imagery from the second camera to be adjusted to match that of the first camera, providing an accurate and consistent set of replacement data.
The modification of said primary image data preferably includes modifying the primary image data based on said secondary image data to produce an at least partial three dimensional representation of the scene, and removing image data in the three dimensional representation of the scene corresponding to the one or more replacement objects. By producing a three dimensional model of the scene observed by the cameras, objects to be removed can be identified using three dimensional image models which may be scaled and manipulated for their orientation in a full three dimensional context.
Preferably, the three dimensional representation of the scene is stored as a matrix pixel array and the object information is adaptively compared to the matrix pixel array to determine a best object, size and perspective match to identify the pixels to be removed from the matrix, allowing the background imagery represented by the replacement data from the secondary image data to be visible.
The image processing device may include one or more further inputs for receiving further image data relating to the scene as observed by one or more additional cameras, wherein said further image data can be used to supplement or substitute said secondary image data. Where a scene is observed by three or more cameras, the information available in addition to that from the primary and secondary cameras may provide useful additional information by having a different perspective on the scene. This might allow actual scenery to be obtained that is obscured in the views observed by the first and second cameras. Where a three dimensional model is constructed, the additional data allows a more complete model to be developed.
The primary image data may include additional data from other sensors located on or near said first camera, said additional data representing information relating to one or more of: carbon dioxide levels, oxygen levels, atmospheric pressure, temperature, humidity, particulate concentration and other information or data relating to the local environment. This data may be used to provide simple information to the observer, or to replicate some or all of the environmental features of the scene. For example, the temperature recorded at the scene may be used to control heaters to replicate the temperature for a user observing the scene. Similarly, the recorded wind speed may be used to control fans to replicate the wind for an observer.
The output modified data may include some or all of said additional data. This allows the information to be sent to an observer or user. Optionally, only relevant or required data may be sent if only some is required by the observer.
The modified image data may include time information reflecting the time that the primary image data and said additional data was obtained. Apart from providing a record of this information, for example when it is used as historical data, this allows image data to be sent from a different time. For example, a natural landscape scene may be completely dark at night time and so it may be preferable to send imagery which is delayed by 12 hours, so that a user wishing to view a scene at night may be able to see the scene as it was during the daytime using the time-marked historical data.
The present invention also provides a system including an image processing device such as that described above along with an environmental chamber, said environmental chamber including display means for displaying images represented by said modified image data and one or more transducers arranged to replicate, in said environmental chamber, the environmental conditions measured by said other sensors. Such a chamber provides an immersive environment in which a user can observe a scene and experience a number of environmental parameters in a controlled environment to provide a close replication of the scene in real time but at a remote location. The environmental chamber may include heaters/coolers to control temperature, fans to replicate wind, lights to simulate sun, humidifiers/dehumidifiers to control humidity and so on.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will now be described in detail by reference to exemplary embodiments and to the drawings, in which:
Figure 1 shows a schematic plan view of an observed scene;
Figure 2 shows a camera view of the scene;
Figure 3 shows a schematic plan view with an additional object;
Figure 4 shows a view from one camera;
Figure 5 shows a modified view from the camera;
Figure 6 shows a view from a second camera;
Figure 7 shows a schematic plan view of a second scene with an additional object in a second location;
Figure 8 shows a view from one camera of the second scene;
Figure 9 shows a view from a second camera of the second scene; and
Figure 10 shows a resultant view showing the missing imagery.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Figure 1 shows a simple layout of a camera system. Two cameras 10,20 are arranged to monitor a scene. The cameras are arranged to view the scene from different angles so that each camera observes the scene from a different perspective. Camera 10 is shown in the lower right of Figure 1. The camera 10 has a field of view, shown by the lines 11a,11b, encompassing the desired scene. It therefore sees the scene from the right hand perspective between lines 11a,11b. A second camera 20 is arranged in the lower left corner and views the scene from the left hand perspective. The camera 20 also has a field of view, shown by the lines 21a,21b, which similarly encompasses the desired scene. With this arrangement, the two cameras can observe the entire scene and all the elements within the common field of view. Figure 2 shows the sort of view that each camera would observe, albeit from different perspectives.
When an object moves into the scene this may obscure part of the scene. For example as a person moves across the scene as shown in Figure 2, they will obscure parts of the scene behind them. However, because the two cameras have a different perspective on the scene, the person 50 will obscure different parts of the scene in the view of the scene observed by each camera. The shaded regions 35,36 between the dotted lines in Figure 3 show the parts of the scene obscured by the person 50. This means that the part of the scene obscured for one camera may still be visible to the other camera.
Considering the scene shown in Figure 2, it may be deemed that some objects viewed by the camera are undesirable and so need to be removed from the image. If camera 10 is providing a stream of images to form a live feed, as the person 50 enters the scene from the left, it will be necessary to identify the change to the scene and recognise that the changed part of the image shows a person. Having identified the changed part, its content can be considered and determined. Having identified the content, it can be determined whether the content should be removed. In this example, the system determines that the content relates to a person, which needs to be removed.
Once the part of the image containing the object has been identified and determined to show an object to be removed, that part can be removed. Figure 4 shows the view of the scene as observed by camera 10. Once the unwanted part of the scene has been determined, e.g. because it contains a person, that part 51 of the scene can be removed, as shown in Figure 5, leaving the area 51 with no image data. Figure 5 shows a simple oval section having been removed, but the removed unwanted part is preferably shaped to match the object. The unwanted part 51 may be closely mapped to the undesired image, e.g. closely following the outline of the person, or it may extend some distance into the background imagery to provide a margin around the object to be deleted.
The absence of this portion of the image would be clearly apparent to an observer and so it is necessary to replace the removed image with some alternative imagery. To try to make the image look like it has not been modified, the alternative imagery should be blended into the rest of the original image so that the continuity of the image is maintained.
One way to achieve this is to utilise the part of the scene that would have been visible before the person passed in front of it. However, camera 10 can no longer observe that part of the scene due to the presence of the person obscuring it. Figure 3 shows the shadow regions 35, 36 which lie behind the person 50. As viewed by camera 10, the region 36 is obscured by the person 50. Similarly, the region 35 is obscured by the person 50 in the view from camera 20.
One option is to save a view of the entire scene from before the person entered the scene and extract the view of the obscured part of the scene from the historical view. The missing part of the scene can then be spliced into the part of the image which has been removed. This provides a satisfactory result where the scene remains the same over time. However, this is not always the case as lighting conditions may vary throughout the day and even from second to second, for example if the sun passes behind a cloud the lighting may change significantly in a fraction of a second.
Furthermore, the scene behind the person may be a dynamic scene. For example, the scene may include the sea with rolling waves or trees which are moving around in the wind. If the part of the image that was deleted were replaced by the same part of the image but from a different time, even if only a fraction of a second before, it may be inconsistent with the rest of the image and may look highly incongruous. Furthermore, if the person remains still for a period of time, the historical imagery will become more and more out of date. In a video feed, imagery taken from a previous frame might appear as a ghostly inconsistency moving across the scene as the person moves across it.
To overcome this potential temporal difference, it is desirable to use contemporaneous imagery to replace the deleted portion of the image. In this way, the lighting and imagery would be correct and consistent with the rest of the image. As noted above, the required background image is obscured by the person 50. However, by using the second camera 20, which has an alternative perspective, it may be possible to obtain an image of the part of the scene that is hidden to camera 10. This allows the part of the scene 51 observed by camera 10, which is obscured by the person, to be viewed.
Figure 6 shows the view from camera 20. As can be seen, the different position of camera 20 means that the person appears to be in a different part of the scene and more importantly, the person does not obscure the part 51 shown in Figures 4 and 5 of the background scene that camera 10 cannot see. By identifying the corresponding portion 53 in the view seen by camera 20, the required imagery can be obtained and stitched into the image from camera 10 to replace the removed portion shown in Figure 5.
As camera 20 observes the scene from a different angle, the perspective of the portion 53 will be different to the perspective of the same part of the scene as would be seen by camera 10 if the person 50 were not present. However, the portion 53 extracted from the view seen by camera 20 can be processed to change its apparent perspective to correspond to that which would be viewed by camera 10. As well as the different perspective, the different view that camera 20 has might mean that the lighting is different, which may result in other differences in the image such as the tone or brightness. The replacement image data is modified to correspond to the perspective, tone, brightness etc. of the imagery from the main camera 10. The immediate surrounding area around the object to be removed can be used to judge the level of different aspects of the scene such as brightness/intensity/tone so that the replacement image can be matched to the scene when it is inserted. This enables a seamless integration of the replacement image from camera 20 into the image from camera 10.
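The perspective and photometric adjustment can be illustrated as follows, assuming four corresponding corner points around the patch are known for the perspective warp and that a simple mean/standard-deviation transfer against the surrounding area is used to match brightness; both are assumptions for illustration rather than features required by the invention.

import cv2
import numpy as np

def conform_patch(secondary, src_quad, dst_quad, primary, surround_mask):
    # Warp the patch taken from camera 20 into the perspective of camera 10,
    # using four corresponding points around the patch (assumed known).
    H = cv2.getPerspectiveTransform(np.float32(src_quad), np.float32(dst_quad))
    warped = cv2.warpPerspective(secondary, H, (primary.shape[1], primary.shape[0]))
    # Match tone/brightness to the area immediately surrounding the removed object.
    ref = primary[surround_mask > 0].astype(np.float32)
    src = warped[surround_mask > 0].astype(np.float32)
    gain = (ref.std() + 1e-6) / (src.std() + 1e-6)
    offset = ref.mean() - gain * src.mean()
    adjusted = warped.astype(np.float32) * gain + offset
    return np.clip(adjusted, 0, 255).astype(np.uint8)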
There may be situations where an object to be deleted obscures regions of the background that overlap. Figure 7 shows a situation where the regions 35 and 36 overlap such that a portion of the building 40 is obscured by the person 50 in both camera views. Figures 8 and 9 show the views of cameras 10 and 20 respectively. This means that the portion 81a, 81b (see Figure 10) of the building not visible to either camera cannot be obtained from camera 20 to replace the removal of the part of the image containing the person 50.
In this situation, the image from camera 20 may be used to obtain some of the imagery of the background obscured by the person 50 in the view from camera 10. However, the portions 81a, 81b of the deleted image which are also not visible to camera 20 will still be unavailable. This allows a partial reconstruction of the background imagery using the imagery from camera 20 that is available, but there will therefore be a portion that is not available to the cameras - see Figure 9. This may only be a small portion of the deleted image, as apparent from Figure 10, and so other measures may be used to supplement the missing imagery.
As mentioned above, historical imagery may be used to fill the missing parts. This might provide an acceptable final image on the basis that the missing parts are relatively small and so any inconsistency may not have a significant effect on the overall image. However, where the missing part is significant or there is a significant difference in the actual current scene and the scene when the imagery to be used is obtained, then this may still present a significant and noticeable distortion of the scene. Continuous updating of the historical scenery will help to make the imagery as current as possible but if the obstruction remains for a period of time, e.g. the person stands still for a period of time in the position shown in Figure 7, then the historical imagery may become out of date.
A further problem may arise where the required background imagery is not available, for example where there are two moving objects within the field of view. If one of the objects is in front of the other, then part of the rear object may be obscured. If one of the objects is to be removed but the other is to be retained, and particularly if the object to be removed is in front of the one to be retained, then the information needed to replace the front object may be dynamic. For example, consider a situation where a boat is passing through the rear of the scene and a person passes through the front of the scene, and it is determined that for this particular configuration people should be removed but boats may be retained. If the person is in front of the boat, then the part of the scene showing the person is to be removed and replaced by imagery from behind the person. If this imagery is not visible to the second camera then historical imagery might be used, as suggested above. However, since the boat is moving, any imagery will become out of date immediately, and so the space where the person is may be replaced by imagery from the boat as it was at some previous time, which would look incongruous. In other words, a section of the boat, e.g. the front, that was in the position obscured by the person when the historical footage was collected might be displayed where a different part of the boat, e.g. the middle, should be displayed. This would have the effect of showing a part of the front of the boat in the middle of the boat, which would clearly look incorrect.
In this situation, the historical data is not suitable. To try to produce replacement imagery that does not look out of place or disjointed, a representative sample of the pixels surrounding the part of the image to be removed is chosen; the length and depth of this area is defined and may be refined to suit the scene and the size of the image that is required.
In addition, the available pixels can be assessed to try to determine whether they would be suitable for replication. Specific unique objects, such as bottles, drink cans, flower pots, etc., may be rejected as unsuitable. A library contains objects of many different kinds and can be updated at any time based on a variety of factors such as location and type of scene, e.g. urban versus rural. These items tend to represent the sort of objects that would not be likely to repeat naturally within a scene. For example, if a bottle is present in a scene, it is unlikely that a similar bottle would be next to it, and so replicating that element would not be likely to produce a convincing replacement set of pixels. It can therefore be presumed highly likely that the items in the index would not be suitable as a choice when considering adjacent visual data as a mechanism to emulate the background scene.
These items can therefore be identified and not used as part of the pixel data used to determine the pixels for the replacement image.
The representative sample of pixels is then replicated as required to fill the shape of the replaced object, or that part of the object that cannot be replaced by the other means described above. For example, in the situation above, the section of imagery required would be a part of the side of the boat. By considering the pixels to either side of the required section, it might be determined that the boat is generally uniform in a horizontal direction, so that simply repeating vertical strips of the pixels either side of the section would provide a good approximation of the pixels in the section. Similarly, if there was a rolling wave moving in the background, the wave may be determined to have a generally consistent shape in a horizontal direction, or perhaps at an angle of, say, 20 degrees. The available pixels either side of the missing part can then be replicated and offset for each strip so that the angle is taken into account.
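This strip-replication approach can be sketched as follows; the strip width, the per-column offset used for an angled pattern, and the choice of taking the source strip from the left of the hole are illustrative assumptions only.

import numpy as np

def fill_by_strip_replication(image, x0, x1, y0, y1, strip_width=4, angle_deg=0.0):
    # Fill image[y0:y1, x0:x1] by tiling a narrow strip taken just left of the hole,
    # rolling each filled column vertically so an angled pattern (e.g. a wave) lines up.
    out = image.copy()
    src = out[y0:y1, x0 - strip_width:x0]          # strip immediately left of the hole
    shift_per_col = np.tan(np.radians(angle_deg))  # rows of offset per column of fill
    for col in range(x0, x1):
        offset = int(round((col - x0) * shift_per_col))
        column = src[:, (col - x0) % strip_width]
        out[y0:y1, col] = np.roll(column, offset, axis=0)
    return out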
Depending on the source of the pixels surrounding the deleted imagery, it may be necessary to correct the perspective to correlate it with the primary image.
The above processes may be used in combination or in isolation to replace deleted imagery with imagery that directly reflects the real-time view or replicates as closely as possible that view.
In order to establish which objects are to be removed from the scene, a database of images to be removed is established. These objects may be specific to a particular camera or scene or may be more generic. Different locations with different scenes may require the removal, or allow the retention, of particular objects. For example, in a scene of a public area such as a park it may be desirable to remove people from the scene for purposes of privacy whilst retaining images of animals. In that situation, the database would be provided with information for identifying people. It may also be provided with information defining animals which it can use to distinguish between people and animals in moving parts of the scene.
The images produced from cameras 10,20 are passed to an image processing system which analyses the images from the cameras. One of the cameras may be defined as a primary camera with the other camera used as a secondary camera. The primary camera acts as the main source of the images, with the secondary camera providing the alternative perspective image data to provide image data for replacing the deleted portions of the main image received by the primary camera.
If a third (or more) camera is available, then imagery from the third camera may also be used for replacing deleted sections of the main image from the primary camera. This provides for further redundancy in the case that the secondary camera is unable to observe the desired background imagery required to complete the main image. The third camera should have a different perspective to both the primary and secondary cameras. The system may use both the secondary and the third (and any other) cameras to produce replacement image data and the camera which produces the best results may be selected to provide the data to be used. In this way the modified image can be optimised using the best available source.
In the embodiment described above, the system identifies objects within the image (either single or as part of a video feed) in order to identify those that are to be removed. Since the objects may be oriented at different angles and viewed by the cameras at different angles, the same object may look quite different depending on its orientation and the camera viewing it. Also, the same object might appear larger or smaller depending on how close to the camera it is. In order to maximise the reliability of the system’s ability to identify objects, it can use techniques to be able to identify an object from any angle and by utilising the 3D information available from having two or more views from two or more cameras.
Objects within the image are identified via a known technique in machine learning defined as ensemble learning, as well as computation for tendency behaviour. Different kinds of classification of objects are defined in this technique, such as people, cars, boats, animals etc. For a given object, basic parameters defining the object are provided to allow a basic identification of an object from perhaps a single direction. However, once the system is established, it can enhance the information it holds about objects by observing them in the real world to derive more detailed information describing the appearance of the object from multiple directions and also to establish variances between objects such as size, shape, colour etc. For example, cars have a great variety of shapes, but by obtaining information about a number of cars, and the same car from multiple angles and positions, the features can be established to allow reliable identification of all cars. Also, by observing and identifying a car object, that object can be more reliably observed and tracked as it passes across a scene.
The system can then gather knowledge about how the object looks at different positions and angles and from different cameras. This allows a comprehensive set of information to be gathered such that the system learns more about the objects it is looking to identify. This may be supplemented by specific confirmation or rejection by a human viewer confirming or rejecting identified objects. For example, the system may incorrectly identify a small building as a bus which a human can correct to improve recognition accuracy.
Once the parameters or factors of these classifications are trained and identified, they can each be applied separately to the live image feed. A classifier (a defined, identified object or objects) is applied to the image(s) of a scene to identify the corresponding objects within the scene; the identified similar objects are collected and used as additional information for the next classifier for each object, and so on. This multistage propagation technique enables the algorithm to identify the objects correctly, and can further enhance the learning process and hence improve accuracy. To compare a classifier with a potential object, the system compares the difference between the classifier and the target part of the image. A sum of the differences between the classifier and the target image area is produced.
An exemplary method for determining the summation for a classifier uses the following equation:
where:
Se represents the cumulative difference between a classifier object and a potential object in a scene;
m and n stand for number of features of images; yj stands for the classifier at the current iteration;
c stands for classifier at back stage (it is the feedback into the system to train yj).
The sum Se helps to determine the classifiers with the minimum difference of variational parameters. The error between them is reduced to the minimum so that the selected classifiers do not deviate from the actual feature extraction. In this way, the objects in an image can be matched to a classifier and correctly identified as objects to be removed or not.
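Since the equation for Se itself is not reproduced above, the following sketch assumes a simple sum of absolute feature differences between the classifier and each candidate region, purely to illustrate selecting the pairing with the minimum cumulative difference; the feature representation and the difference measure are assumptions.

import numpy as np

def cumulative_difference(classifier_features, region_features):
    # Se: total difference between the classifier features (y_j) and a candidate region.
    return np.abs(np.asarray(classifier_features) - np.asarray(region_features)).sum()

def best_match(classifier_features, candidate_regions):
    # Select the candidate region with the minimum cumulative difference Se.
    scores = [cumulative_difference(classifier_features, r) for r in candidate_regions]
    return int(np.argmin(scores)), float(min(scores))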
The learning rate α at any point of the classification iteration can be represented by the equation:
α = a × b
where:
α is the learning rate;
a is the number of times the classifier needs to be run;
b is the number of arguments/parameters defining the object of choice.
Other techniques used in the ensemble learning to train the classifier include reading the variation of histogram patterns. This technique works best when the object that needs to be recognised is actually zoomed in or covers much of the space of the images.
This technique is extracted from the basic integral relation (also called the fundamental theorem of calculus):
∫_a^b g'(t) dt = g(b) − g(a)
Using this fundamental technique, an object's pixels are zoomed to identify the histogram of object gradients. With the slope of those gradients, the edges, nodes etc. of an object are identified. This identifies the object of choice.
The gradient of the histogram at any point of a slope is given by the equations:
Fx = ∂/∂x (f(x,y) − z) = fx
Fy = ∂/∂y (f(x,y) − z) = fy
Fz = ∂/∂z (f(x,y) − z) = −1
where Fx, Fy and Fz are the gradients of objects.
Given the projection of images on a two dimensional plane the gradient on the z axis is generally set as -1 to map into the two dimensional plane.
Once the objects have been identified, the summation of gradients is taken and combined as a single object of choice. That means taking the global maximum from all local maxima at any given interval of time. In this case, over the interval of image feeds where f(a) <= f(x), we take f(x) as the Region of Interest. This is the region where the object is identified.
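The gradient and region-of-interest step can be sketched as follows, assuming Sobel filters are used to compute the gradients of a single-channel image and a fixed window size is scanned to find the global maximum of summed gradient magnitude; both are illustrative assumptions.

import cv2
import numpy as np

def region_of_interest(gray, win=64):
    # Gradients of the surface f(x, y) - z; the z component is the constant -1 term.
    fx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    fy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    magnitude = np.hypot(fx, fy)
    best, best_score = (0, 0), -1.0
    # Scan fixed-size windows and keep the one with the global maximum of summed gradients.
    for y in range(0, gray.shape[0] - win + 1, win):
        for x in range(0, gray.shape[1] - win + 1, win):
            score = magnitude[y:y + win, x:x + win].sum()
            if score > best_score:
                best, best_score = (y, x), score
    return best, win  # top-left corner and size of the strongest-gradient window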
With the system described above, whilst camera 10 may act as the primary camera and camera 20 as the secondary camera in the above example, with the primary camera producing the main image and effectively defining the perspective of the final view, the roles may be reversed, with camera 20 acting as the primary camera and camera 10 as the secondary camera. Camera 20 would then be used to provide the bulk of the main image, with undesirable objects removed and replaced by portions from camera 10. In practice, both options may be used to produce two separate image sequences from two perspectives, both having been processed to identify and remove unwanted objects using information from the other camera.
The image processing is preferably carried out locally, so that modified images can be produced without having to send the images from both cameras to a remote processing device to produce the modified image. The cameras will be located relatively close to each other by virtue of them observing the same scene and so transmission of the images from one camera to the other or from both to a local processing device will not delay the transmission of the image to the end user. In this way, one or both cameras may include processing capabilities to use its own image data as well as the imagery received from other cameras nearby. Equally, several cameras may provide feeds to a local processing device located close and aligned to the cameras observing a particular scene, to allow the cameras to be relatively standard simple devices. The local processing device may be a processor system such as a computer, PC or other processor based device such as a Raspberry Pi ®.
Whilst processing may be carried out by the processor located either within or relatively local to the cameras, the information about objects may be obtained from a source in a database of objects located remotely. This allows object information to be obtained and updated in addition to the object information held by the processor. This may be an ongoing update process or may be done initially when the system is first set up and in response to the type of scene observed by the camera. So a system set up in an urban environment may be provided with a set of objects likely to be observed in such an urban environment. Similarly a camera system set up in a natural environment may be provided with a different set of object information.
This allows the systems to be tailored to their environment to minimise the amount of information that needs to be stored and processed to analyse the images. Furthermore, the system may select to partially or totally discard information on objects that do not appear in the scene in a predetermined period of time.
The above processing arrangements (i.e. in-camera or in nearby processing devices) avoid long distance transmission of the image data. This is particularly important when the cameras are providing live video streams. The images can be prepared relatively local to the cameras and then sent directly to the end recipient, minimising the time between the images being recorded and reception at the end user.
One camera may incorporate the processing computer with another second camera synchronising with it. The other camera may be on a motorised platform controlled by the software on the processing computer to ensure the two cameras capture the same scene simultaneously. Objects can then be removed in live time from the scene that both cameras can see.
If the images are transmitted from the camera to a remote processing device, then the additional transmission time from the cameras to the remote processing device and potentially then to the device or server providing the video streams and finally to the recipient, may introduce significant delay into the images/videos.
The system described above is primarily aimed at providing a live streamed video but may be used to produce individual images either on demand or otherwise.
The data from the cameras may also include sound as well as information about the local environment such as: air temperature; pressure; quality; percentage of oxygen or carbon dioxide; wind speed. This may allow the end user to experience the location of the cameras beyond just the imagery and sound.
Where the two cameras respectively act as primary camera and secondary camera to produce two separate image streams from different perspectives, the two streams may be used to provide 3D image data using the two different perspective images. As noted above, the 3D imagery may be used with the other environmental information to provide a multisensory output to the user, such as in 4D/5D cinemas or other multifunctional experience presentation environments.
By utilising the two cameras to obtain a stereoscopic view of the scene, the system can derive a partial 3D view of the scene. An algorithm computes a matrix of clusters of pixel arrays of the identified image as a 3D data set from the foreground. This is done as an affine transformation; the background to the identified image, as well as adjacent clusters of pixel arrays (minus any identified images from the library), is used as the replacement cluster of pixel arrays. This replacement pixel array is a self-learning model which checks for the best pixel array that gives the feed the most apt view. The new emulated background is embedded in the image, without the images chosen to be removed.
Initially, when an image is created it is converted to a matrix of pixels. For example, an image A is converted into an array of the form:
[[[190 184 179] [192 186 181] [193 187 182] [252 216 192] [253 217 193] [254 218 194]]
 [[190 184 179] [192 186 181] [193 187 182] [251 215 191] [252 216 192] [253 217 193]]
 [[191 185 180] [192 186 181] [192 186 181] [251 215 191] [252 215 193] [253 216 194]]
 [[102 110 110] [95 103 103] [89 97 97] [115 145 164] [116 146 165] [115 145 164]]
 [[100 108 108] [94 102 102] [89 97 97] [116 146 165] [116 146 165] [113 143 162]]
 [[100 108 108] [94 102 102] [89 97 97] [115 145 164] [115 145 164] [110 140 159]]]
For a point in time the images are stored as a matrix data frame, as described above. The pixels are diffracted to each other by the equation
Rt = a·Rt-1 + ξt
This is done to account for the variability in the images created/captured. In the above equation Rt stands for the image pixel frame stored at any instant of time t. This value is equal to a constant multiplier times the value of the pixel at a point intermediately on the local minimum side, added to an error rate ξt at an instant of time t. The error rate can be due to sunlight glare or snow etc. The image is processed repeatedly to produce a minimised value of ξt by performing what is called an image blending.
g(x) = (1 − α)f0(x) + αf1(x)
In the above equation, alpha is varied from 0 to 1, providing a subtle transition between one image and another such that the error in the images falls under the intersection curve and is thus reduced.
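Both operations can be sketched together as follows; the value of the constant a and the way alpha is scheduled are illustrative assumptions.

import numpy as np

def temporal_update(prev_frame, new_frame, a=0.9):
    # R_t = a * R_{t-1} + xi_t, treating the new observation's contribution as xi_t.
    xi = (1.0 - a) * new_frame.astype(np.float32)
    return a * prev_frame.astype(np.float32) + xi

def blend(f0, f1, alpha):
    # g(x) = (1 - alpha) * f0(x) + alpha * f1(x), with alpha varied between 0 and 1.
    g = (1.0 - alpha) * f0.astype(np.float32) + alpha * f1.astype(np.float32)
    return g.astype(np.uint8)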
The algorithm identifies background and foreground objects from the image (3D data set). Once it has identified the background and foreground objects, it computes the matrix pixel array of each object which has to be ghosted out. The algorithm then maps the array with an affine transformation and tries to find a replacement for the matrix pixel array (of the object that is to be ghosted) by computing an array from an adjacent background section of the 3D feed data set. The replacement matrix pixel array is a self-learning model which checks for the best pixel array that gives the feed the most apt view. It then replaces the object matrix pixel array with a background matrix pixel array, thereby ghosting the objects from the image feeds.
The process of learning in the 3D environment initially requires the selection of an object in the field of view, which can be 2D or 3D. This object is added to the library by the classifier. One or more images of objects of this type are uploaded via the classifier; the stored item is now accessed as live image frames are run through the processor. When an object in the field of view matches the 2D or 3D profile viewed at different magnifications of the scene, any variations in profile will be learned and stored as part of the reference library for future image recognition. This process runs continuously, mapping the object in 3D from every perspective angle as it goes through the scene; this learning process increases the ability of the system to identify specified objects.
This learned imagery and angular perspective enables identification of replicated imagery in a scene with objects moving across it with constantly changing perspective. For each mapped 3D object the process provides a 3 x 2D map, changing for each stored frame as the angle and perspective moves through a scene.

Claims (12)

1. An image processing device having:
a first input for receiving primary image data relating to a scene as observed by a first camera, a second input for receiving secondary image data relating to the scene as observed by a second camera, a reference database arranged to store object information for identification of objects in the received primary or secondary image data, the reference database including replacement object information for identification of one or more replacement objects which are to be removed, a processor arranged to:
identify objects in the scene;
determine if any of the identified objects correspond to the one or more replacement objects;
identify primary image object data in the primary image data corresponding to one or more portions of the scene incorporating the respective one or more replacement objects;
obtain replacement data from the secondary image data corresponding to at least part of the portions of the scene obscured by the one or more replacement objects in the scene as observed by the first camera; and modifying the primary image data using the replacement data to replace the respective primary image object data, and an output for providing the modified primary image data.
2. An image processing device according to claim 1 wherein said processor is further adapted to analyse the replacement data to determine any deficient locations in the scene where the replacement data does not contain sufficient information to replace the corresponding primary image object data.
3. An image processing device according to claim 2 further comprising a data store for storing at least some of the primary image data, secondary image data and the modified image data as historical image data, wherein said processor is further adapted to:
obtain historical replacement data from previously stored historical image data, the historical replacement data representing the historical image data corresponding to at least part of the deficient locations; and further modifying the primary image data using the historical replacement data.
4. An image processing device according to claim 2 or 3, wherein said processor is further adapted to:
replicate the image data corresponding to the portions of the scene surrounding the one or more portions of the scene incorporating the respective one or more replacement objects, to produce respective replicated image data corresponding to at least part of the one or more portions of the scene incorporating the respective one or more replacement objects; and further modify the primary image data using the replicated image data.
5. An image processing device according to any one of the preceding claims wherein said processor is further adapted to: modify said replacement data to at least partially conform the perspective, tone and brightness of the image represented by the replacement data to that of the corresponding portion of the primary image data.
6. An image processing device according to any one of the preceding claims wherein said modification of said primary image data includes modifying the primary image data based on said secondary image data to produce an at least partial three dimensional representation of the scene, and removing image data in the three dimensional representation of the scene corresponding to the one or more replacement objects.
7. An image processing device according to claim 6, wherein said three dimensional representation of the scene is stored as a matrix pixel array and said object information is adaptively compared to the matrix pixel array to determine a best object, size and perspective match to identify the pixels to be removed from the matrix to allow background imagery represented by the replacement data from the secondary image data to be visible.
8. An image processing device according to any one of the preceding claims comprising one or more further inputs for receiving further image data relating to the scene as observed by one or more additional cameras, wherein said further image data can be used to supplement or substitute said secondary image data.
9. An image processing device according to any one of the preceding claims wherein said primary image data includes additional data from one or more other sensors located on or near said first camera, said additional data representing information relating to one or more of: carbon dioxide levels, oxygen levels, atmospheric pressure, temperature, humidity, and particulate concentration.
10. An image processing device according to claim 9, wherein said output modified data includes some or all of said additional data.
11. An image processing device according to claim 9, wherein said output modified image data includes time information reflecting the time that the primary image data and said additional data was obtained.
12. A system including an image processing device of claim 9, 10 or 11 and an environmental chamber, said environmental chamber including display means for displaying images represented by said modified image data and one or more transducers arranged to replicate the environmental conditions measured by said other sensors in said environmental chamber.
GB1718613.1A 2017-11-10 2017-11-10 Image replacement system Withdrawn GB2568278A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
GB1718613.1A GB2568278A (en) 2017-11-10 2017-11-10 Image replacement system
PCT/GB2018/053265 WO2019092445A1 (en) 2017-11-10 2018-11-12 Image replacement system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1718613.1A GB2568278A (en) 2017-11-10 2017-11-10 Image replacement system

Publications (2)

Publication Number Publication Date
GB201718613D0 GB201718613D0 (en) 2017-12-27
GB2568278A true GB2568278A (en) 2019-05-15

Family

ID=60788326

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1718613.1A Withdrawn GB2568278A (en) 2017-11-10 2017-11-10 Image replacement system

Country Status (2)

Country Link
GB (1) GB2568278A (en)
WO (1) WO2019092445A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111836103B (en) * 2019-12-31 2022-04-05 北京无忧乐道科技有限公司 Anti-occlusion processing system based on data analysis
CN113709370B (en) * 2021-08-26 2023-05-09 维沃移动通信有限公司 Image generation method, device, electronic equipment and readable storage medium
CN114554113B (en) * 2022-04-24 2022-08-16 浙江华眼视觉科技有限公司 Express item code recognition machine express item person drawing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827488B2 (en) * 2000-11-27 2010-11-02 Sitrick David H Image tracking and substitution system and methodology for audio-visual presentations
SE534551C2 (en) * 2010-02-15 2011-10-04 Scalado Ab Digital image manipulation including identification of a target area in a target image and seamless replacement of image information from a source image
US9426451B2 (en) * 2013-03-15 2016-08-23 Digimarc Corporation Cooperative photography

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2408887A (en) * 2003-12-02 2005-06-08 Hewlett Packard Development Co Digital camera providing selective removal and addition of an imaged object
US20060120592A1 (en) * 2004-12-07 2006-06-08 Chang-Joon Park Apparatus for recovering background in image sequence and method thereof
EP2056256A2 (en) * 2007-10-30 2009-05-06 Navteq North America, LLC System and method for revealing occluded objects in an image dataset
US20090231431A1 (en) * 2008-03-17 2009-09-17 International Business Machines Corporation Displayed view modification in a vehicle-to-vehicle network
US8311275B1 (en) * 2008-06-10 2012-11-13 Mindmancer AB Selective viewing of a scene
US20130016878A1 (en) * 2011-07-15 2013-01-17 Altek Corporation Image Processing Device and Image Processing Method Thereof
US20130176442A1 (en) * 2012-01-08 2013-07-11 Gary Shuster Digital media enhancement system, method, and apparatus
WO2015123791A1 (en) * 2014-02-18 2015-08-27 Empire Technology Development Llc Composite image generation to remove obscuring objects
EP3107063A1 (en) * 2015-06-16 2016-12-21 Continental Automotive GmbH Method for processing camera images

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11422564B2 (en) * 2020-01-07 2022-08-23 Lg Electronics Inc. Method for making space map and moving robot
US11710247B2 (en) 2020-01-30 2023-07-25 Unity Technologies Sf System for image compositing including training with synthetic data
US11676252B2 (en) 2020-01-31 2023-06-13 Unity Technologies Sf Image processing for reducing artifacts caused by removal of scene elements from images
US11055900B1 (en) 2020-02-28 2021-07-06 Weta Digital Limited Computer-generated image processing including volumetric scene reconstruction to replace a designated region
WO2021173005A1 (en) * 2020-02-28 2021-09-02 Weta Digital Limited Reconstruction of obscured views in captured imagery using pixel replacement from secondary imagery
US11153480B2 (en) 2020-02-28 2021-10-19 Weta Digital Limited Plate reconstruction of obscured views of captured imagery using arbitrary captured inputs
US11165957B2 (en) 2020-02-28 2021-11-02 Weta Digital Limited Reconstruction of obscured views in captured imagery using user-selectable pixel replacement from secondary imagery
US11228707B1 (en) 2020-02-28 2022-01-18 Weta Digital Limited Scene capture for reconstruction of obscured views
US11228706B2 (en) 2020-02-28 2022-01-18 Weta Digital Limited Plate reconstruction of obscured views of a main imaging device using capture device inputs of the same scene
US11689815B2 (en) 2020-02-28 2023-06-27 Unity Technologies Sf Image modification of motion captured scene for reconstruction of obscured views using uncoordinated cameras
US11694313B2 (en) 2020-02-28 2023-07-04 Unity Technologies Sf Computer-generated image processing including volumetric scene reconstruction
WO2023167728A1 (en) * 2022-03-03 2023-09-07 Microsoft Technology Licensing, Llc. Device for replacing intrusive object in images

Also Published As

Publication number Publication date
GB201718613D0 (en) 2017-12-27
WO2019092445A1 (en) 2019-05-16

Similar Documents

Publication Publication Date Title
WO2019092445A1 (en) Image replacement system
GB2583676A (en) Augmenting detected regions in image or video data
US10937216B2 (en) Intelligent camera
US9001116B2 (en) Method and system of generating a three-dimensional view of a real scene for military planning and operations
Daniel et al. Video visualization
WO2022165809A1 (en) Method and apparatus for training deep learning model
US7302113B2 (en) Displaying digital images
CN110570457B (en) Three-dimensional object detection and tracking method based on stream data
US11790610B2 (en) Systems and methods for selective image compositing
DE102009049849A1 (en) Method for determining the pose of a camera and for detecting an object of a real environment
US20220005283A1 (en) R-snap for production of augmented realities
CN113518996A (en) Damage detection from multiview visual data
US20120027371A1 (en) Video summarization using video frames from different perspectives
CN109255841A (en) AR image presentation method, device, terminal and storage medium
CN108416751A (en) A kind of new viewpoint image combining method assisting full resolution network based on depth
Edelman et al. Tracking people and cars using 3D modeling and CCTV
JP2019046077A (en) Video synthesizing apparatus, program and method for synthesizing viewpoint video by projecting object information onto plural surfaces
CN106845435A (en) A kind of augmented reality Implementation Technology based on material object detection tracing algorithm
EP3057316B1 (en) Generation of three-dimensional imagery to supplement existing content
CN115035178A (en) Augmented reality display system, method and storage medium
Okura et al. Fly-through Heijo palace site: augmented telepresence using aerial omnidirectional videos
CN115398483A (en) Method and system for enhancing video segments of a surveillance space with a target three-dimensional (3D) object to train an Artificial Intelligence (AI) model
CN115883792B (en) Cross-space live-action user experience system utilizing 5G and 8K technologies
Takacs et al. MultiView Mannequins for Deep Depth Estimation in 360º Videos
CN116783894A (en) Method and system for reconciling uncoordinated content by data filtering and synchronization based on multi-modal metadata to generate a composite media asset

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)