AU2018202801A1 - Method, apparatus and system for producing a foreground map - Google Patents

Method, apparatus and system for producing a foreground map

Info

Publication number
AU2018202801A1
Authority
AU
Australia
Prior art keywords
turbulence
target image
colour
reference images
colour change
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
AU2018202801A
Inventor
Jonathan GAN
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to AU2018202801A priority Critical patent/AU2018202801A1/en
Publication of AU2018202801A1 publication Critical patent/AU2018202801A1/en
Abandoned legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The present disclosure provides a method (300) of determining an object of interest in a dynamic scene in presence of turbulence. The method (300) comprises receiving a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest; determining a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; and determining whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model, colour at said pixel position in the target image, and a colour change associated with said pixel.

Description

METHOD, APPARATUS AND SYSTEM FOR PRODUCING A FOREGROUND MAP
TECHNICAL FIELD [0001] The present invention relates generally to digital video signal processing and, in particular, to segmentation of video data into foreground and background components.
BACKGROUND [0002] A fundamental problem in computer vision is the detection of moving objects. The problem can be stated as: given a sequence of image frames, each image frame is to be segmented into a number of components. Ideally, the image is to be segmented into two components: a foreground component (having moving objects in the scene of the image) and a background component (having static objects in the scene of the image).
[0003] In some cases, a third component (i.e., a “shadow” component), which includes the shadows cast by the moving objects, is included. The shadow component has unusual properties since the shadow appears to move with the moving objects, but should still be considered as part of the background of the scene. Finally, the output of the foreground / background segmentation can be efficiently represented by a binary image or map, or a ternary map if a shadow component is included.
[0004] Moving object detection was traditionally employed for the purpose of surveillance from fixed cameras. Early methods were based on background subtraction. That is, given the sequence of image frames, a reference frame representative of the background scene may be obtained by averaging the image sequence across time. Then, the moving objects are detected by taking the difference between the current image frame and the reference frame, and comparing this difference image against a threshold value.
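For illustration, the classical background-subtraction approach just described can be sketched in a few lines. The following Python snippet is not part of the original disclosure; the averaging window (the whole sequence) and the threshold value are assumptions chosen only to make the example concrete.

```python
import numpy as np

def background_subtraction(frames, threshold=30.0):
    """Classical background subtraction over a sequence of greyscale frames.

    frames: iterable of HxW uint8 frames; threshold: absolute-difference
    value above which a pixel is marked as foreground (assumed value).
    Returns a list of boolean foreground maps, one per frame.
    """
    stack = np.stack([f.astype(np.float32) for f in frames])
    reference = stack.mean(axis=0)             # time-averaged background estimate
    return [np.abs(f - reference) > threshold  # per-pixel difference test
            for f in stack]
```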
[0005] Modern methods are commonly based on Gaussian Mixture Models (GMMs). In these methods, each pixel of the image in the image frame sequence is statistically modelled as a mixture of Gaussian distributions. The mean and variance of these distributions are estimated by observing the values the pixel takes over time. While the learned Gaussian distributions are adaptive to the video data, classification of each of the distributions as background or foreground may be heuristic. For example, distributions with lower variance may be assumed
to be modelling background scene variation. The pixel for the current frame can then be classified by determining which Gaussian distribution has the highest probability of producing the current pixel value.
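Gaussian-mixture background models of this kind are available in common vision libraries. The sketch below shows how OpenCV's MOG2 background subtractor might be applied frame by frame; the file name and parameter values are illustrative assumptions, not values prescribed by this disclosure.

```python
import cv2

# A minimal sketch using OpenCV's Gaussian-mixture background subtractor.
# Parameter values are assumptions chosen only for illustration.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16,
                                                detectShadows=True)

capture = cv2.VideoCapture("surveillance.mp4")  # hypothetical input file
while True:
    ok, frame = capture.read()
    if not ok:
        break
    # 255 = foreground, 127 = shadow (ternary map), 0 = background
    foreground_mask = subtractor.apply(frame)
capture.release()
```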
[0006] Both methods of moving object detection described above operate at pixel-level. That is, their output is a foreground map that has the potential to be pixel-level accurate, because each pixel in the current frame is classified independently from a pixel-level background model. In the case of background subtraction, the background “model” is the value of the corresponding pixel in the reference frame, while in the case of GMM methods, the background model is the mixture of Gaussian distributions. One problem with this approach is a vulnerability to camera shake. Camera shake means that a particular feature in the scene will be imaged at substantially different pixel locations in successive image frames over time. Therefore, the classification of a pixel from the current frame will be incorrectly based on a background model built from samples that should not correspond to that pixel. Typically, camera shake can be corrected for by using rigid registration. For example, the image frames may be registered by estimating, and applying, a linear transform, which is sufficient to describe global variations in rotation, scale, and translation.
[0007] Advances in camera sensors and lenses have improved the distance capability of video surveillance. In particular, the most limiting factor in image quality of long range video surveillance is not the sensor size, nor the lens, but geometric distortions introduced by atmospheric turbulence in the imaging path between the scene and the camera. Atmospheric turbulence is mainly due to fluctuation in the refractive index of the atmosphere. Variation in the refractive index involves many factors including wind, temperature gradients, and elevation. Thus, the geometric distortion introduced by atmospheric turbulence is both spatially and temporally varying. For the same reasons as camera shake, atmospheric turbulence reduces the accuracy of pixel-based moving object methods. However, unlike camera shake, atmospheric turbulence cannot be corrected by a global transform. Thus, there is a need for a method of moving object detection that is resilient to the spatially and temporally varying geometric distortion caused by atmospheric turbulence.
SUMMARY [0008] It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
[0009] In an aspect of the present disclosure, there is provided a method of determining an object of interest in a dynamic scene in presence of turbulence, the method comprising: receiving a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest; determining a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; and determining whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model, colour at said pixel position in the target image, and a colour change associated with said pixel.
[00010] In another aspect of the present disclosure, there is provided a system for determining an object of interest in a dynamic scene in presence of turbulence, the system comprising: a display configured to display images of the dynamic scene; memory coupled with the display configured to store the images; and a processor coupled with the memory configured to: receive a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest; determine a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; determine the object of interest in the target image based on the turbulence colour change model and colour changes associated with pixel positions in the target image; process the target image based on the determined object of interest to form a processed target image with reduced effects of turbulence; and display the processed target image on the display.
[00011] In another aspect of the present disclosure, there is provided a non-transitory computer readable medium comprising one or more software application programs that are executable
by a processor, the one or more software application programs comprising a method of determining an object of interest in a dynamic scene in presence of turbulence, the method comprising: receiving a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest; determining a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; and determining whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model, colour at said pixel position in the target image, and a colour change associated with said pixel.
[00012] In another aspect of the present disclosure, there is provided a method of determining an object of interest in a target image of a scene, the method comprising: receiving a target image of the scene and at least two reference images capturing a background of the scene, wherein the target image comprises the object of interest; determining a turbulence colour change model using the reference images of the scene, the turbulence colour change model comprising a plurality of reference pairs of colours co-occurring in the reference images, wherein each reference pair being associated with a likelihood score; determining a colour at a pixel position in the target image and a colour at a corresponding pixel position in at least one of the reference images to form a target pair of corresponding colours; and determining whether the pixel position in the target image corresponds to the object of interest using a likelihood score associated with the target pair determined based on the turbulence colour change model.
BRIEF DESCRIPTION OF THE DRAWINGS [00013] At least one embodiment of the present invention will now be described with reference to the following drawings and appendices, in which:
[00014] Fig. 1 is a schematic block diagram showing functional modules of a moving object detection system;
[00015] Figs. 2A and 2B form a schematic block diagram of a general purpose computer system upon which arrangements described can be practiced;
[00016] Fig. 3 is a schematic flow diagram showing a method of producing a foreground map;
[00017] Fig. 4 shows an example joint histogram calculated from two reference frames;
[00018] Fig. 5 shows an example pair of candidate reference frames containing a moving object, with division of the candidate reference frames into regions; and [00019] Fig. 6 is a schematic flow diagram showing the generate foreground map sub-process of the method shown in Fig. 3.
DETAILED DESCRIPTION INCLUDING BEST MODE [00020] Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
[00021] Fig. 1 is a schematic block diagram showing functional modules of a moving object detection system 100. The moving object detection system 100 comprises a rigid registration module 106, a reference frame selector module 110, a colour to label map generator module 120, a first colour to label mapper module 130, a joint histogram module 140, a second colour to label mapper module 150, and a likelihood score module 160. Each of the modules 106, 110, 120, 130, 140, 150, and 160 is performed by one or more of the software application programs 233, which is executable by the processor 205 (see Figs. 2A and 2B) or dedicated hardware such as graphic processors, digital signal processors, or one or more microprocessors and associated memories.
[00022] The rigid registration module 106 receives original video frames 104 from a source device 102. Examples of the source device 102 include a camera, which can be a visible light camera, infrared camera or any other device capable of capturing an image-based representation of a scene. The rigid registration module 106 performs a rigid registration to produce registered video frames 108. The rigid registration performed by the module 106 on
the original video frames 104 is optional. For example, if the original video frames 104 are already well aligned, the rigid registration may be skipped. However, if there is any camera shake on the source device 102, the rigid registration may be performed.
[00023] The rigid registration module 106 applies a global, linear transformation to each of the original video frames 104. A global linear transformation can apply rotation, scaling, and translation, which is sufficient to compensate for distortion caused by camera shake. The linear transformation can be estimated by a number of methods, such as feature point matching or phase correlation.
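As one possible realisation of the phase-correlation option mentioned above, the following sketch estimates a pure translation between a greyscale reference frame and the current frame and warps the current frame back. It is an illustrative example only, not the patented implementation; it does not recover rotation or scale.

```python
import cv2
import numpy as np

def register_by_phase_correlation(reference, frame):
    """Estimate and remove a global translation between two greyscale frames.

    A minimal sketch of the phase-correlation option; only translation is
    recovered. Inputs are assumed to be single-channel images of equal size.
    """
    ref = np.float32(reference)
    cur = np.float32(frame)
    (dx, dy), _response = cv2.phaseCorrelate(ref, cur)
    # Build a 2x3 affine matrix that translates by the negated estimated shift.
    shift = np.float32([[1, 0, -dx],
                        [0, 1, -dy]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, shift, (w, h))
```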
[00024] The rigid registration module 106 is connected to the reference frame selector module 110 and the second colour to label mapper module 150. If rigid registration is performed by the rigid registration module 106, the output of the rigid registration module 106 is registered video frames 108. Otherwise, the output of the rigid registration module 106 is the original video frames 104.
[00025] The reference frame selector module 110 receives the registered video frames 108 or the original video frames 104 from the rigid registration module 106. The reference frame selector module 110 then selects a pair of reference frames 112 from the registered video frames 108 or the original video frames 104. The pair of reference frames 112 is selected such that the reference frames 112 do not contain any moving objects, but the reference frames 112 do include differences caused by variation in the background scene. The variation in the background scene may be due to geometric distortion caused by atmospheric turbulence. The manner in which the pair of reference frames 112 is selected is described in further detail below with reference to Figure 3. In other words, the reference frames 112 include a static scene in which no moving object is in the scene. A static scene, however, may still have motion caused by the atmospheric turbulence.
[00026] The reference frame selector module 110 is connected to the colour to label map generator module 120 and the first colour to label mapper module 130.
[00027] The colour to label map generator module 120 receives the pair of reference frames 112 and produces a colour to label map 122. The purpose of the colour to label map 122 is to limit the dimensionality of joint statistics calculated over the pair of reference frames 112. If
the original video frames 104 are 8-bit greyscale images, then there are only 256 different colours, and the colour to label map may be one-to-one. However, if the original video frames 104 are 8-bit RGB images, then there are 2^24 = 16777216 different colours. The colours may be mapped to a reduced number of labels by quantisation. For example, if 8-bit RGB colours are quantised by a factor of 8 for each of the red, green and blue values, then the number of labels corresponding to these quantised colours is 2^15 = 32768. Alternatively, or in addition to quantisation, the colours may be mapped to labels by adapting to the colours present in the pair of reference frames 112. Although the full range of possible colours is very large, the number of colours exercised by the pair of reference frames 112 for natural scenes may be sparse. Then, the colours may be mapped to labels by assigning a label to each different colour, or each different quantised colour, observed in the pair of reference frames 112. The remaining colours not present in the pair of reference frames 112 may be assigned to a single label. The colour to label map generator module 120 is connected to the first colour to label mapper module 130 and the second colour to label mapper module 150.
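A minimal sketch of the quantisation-based mapping described above is given below, assuming 8-bit RGB input and a quantisation factor of 8; the way the three quantised channels are packed into a single label index is an illustrative choice, not prescribed by this disclosure.

```python
import numpy as np

def colour_to_label_quantised(rgb_frame, q=8):
    """Map 8-bit RGB pixels to quantised colour labels.

    With q = 8 each channel has 256 / 8 = 32 levels, giving 32^3 = 32768
    possible labels, as discussed above. Packing the three quantised
    channels into one integer is an assumed, illustrative scheme.
    """
    quantised = (rgb_frame // q).astype(np.int32)   # avoid uint8 overflow
    r, g, b = quantised[..., 0], quantised[..., 1], quantised[..., 2]
    levels = 256 // q
    return (r * levels + g) * levels + b            # single label per pixel
```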
[00028] The first colour to label mapper module 130 receives the colour to label map 122 (from the colour to label map generator 120) and the pair of reference frames 112 (from the reference frame selector 110), and maps each pixel from the pair of reference frames 112 from colours to labels to produce a first labelled reference frame 132 and a second labelled reference frame 134. For convenience sake, the first labelled reference frame 132 will be referred to as R0, and the second labelled reference frame 134 will be referred to as R1. [00029] The first colour to label mapper module 130 is connected to the joint histogram module 140 and the likelihood score module 160.
[00030] The joint histogram module 140 receives the first and second labelled reference frames R0 and R1 from the first colour to label mapper module 130, and calculates a joint histogram of the labels to produce joint statistics 142. Because the pair of reference frames 112 is selected to exclude moving objects, but still contain differences caused by variation in the background scene, the joint statistics 142 model the statistics of the background scene. The variation in the background scene may be due to geometric distortion caused by atmospheric turbulence, in which case the joint statistics 142 may be considered a turbulence colour change model. The turbulence colour change model defines the colour change for pixels co-occurring in the pair of reference frames 112 caused by the turbulence. The process
by which the joint statistics 142 are produced is described in further detail below with reference to Fig. 4.
[00031] The joint histogram module 140 is connected to the likelihood score module 160.
[00032] The second colour to label mapper module 150 receives the colour to label map 122 (from the colour to label map generator 120) and a current frame from the registered video frames 108 or the original video frames 104 (from the rigid registration module 106), and maps each pixel from the current frame from colours to labels, producing a labelled current frame 152. For convenience sake, the labelled current frame 152 will be referred to as A.
[00033] The current frame received from the rigid registration module 106 includes a dynamic scene where a moving object (i.e., foreground) is in the scene.
[00034] The second colour to label mapper module 150 is connected to the likelihood score module 160.
[00035] The likelihood score module 160 receives the labelled reference frame R0 (from the first colour to label mapper module 130), the labelled current frame A (from the second colour to label mapper module 150), and joint statistics 142 (from the joint histogram module 140). For each co-occurring pair of pixels in R0 and A, the likelihood score module 160 calculates a likelihood score of the pair of colour labels associated with the co-occurring pair of pixels in the background model associated with the joint statistics 142. The likelihood score is compared against a threshold. If the likelihood score is higher than the threshold, then the corresponding pixel in A is considered to be predicted by the joint statistics 142 as a background pixel. Conversely, if the likelihood score is lower than the threshold, the corresponding pixel in A is marked as a foreground pixel. Therefore, the likelihood score determines whether a pixel position in the labelled current frame A corresponds to foreground (e.g., the moving object) based on the turbulence colour change model (i.e., the joint statistics 142), colour at the pixel position in the labelled current frame A, and a colour change associated with the pixel position. The colour change associated with a pixel position can be determined, for example, as a pair of colours (or labels) of a pixel identified at that pixel position such that one colour in the pair corresponds to colour at that pixel position of the current frame A and the other colour corresponds to colour at that pixel position of the
reference frame, for example R0. Over all the pixels in A, the background and foreground labelled pixels constitute a foreground map 162, which is the output of the likelihood score module 160.
[00036] Figs. 2A and 2B depict a general-purpose computer system 200, upon which the various arrangements described can be practiced.
[00037] As seen in Fig. 2A, the computer system 200 includes: a computer module 201; input devices such as a keyboard 202, a mouse pointer device 203, a scanner 226, a camera 227, and a microphone 280; and output devices including a printer 215, a display device 214 and loudspeakers 217. An external Modulator-Demodulator (Modem) transceiver device 216 may be used by the computer module 201 for communicating to and from a communications network 220 via a connection 221. The communications network 220 may be a wide-area network (WAN), such as the Internet, a cellular telecommunications network, or a private WAN. Where the connection 221 is a telephone line, the modem 216 may be a traditional “dial-up” modem. Alternatively, where the connection 221 is a high capacity (e.g., cable) connection, the modem 216 may be a broadband modem. A wireless modem may also be used for wireless connection to the communications network 220.
[00038] The computer module 201 typically includes at least one processor unit 205, and a memory unit 206. For example, the memory unit 206 may have semiconductor random access memory (RAM) and semiconductor read only memory (ROM). The computer module 201 also includes a number of input/output (I/O) interfaces including: an audio-video interface 207 that couples to the video display 214, loudspeakers 217 and microphone 280; an I/O interface 213 that couples to the keyboard 202, mouse 203, scanner 226, camera 227 and optionally a joystick or other human interface device (not illustrated); and an interface 208 for the external modem 216 and printer 215. In some implementations, the modem 216 may be incorporated within the computer module 201, for example within the interface 208. The computer module 201 also has a local network interface 211, which permits coupling of the computer system 200 via a connection 223 to a local-area communications network 222, known as a Local Area Network (LAN). As illustrated in Fig. 2A, the local communications network 222 may also couple to the wide network 220 via a connection 224, which would
typically include a so-called "firewall" device or device of similar functionality. The local network interface 211 may comprise an Ethernet circuit card, a Bluetooth® wireless arrangement or an IEEE 802.11 wireless arrangement; however, numerous other types of interfaces may be practiced for the interface 211.
[00039] The I/O interfaces 208 and 213 may afford either or both of serial and parallel connectivity, the former typically being implemented according to the Universal Serial Bus (USB) standards and having corresponding USB connectors (not illustrated). Storage devices 209 are provided and typically include a hard disk drive (HDD) 210. Other storage devices such as a floppy disk drive and a magnetic tape drive (not illustrated) may also be used. An optical disk drive 212 is typically provided to act as a non-volatile source of data. Portable memory devices, such as optical disks (e.g., CD-ROM, DVD, Blu-ray Disc™), USB-RAM, portable, external hard drives, and floppy disks, for example, may be used as appropriate sources of data to the system 200.
[00040] The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner that results in a conventional mode of operation of the computer system 200 known to those in the relevant art. For example, the processor 205 is coupled to the system bus 204 using a connection 218. Likewise, the memory 206 and optical disk drive 212 are coupled to the system bus 204 by connections 219. Examples of computers on which the described arrangements can be practised include IBM-PC’s and compatibles, Sun Sparcstations, Apple Mac™ or like computer systems.
[00041] The method of producing a foreground map may be implemented using the computer system 200 wherein the processes of Figs. 3 and 6 to be described, may be implemented as one or more software application programs 233 executable within the computer system 200. In particular, the steps of the method of producing a foreground map are effected by instructions 231 (see Fig. 2B) in the software 233 that are carried out within the computer system 200. The software instructions 231 may be formed as one or more code modules, each for performing one or more particular tasks. The software may also be divided into two separate parts, in which a first part and the corresponding code modules performs the foreground map generation methods and a second part and the corresponding code modules manage a user interface between the first part and the user (e.g., showing the generated foreground map on the user interface).
[00042] The software may be stored in a computer readable medium, including the storage devices described below, for example. The software is loaded into the computer system 200 from the computer readable medium, and then executed by the computer system 200. A computer readable medium having such software or computer program recorded on the computer readable medium is a computer program product. The use of the computer program product in the computer system 200 preferably effects an advantageous apparatus for producing a foreground map.
[00043] The software 233 is typically stored in the HDD 210 or the memory 206. The software is loaded into the computer system 200 from a computer readable medium, and executed by the computer system 200. Thus, for example, the software 233 may be stored on an optically readable disk storage medium (e.g., CD-ROM) 225 that is read by the optical disk drive 212. A computer readable medium having such software or computer program recorded on it is a computer program product. The use of the computer program product in the computer system 200 preferably effects an apparatus for producing a foreground map.
[00044] In some instances, the application programs 233 may be supplied to the user encoded on one or more CD-ROMs 225 and read via the corresponding drive 212, or alternatively may be read by the user from the networks 220 or 222. Still further, the software can also be loaded into the computer system 200 from other computer readable media. Computer readable storage media refers to any non-transitory tangible storage medium that provides recorded instructions and/or data to the computer system 200 for execution and/or processing. Examples of such storage media include floppy disks, magnetic tape, CD-ROM, DVD, Blu-ray™ Disc, a hard disk drive, a ROM or integrated circuit, USB memory, a magneto-optical disk, or a computer readable card such as a PCMCIA card and the like, whether or not such devices are internal or external of the computer module 201. Examples of transitory or non-tangible computer readable transmission media that may also participate in the provision of software, application programs, instructions and/or data to the computer module 201 include radio or infra-red transmission channels as well as a network connection to another computer or networked device, and the Internet or Intranets including e-mail transmissions and information recorded on Websites and the like.
[00045] The second part of the application programs 233 and the corresponding code modules mentioned above may be executed to implement one or more graphical user interfaces (GUIs)
to be rendered or otherwise represented upon the display 214. Through manipulation of typically the keyboard 202 and the mouse 203, a user of the computer system 200 and the application may manipulate the interface in a functionally adaptable manner to provide controlling commands and/or input to the applications associated with the GUI(s). Other forms of functionally adaptable user interfaces may also be implemented, such as an audio interface utilizing speech prompts output via the loudspeakers 217 and user voice commands input via the microphone 280.
[00046] Fig. 2B is a detailed schematic block diagram of the processor 205 and a “memory” 234. The memory 234 represents a logical aggregation of all the memory modules (including the HDD 209 and semiconductor memory 206) that can be accessed by the computer module 201 in Fig. 2A.
[00047] When the computer module 201 is initially powered up, a power-on self-test (POST) program 250 executes. The POST program 250 is typically stored in a ROM 249 of the semiconductor memory 206 of Fig. 2A. A hardware device such as the ROM 249 storing software is sometimes referred to as firmware. The POST program 250 examines hardware within the computer module 201 to ensure proper functioning and typically checks the processor 205, the memory 234 (209, 206), and a basic input-output systems software (BIOS) module 251, also typically stored in the ROM 249, for correct operation. Once the POST program 250 has run successfully, the BIOS 251 activates the hard disk drive 210 of Fig. 2A. Activation of the hard disk drive 210 causes a bootstrap loader program 252 that is resident on the hard disk drive 210 to execute via the processor 205. This loads an operating system 253 into the RAM memory 206, upon which the operating system 253 commences operation. The operating system 253 is a system level application, executable by the processor 205, to fulfil various high level functions, including processor management, memory management, device management, storage management, software application interface, and generic user interface.
[00048] The operating system 253 manages the memory 234 (209, 206) to ensure that each process or application running on the computer module 201 has sufficient memory in which to execute without colliding with memory allocated to another process. Furthermore, the different types of memory available in the system 200 of Fig. 2A must be used properly so that each process can run effectively. Accordingly, the aggregated memory 234 is not intended to illustrate how particular segments of memory are allocated (unless otherwise
stated), but rather to provide a general view of the memory accessible by the computer system 200 and how such is used.
[00049] As shown in Fig. 2B, the processor 205 includes a number of functional modules including a control unit 239, an arithmetic logic unit (ALU) 240, and a local or internal memory 248, sometimes called a cache memory. The cache memory 248 typically includes a number of storage registers 244 - 246 in a register section. One or more internal busses 241 functionally interconnect these functional modules. The processor 205 typically also has one or more interfaces 242 for communicating with external devices via the system bus 204, using a connection 218. The memory 234 is coupled to the bus 204 using a connection 219.
[00050] The application program 233 includes a sequence of instructions 231 that may include conditional branch and loop instructions. The program 233 may also include data 232 which is used in execution of the program 233. The instructions 231 and the data 232 are stored in memory locations 228, 229, 230 and 235, 236, 237, respectively. Depending upon the relative size of the instructions 231 and the memory locations 228-230, a particular instruction may be stored in a single memory location as depicted by the instruction shown in the memory location 230. Alternately, an instruction may be segmented into a number of parts each of which is stored in a separate memory location, as depicted by the instruction segments shown in the memory locations 228 and 229.
[00051] In general, the processor 205 is given a set of instructions which are executed therein. The processor 205 waits for a subsequent input, to which the processor 205 reacts by executing another set of instructions. Each input may be provided from one or more of a number of sources, including data generated by one or more of the input devices 202, 203, data received from an external source across one of the networks 220, 222, data retrieved from one of the storage devices 206, 209 or data retrieved from a storage medium 225 inserted into the corresponding reader 212, all depicted in Fig. 2A. The execution of a set of the instructions may in some cases result in output of data. Execution may also involve storing data or variables to the memory 234.
[00052] The disclosed foreground map generation arrangements use input variables 254, which are stored in the memory 234 in corresponding memory locations 255, 256, 257. The foreground map generation arrangements produce output variables 261, which are stored in
the memory 234 in corresponding memory locations 262, 263, 264. Intermediate variables 258 may be stored in memory locations 259, 260, 266 and 267.
[00053] Referring to the processor 205 of Fig. 2B, the registers 244, 245, 246, the arithmetic logic unit (ALU) 240, and the control unit 239 work together to perform sequences of microoperations needed to perform “fetch, decode, and execute” cycles for every instruction in the instruction set making up the program 233. Each fetch, decode, and execute cycle comprises:
a fetch operation, which fetches or reads an instruction 231 from a memory location 228, 229, 230;
a decode operation in which the control unit 239 determines which instruction has been fetched; and an execute operation in which the control unit 239 and/or the ALU 240 execute the instruction.
[00054] Thereafter, a further fetch, decode, and execute cycle for the next instruction may be executed. Similarly, a store cycle may be performed by which the control unit 239 stores or writes a value to a memory location 232.
[00055] Each step or sub-process in the processes of Figs. 3 and 6 is associated with one or more segments of the program 233 and is performed by the register section 244, 245, 247, the ALU 240, and the control unit 239 in the processor 205 working together to perform the fetch, decode, and execute cycles for every instruction in the instruction set for the noted segments of the program 233.
[00056] The method of producing a foreground map may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions of the foreground map generation. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
[00057] Fig. 3 is a schematic flow diagram showing a method 300 of producing the foreground map 162 from the original video frames 104. The method 300 is performed by the modules
106, 110, 120, 130, 140, 150, and 160 of the system 100. Each of the steps of the method 300 is performed by the one or more software application programs 233, which is executable by the processor 205 as discussed above in relation to Figs. 2A and 2B.
[00058] The method 300 begins at a rigid registration step 302. As described above with reference to Fig. 1, the rigid registration step 302 is performed by the rigid registration module 106. The rigid registration module 106 performs rigid registration on the original video frames 104 to remove camera shake. As discussed above, the rigid registration step 302 may be skipped if the original video frames 104 do not have any camera shake.
[00059] An example of the rigid registration performed in the rigid registration step 302 by the rigid registration module 106 is now discussed. Let the original image samples in a current original video frame O be O[x_O, y_O], where x_O denotes the column location of the sample, and y_O denotes the row location of the sample. Then, each original image sample O[x_O, y_O] may be mapped to a registered image sample P[x_R, y_R], which together form a current registered video frame P. In general, the mapped registered image samples P[x_R, y_R] may not occur at integer pixel locations. Then, desired registered image samples at integer pixel locations may be interpolated from the mapped registered image samples. The mapping from original image samples to registered image samples may be performed by a linear transform T:

$$\begin{bmatrix} x_R \\ y_R \\ 1 \end{bmatrix} = T \begin{bmatrix} x_O \\ y_O \\ 1 \end{bmatrix}$$
[00060] T is a global linear transform if it is fixed for each image sample O[x_O, y_O]. A global linear transform is sufficient to correct for camera shake. In one arrangement, T may be a pure translation transform:

$$T = \begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix}$$
[00061] An advantage of the above arrangement is that solving for T is greatly simplified. In another arrangement, T may be a rotation, scale, and translation transform:

$$T = \begin{bmatrix} 1 & 0 & x_t \\ 0 & 1 & y_t \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
[00062] The advantage of the above arrangement is that a rotation, scale and translation transform will easily model any perspective distortion caused by camera shake.
[00063] Before applying the transform T, the rigid registration module 106 first estimates the parameters of T. In one arrangement, T may be estimated by first identifying sets of corresponding feature points in the images, and then solving for the best fitting T by least squares or otherwise. In another arrangement, T may be estimated by maximising a phase correlation coefficient. After performing rigid registration on the original video frames 104, the method 300 then proceeds from the rigid registration step 302 to a select reference frames step 304.
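As an illustration of the feature-point option described above, the following sketch matches ORB features between a reference frame and the current frame and fits a rotation, scale and translation transform with a robust least-squares estimator. The choice of ORB features and of cv2.estimateAffinePartial2D is an assumption; any comparable detector and fitting method could be substituted.

```python
import cv2
import numpy as np

def estimate_rigid_transform(reference, frame, max_matches=200):
    """Estimate a rotation + scale + translation transform T by feature matching.

    A sketch only: ORB keypoints are matched between the two frames, and a
    similarity transform is fitted by a RANSAC-robustified least squares.
    """
    orb = cv2.ORB_create()
    kp_ref, des_ref = orb.detectAndCompute(reference, None)
    kp_cur, des_cur = orb.detectAndCompute(frame, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_cur, des_ref), key=lambda m: m.distance)
    matches = matches[:max_matches]

    src = np.float32([kp_cur[m.queryIdx].pt for m in matches])
    dst = np.float32([kp_ref[m.trainIdx].pt for m in matches])

    # Fit a 2x3 similarity transform mapping current-frame points onto the reference.
    T, _inliers = cv2.estimateAffinePartial2D(src, dst)
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, T, (w, h))
```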
[00064] At the select reference frames step 304, the reference frame selector module 110 selects the pair of reference frames 112. In one arrangement, the pair of reference frames may be selected manually by a human operator. The pair of reference frames 112 may be consecutive frames from the registered video frames 108. An advantage of the present arrangement is that selecting consecutive frames 112 only requires a single input from the human operator. It should be noted that, here and in the following steps, rather than using the registered video frames 108 for the purposes of reference frame selection at the select reference frames step 304, the reference frame selector module 110 may use the original video frames 104, for example if the rigid registration step 302 is skipped.
[00065] In another arrangement of the select reference frames step 304, the pair of reference frames 112 may be selected in order to capture differences due to variation in the background scene. In this arrangement, the pair of reference frames 112 may be individually selected manually by the human operator.
[00066] In another arrangement of the select reference frames step 304, a first reference frame may be manually selected by the human operator. A second reference frame is automatically selected at a fixed time period separated from the first reference frame. The fixed time period may be set equal to a known de-correlation time τ of atmospheric turbulence in the scene, or of other temporal phenomena in the scene. The de-correlation time τ may be defined as a time
required for a geometric distortion caused by atmospheric turbulence at time t0, expressed as a vector field of phase shifts v(t0), and a geometric distortion caused by atmospheric turbulence v(t1) at time t1, to be de-correlated with each other. For example, the cross-correlation of v(t0) with itself, also known as the autocorrelation, may be expressed as v(t0) ⋆ v(t0). In such case, v(t0) is known to be correlated with itself, and the correlation is confirmed by the cross-correlation signal v(t0) ⋆ v(t0) having a peak at zero relative displacement. In another example, the atmospheric turbulence at time t0 and at time t1 may be said to be correlated if v(t0) = v_d(t1), where d is a spatial displacement caused by wind velocity. In such a case, the atmospheric turbulence satisfies Taylor's frozen flow hypothesis, which posits that over a short duration, the vector field of phase shifts does not evolve, but is only displaced by the wind. Then, the correlation signal expressed by v(t0) ⋆ v(t1) will have a peak at d. Over time, the frozen turbulence model fails to hold, and the atmospheric turbulence de-correlates. By observing the decay in the size of the space-time correlation peak of v(t0) ⋆ v(t0 + τ) as τ increases, the temporal de-correlation time τ is obtained. For example, the temporal de-correlation time τ is set to the minimum time after which the space-time correlation peak has decayed to a particular fraction, such as 0.05, of the correlated peak magnitude.
[00067] In practice, the de-correlation time τ may be estimated from more easily measured causative factors of atmospheric turbulence. For example, higher wind speed and longer imaging distance between the camera and target are both factors contributing to stronger atmospheric turbulence, and therefore shorter de-correlation time τ. The strength of atmospheric turbulence may also depend on the irregularity of wind, which may be affected by temperature, time of day, and altitude. As such, a de-correlation time can be predetermined for a set of atmospheric conditions and an imaging distance. For example, given a current wind speed, temperature, and an imaging distance (a distance at which the object of interest is expected to appear), a de-correlation time can be determined. In some implementations, the de-correlation time can be determined by searching a look-up table for given atmospheric conditions, such as wind speed and temperature, and an imaging distance.
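A minimal sketch of such a look-up table is shown below. The bin edges and the tabulated de-correlation times are placeholder values invented purely for illustration, not measured data.

```python
# Illustrative look-up table keyed by (max wind speed in m/s, max distance in m).
# The entries are placeholder de-correlation times in seconds.
DECORRELATION_TABLE = {
    (5, 1000): 0.50,
    (5, 5000): 0.20,
    (15, 1000): 0.10,
    (15, 5000): 0.05,
}

def lookup_decorrelation_time(wind_speed, distance):
    """Return the tabulated de-correlation time for the first matching bin."""
    for (max_wind, max_dist), tau in sorted(DECORRELATION_TABLE.items()):
        if wind_speed <= max_wind and distance <= max_dist:
            return tau
    return 0.05  # strongest-turbulence fallback (assumed)
```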
[00068] Then, the present arrangement combines the advantages of the above arrangements. Firstly, the required input from the human operator is reduced to a single selection. Secondly, the pair of reference frames 112 are selected separated by a fixed time period, which allows the pair of reference frames to capture differences due to variation in the background scene.
[00069] In another arrangement of the select reference frames step 304, the pair of reference frames 112 may be automatically selected from the registered video frames 108 without input from a human operator. Then, any two frames from the registered video frames 108 may be candidates for the pair of reference frames. To reduce the search complexity, the method 300 may test each frame from the registered video frames 108 as a candidate first reference frame R0, but automatically select a candidate second reference frame R1 at a fixed time period separated from the candidate first reference frame R0. The fixed time period may be set equal to a known de-correlation time of atmospheric turbulence in the scene, or of other temporal phenomena in the scene. In the present arrangement, the method 300 should reject candidate reference frames that contain moving objects. However, if the candidate reference frames are close in time, then slowly moving objects may not result in significant differences between the candidate reference frames. Then, the fixed time period may also be set greater than a minimum time period required to reject slowly moving objects.
[00070] To determine whether to select a candidate pair of reference frames, the method 300 may calculate a similarity score between the two frames. For example, the similarity score may be a cross-correlation between the two frames, at zero relative displacement:
$$(R_0 \star R_1)[0] = \sum_{x,y} \langle R_0[x, y], R_1[x, y] \rangle$$

where ⟨R0[x, y], R1[x, y]⟩ indicates an inner product between RGB values R0[x, y] and R1[x, y]. If the cross-correlation exceeds a predetermined threshold, the candidate pair of reference frames may be selected as the pair of reference frames 112.
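The zero-displacement cross-correlation score above can be computed directly as a sum of per-pixel RGB inner products, as in the following sketch; the acceptance threshold is left as a caller-supplied assumption.

```python
import numpy as np

def cross_correlation_score(r0, r1):
    """Zero-displacement cross-correlation of two RGB frames (HxWx3 arrays).

    Sums the per-pixel inner product of RGB values, matching the formula above.
    """
    return float(np.sum(r0.astype(np.float64) * r1.astype(np.float64)))

def accept_reference_pair(r0, r1, threshold):
    """Accept the candidate pair if the similarity exceeds the chosen threshold."""
    return cross_correlation_score(r0, r1) > threshold
```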
[00071] Alternatively, the similarity score may be a mutual information between the two frames:
$$MI(R_0, R_1) = \sum_{x,y} \log \frac{p(R_0[x, y], R_1[x, y])}{p(R_0[x, y])\, p(R_1[x, y])}$$

where p(R0[x, y]) indicates the marginal probability of the RGB value R0[x, y] occurring in R0, p(R1[x, y]) indicates the marginal probability of the RGB value R1[x, y] occurring in R1, and p(R0[x, y], R1[x, y]) indicates the joint probability of the RGB values R0[x, y] and R1[x, y] co-occurring at [x, y] in R0 and R1. If the mutual information, or a similarity score in general, between a pair of candidate reference frames exceeds a predetermined threshold, the candidate pair of reference frames may be selected as the pair of reference frames 112 for the purposes of determining the joint statistics 142 by the joint histogram module 140.
[00072] Similarity measures such as cross-correlation and mutual information are sensitive to pixel-level variations between images. In particular, pixel-level variations between the candidate reference frames may exist because of atmospheric turbulence. Then, rather than calculating the similarity score directly on the candidate reference frames R0 and R1, the method 300 may first apply a low-pass filter to the candidate reference frames. The low-pass filter spatially smooths the candidate reference frames, which reduces the extent of pixel-level variations. The amount of smoothing may be controlled based on a known strength of atmospheric turbulence in the scene. For example, the low-pass filter may be a Gaussian filter

$$g(x, y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

where the parameter σ is set equal to the known strength of atmospheric turbulence, expressed as the variance of the local image shifts in the pixels caused by turbulence in the current atmospheric and imaging conditions.
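The following sketch illustrates this pre-smoothing step using scipy's Gaussian filter, with sigma set from the assumed turbulence strength; the similarity function itself is passed in, so either of the scores above could be used. Frames are assumed to be HxWx3 RGB arrays.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_similarity(r0, r1, turbulence_sigma, score_fn):
    """Low-pass filter both candidate frames, then compute a similarity score.

    turbulence_sigma plays the role of the parameter sigma described above;
    score_fn could be the cross-correlation or mutual-information score.
    """
    # Smooth spatially only (not across the colour channels).
    sigma = (turbulence_sigma, turbulence_sigma, 0)
    s0 = gaussian_filter(r0.astype(np.float64), sigma)
    s1 = gaussian_filter(r1.astype(np.float64), sigma)
    return score_fn(s0, s1)
```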
[00073] Further arrangements of the select reference frames step 304 are described below, with reference to Fig. 5.
[00074] The method 300 then proceeds from the select reference frames step 304 to a generate colour to label map step 306. At the generate colour to label map step 306, the colour to label map generator module 120 produces a colour to label map 122, as described above with
reference to Fig. 1. The colour to label map 122 maps the set of colour values possible in the pair of reference frames 112 to a reduced set of colour labels. For example, the colour to label map 122 may map the set of 8-bit RGB colours, which comprise 16777216 different colours, to a set of 1024 colour labels. The 8-bit RGB colours may be assigned to colour labels by k-means clustering of the colour values in the pair of reference frames 112, where the resulting clusters correspond to the colour labels. Then the colour to label map 122 may be implemented as a look-up table (LUT) with 16777216 entries. In another arrangement of the generate colour to label map step 306, the colour to label map 122 may be implemented as a quantisation step, followed by a LUT. One advantage of the present arrangement is that the size of the LUT is reduced. For example, if a quantisation factor of 8 is chosen, then the LUT required for 8-bit RGB colours only contains 32768 entries. The method 300 then proceeds from the generate colour to label map step 306 to a map reference frames to labels step 308.
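A possible sketch of this step is given below: colours observed in the two reference frames are quantised, clustered with k-means, and written into a LUT, with the reserved label 0 kept for unseen colours. The cluster count, quantisation factor and use of scikit-learn's KMeans are illustrative assumptions, not the patented implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_colour_to_label_map(ref0, ref1, n_labels=1024, q=8):
    """Build a quantised-colour -> label LUT from the two reference frames.

    Colours observed in the reference frames are clustered with k-means;
    unseen quantised colours keep the reserved label 0.
    """
    levels = 256 // q
    colours = np.concatenate([ref0.reshape(-1, 3), ref1.reshape(-1, 3)])
    colours = (colours // q).astype(np.int64)
    kmeans = KMeans(n_clusters=n_labels, n_init=4).fit(colours)

    lut = np.zeros(levels ** 3, dtype=np.int32)         # default: label 0
    flat = (colours[:, 0] * levels + colours[:, 1]) * levels + colours[:, 2]
    lut[flat] = kmeans.labels_ + 1                       # labels 1..n_labels
    return lut, levels

def map_frame_to_labels(frame, lut, levels, q=8):
    """Apply the LUT to a frame, producing a labelled frame."""
    quantised = (frame // q).astype(np.int64)
    flat = (quantised[..., 0] * levels + quantised[..., 1]) * levels + quantised[..., 2]
    return lut[flat]
```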
[00075] At the map reference frames to labels step 308, the pair of reference frames 112 are mapped to labelled reference frames R0 and R1 by the first colour to label mapper module 130, using the colour to label map 122. For convenience sake, the pair of reference frames 112 will be referred to separately as the selected first reference frame R0, and the selected second reference frame R1. Then, all pixels in the pair of reference frames 112 are mapped to labels by quantisation and indexing into the LUT:

$$R_0[x, y] = \mathrm{LUT}\left[\frac{R_0[x, y]}{Q}\right], \qquad R_1[x, y] = \mathrm{LUT}\left[\frac{R_1[x, y]}{Q}\right]$$

where Q is the quantisation factor, and LUT[·] represents the look-up operation. The method 300 then proceeds from the map reference frames to labels step 308 to an estimate joint statistics step 310.
[00076] At the estimate joint statistics step 310, the joint histogram module 140 estimates the joint statistics 142 by computing a joint histogram H_{R0,R1} between the labelled reference frames R0 and R1. Let the set of colour labels be ℒ = {0, 1, 2, ..., L}, where labels 1, 2, ..., L correspond to colours existing in the reference frames R0 and R1, and the label 0 corresponds
to colours not found in the reference frames R0 and R1. Then, in one arrangement of the estimate joint statistics step 310, each bin of the joint histogram H_{R0,R1}[l0, l1] is a count of the number of times a pixel in R0 having a label l0 co-occurs with a pixel in R1 having a label l1. The joint histogram H_{R0,R1} may be computed by:

$$H_{R_0,R_1}[l_0, l_1] = \sum_{x,y} \delta_{l_0,l_1}\big[R_0[x, y], R_1[x, y]\big]$$

where

$$\delta_{l_0,l_1}[a, b] = \begin{cases} 1 & \text{if } a = l_0 \text{ and } b = l_1 \\ 0 & \text{otherwise} \end{cases}$$

[00077] Note that by definition, colours corresponding to label 0 do not occur in the reference frames R0 and R1, and so H_{R0,R1}[l0, l1] = 0 when l0 = 0 or l1 = 0. Therefore, the joint histogram bins for label 0 do not need to be computed or stored. However, the joint histogram bins for label 0 are used to calculate probabilities in subsequent steps of the method 300.
[00078] Fig. 4 shows an example joint histogram that may be produced by the present arrangement of step 310. A label axis 410 corresponds to colour labels occurring in the labelled reference frame R0, while a label axis 412 corresponds to colour labels occurring in the labelled reference frame R1. Histogram bins arranged on a diagonal 420 are shown with relatively large values. The histogram bins on the diagonal 420 correspond to co-occurrences of the same colour label in R0 and R1. If R0 and R1 were identical images, then the histogram bins on the diagonal 420 would be the only non-zero valued bins, and would reflect the relative distribution of colours in the images. However, the reference frames are selected to contain differences caused by variation in the background scene. The variation in the background scene may be due to geometric distortion caused by atmospheric turbulence. Then, the values of the histogram bins on the diagonal 420 correspond to co-occurrences of the same colour label in R0 and R1, which result generally from large, homogeneous regions such as the interior of background objects. However, at the edges between background objects, or in textured regions of the background scene, the geometric distortion caused by atmospheric turbulence may result in co-occurrences of different colour labels in R0 and R1, which in the example joint histogram of Fig. 4 are demonstrated by non-zero off-diagonal histogram bins 430. The non-zero off-diagonal histogram bins 430 are shown with relatively small values, compared to the histogram bins on the diagonal 420. This is because co-occurrences of different colour labels generally occur due to the geometric distortion caused by atmospheric turbulence acting on point or edge background features, while co-occurrences of same colour labels generally occur at pixels within background object interiors.
[00079] If the reference frames R0 and R1 are selected from a single set of registered video frames 108 or a single set of original video frames 104, it may be reasonable to assume that the two reference frames are observations from the same underlying statistics, e.g. from the same background turbulence model. For example, the strength of atmospheric turbulence corresponding to the two reference frames may be known to have not changed. Then, if the underlying statistics for R0 and R1 are the same, the order of the reference frames should not matter. Therefore H_{R0,R1} and H_{R1,R0} should both be estimates for the same underlying joint statistics 142, or equivalently, H_{R0,R1} should be equal to H_{R1,R0} and the joint histogram should be symmetric. However, the joint histograms are only estimates for the underlying joint statistics. Then, because the joint histograms are computed from a limited number of observations corresponding to the number of pixels in the reference frames, the estimates for the joint statistics are affected by statistical uncertainty, and in general H_{R0,R1} as calculated in the above arrangement will not be symmetric.
[00080] In another arrangement of the estimate joint statistics step 310, the joint histogram H_{R0,R1} may be computed to guarantee symmetry by:

$$H_{R_0,R_1}[l_0, l_1] = \sum_{x,y} \delta_{l_0,l_1}\big[R_0[x, y], R_1[x, y]\big]$$

where

$$\delta_{l_0,l_1}[a, b] = \begin{cases} 1 & \text{if } a = l_0 \text{ and } b = l_1 \\ 1 & \text{if } a = l_1 \text{ and } b = l_0 \\ 0 & \text{otherwise} \end{cases}$$

[00081] In the present arrangement, each observation of different colour label pairings results in incrementing two bins of the symmetric joint histogram H_{R0,R1}. Then, the symmetric joint histogram H_{R0,R1} is effectively populated from approximately twice as many observations compared to the joint histogram of the above arrangement, and therefore the symmetric joint histogram is less affected by statistical uncertainty. Then, one advantage of the present arrangement is that if the assumption that the same background turbulence model describes
each of the two reference frames is true, then the symmetric joint histogram H_{R0,R1} may more accurately estimate the underlying joint statistics 142. Another advantage of the present arrangement is that the symmetric joint histogram may be implemented and stored in memory efficiently in an upper-triangular form. Note that the computation of the symmetric joint histogram results in non-zero off-diagonal histogram bins that are approximately doubled in value. The discrepancy can easily be corrected by doubling the value of histogram bins on the diagonal. However, the effectiveness of moving object detection is most dependent on the non-zero off-diagonal histogram bins, so correction of the histogram bins on the diagonal may be unnecessary, and possibly undesirable.
[00082] From the joint histogram H_{R0,R1}, the estimate joint statistics step 310 may then compute joint probabilities p_{R0,R1} and marginal probabilities p_{R0} and p_{R1}:

$$N = \sum_{l_0, l_1} H_{R_0,R_1}[l_0, l_1]$$

$$p_{R_0,R_1}(l_0, l_1) = \frac{H_{R_0,R_1}[l_0, l_1]}{N}$$

$$p_{R_0}(l_0) = \sum_{l_1} p_{R_0,R_1}(l_0, l_1), \qquad p_{R_1}(l_1) = \sum_{l_0} p_{R_0,R_1}(l_0, l_1)$$

[00083] As described above, the joint statistics 142 is considered to be a turbulence colour change model when the variation in the background scene is due to geometric distortion caused by atmospheric turbulence.
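The joint histogram and the derived probabilities can be computed along the following lines; this is an illustrative sketch rather than the patented implementation, and the symmetric option simply adds the transposed counts (which also doubles the diagonal bins, as discussed above).

```python
import numpy as np

def joint_statistics(labels_r0, labels_r1, n_labels, symmetric=False):
    """Joint histogram and (joint, marginal) probabilities for two labelled frames.

    n_labels counts the reserved label 0 plus the labels occurring in the
    reference frames. With symmetric=True each co-occurrence also increments
    the transposed bin, as in the symmetric arrangement described above.
    """
    pairs = np.stack([labels_r0.ravel(), labels_r1.ravel()], axis=1)
    hist = np.zeros((n_labels, n_labels), dtype=np.float64)
    np.add.at(hist, (pairs[:, 0], pairs[:, 1]), 1.0)
    if symmetric:
        hist = hist + hist.T          # count each pairing in both orders

    n = hist.sum()
    p_joint = hist / n
    p_r0 = p_joint.sum(axis=1)        # marginal over R1 labels -> p_R0(l0)
    p_r1 = p_joint.sum(axis=0)        # marginal over R0 labels -> p_R1(l1)
    return hist, p_joint, p_r0, p_r1
```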
[00084] The method 300 then proceeds from the estimate joint statistics step 310 to a map current frame to labels step 312.
[00085] At the map current frame to labels step 312, the current frame from the registered video frames 108 or the original video frames 104 is mapped to the labelled current frame A by the second colour to label mapper module 150, using the colour to label map 122. For
-242018202801 23 Apr 2018 convenience sake, the current frame from the registered video frames 108 or the original video frames 104 will be referred to as A. Then, all pixels in the current frame A are mapped to labels by quantisation and indexing into the LUT associated with the colour to label map 122:
A[x,y] = LUT
A[x, y]
Q [00086] In particular, any colours in A that did not occur in the reference frames Ro and Rr are mapped to label 0. The method 300 then proceeds from the map current frame to labels step 312 to a generate foreground map sub-process 314.
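The quantise-and-index operation of step 312 can be pictured with the sketch below; the uniform per-channel quantisation step q, the dictionary-based LUT with a fall-back label 0, and the helper name map_frame_to_labels are illustrative assumptions rather than details taken from the description.

```python
import numpy as np

def map_frame_to_labels(frame, lut, q):
    """Map an H x W x C colour frame to a 2-D frame of integer labels.

    frame : uint8 colour image.
    lut   : dict mapping quantised colour tuples to labels; colours that never
            occurred in the reference frames fall back to label 0.
    q     : quantisation step applied to each colour channel.
    """
    quantised = frame // q
    h, w, _ = quantised.shape
    labels = np.zeros((h, w), dtype=np.int32)
    for y in range(h):
        for x in range(w):
            labels[y, x] = lut.get(tuple(quantised[y, x]), 0)
    return labels
```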
[00087] Fig. 6 is a schematic flow diagram showing sub-steps of the generate foreground map sub-process 314. The generate foreground map sub-process 314 begins at an initialise pixel location sub-step 602. At the initialise pixel location sub-step 602, a pixel location (x, y) is initialised to point to a beginning location of the labelled current frame A. For example, the pixel location (x, y) may be initialised to the top-left corner of the labelled current frame A, with a coordinate system chosen such that the initialisation corresponds to x = 0, y = 0. The sub-process 314 then proceeds from the initialise pixel location sub-step 602 to a calculate likelihood score sub-step 604.
[00088] At the calculate likelihood score sub-step 604, the sub-process 314 calculates a likelihood S that the joint statistics 142 would predict a current label a = A[x, y] from the labelled current frame A occurring, given that a reference label r = R0[x, y] occurred at the same pixel location in the labelled reference frame R0. Note that, in the arrangements of the likelihood score described below, if the joint probability p_{R0,R1}(a, r) is zero, then the likelihood score S is set to a predetermined threshold Θ, regardless of the value of the denominator. Thus, there is no possibility of division-by-zero exceptions.
[00089] In one arrangement, the likelihood may be a conditional probability. The conditional probability may be calculated as:

$$S = p(R_1 = a \mid R_0 = r) = \frac{p_{R_0,R_1}(a, r)}{p_{R_0}(r)}$$

[00090] One advantage of the present arrangement is that the conditional probability is a probability, and thus bounded in the range [0,1], which is convenient for thresholding. Another advantage of the conditional probability is that the likelihood is normalised by the probability p_{R0}(r). That is, if the label r occurring in R0 is a rare event, the conditional probability is not affected.
[00091] In another arrangement of the calculate likelihood score sub-step 604, the likelihood may be a pointwise mutual information between the current label a and the reference label r:
$$S = \mathrm{pmi}(a; r) = \log \frac{p_{R_0,R_1}(a, r)}{p_{R_0}(r)\, p_{R_1}(a)}$$

[00092] One advantage of the present arrangement is that the pointwise mutual information is normalised for both p_{R0}(r) and p_{R1}(a). Therefore, the pointwise mutual information is a symmetric measure, or equivalently pmi(a; r) = pmi(r; a). However, one disadvantage of the pointwise mutual information is that the likelihood is not bounded in the range [0,1].
[00093] In another arrangement of the calculate likelihood score sub-step 604, the likelihood may be a “covariance score”:
$$S = \mathrm{cov}(a, r) = \frac{p_{R_0,R_1}(a, r)}{p_{R_0}(r)\, p_{R_1}(a) + p_{R_0,R_1}(a, r)}$$

[00094] The present arrangement combines advantages of the above arrangements, as the covariance score is both symmetric (cov(a, r) = cov(r, a)) and bounded in the range [0,1].
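For concreteness, the three likelihood measures above might be computed from the normalised joint statistics as in the sketch below; the function name likelihood_score and the measure argument are our own, and the zero-joint-probability case returns the threshold Θ, as the text prescribes.

```python
import math

def likelihood_score(p_joint, p_r0, p_r1, a, r, theta, measure="conditional"):
    """Score how well the joint statistics predict current label a given reference label r.

    p_joint[r][a] : joint probability of (R0 = r, R1 = a).
    p_r0, p_r1    : marginal probabilities of labels in R0 and R1.
    theta         : predetermined threshold, returned directly when the joint
                    probability is zero so no division by zero can occur.
    """
    pj = p_joint[r][a]
    if pj == 0.0:
        return theta
    if measure == "conditional":      # p(R1 = a | R0 = r)
        return pj / p_r0[r]
    if measure == "pmi":              # pointwise mutual information
        return math.log(pj / (p_r0[r] * p_r1[a]))
    if measure == "covariance":       # bounded, symmetric "covariance score"
        return pj / (p_r0[r] * p_r1[a] + pj)
    raise ValueError("unknown measure")
```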
[00095] In the above arrangements of the calculate likelihood score sub-step 604, the occurrence of the current label a is tested jointly with the co-occurrence of the reference label r from the labelled reference frame R0. One disadvantage of these arrangements is that the estimated joint statistics 142 are only reliable if there are sufficient observations. Then, if the label r occurring in R0 is a rare event, the likelihood score calculated for R0[x, y] = r, A[x, y] = a may be unreliable. As described above, it may be reasonable to assume that the two reference frames R0 and R1 are observations from the same underlying statistics. Then, if the two reference frames are non-consecutive frames, it is also reasonable to assume that a set of reference frames R, consisting of the consecutive frames from R0 to R1 inclusive, are all observations from the same underlying statistics.
[00096] In another arrangement of the calculate likelihood score sub-step 604, an average reference frame R̄ may be calculated from the set of reference frames R. The "averaging" is applied for each pixel location across the set of reference frames. The method of averaging may be a mean, median or mode. An advantage of performing the median or mode averaging is that these averages cannot produce a new pixel value that did not exist in at least one of the reference frames R. An advantage of performing the mode averaging is that it may be performed on labelled reference frames. Mean and median averaging cannot be performed on labelled reference frames, because labels do not have ordinality.
[00097] Then, in the present arrangement of the calculate likelihood score sub-step 604, the likelihood score is calculated for the current label a = A[x, y] in the current frame A, given a reference label r = R̄[x, y] in the average frame R̄. An advantage of the present arrangement is that because R̄ is an averaged reference frame, each label R̄[x, y] is less likely to be a rare event. Then, the likelihood score may be calculated from more reliable joint statistics 142.
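A per-pixel mode average over a stack of labelled reference frames might be sketched as follows; stacking the frames into a (K, H, W) array and the helper name mode_average_labelled_frames are our own assumptions.

```python
import numpy as np

def mode_average_labelled_frames(labelled_frames, num_labels):
    """Per-pixel mode over a stack of labelled reference frames.

    labelled_frames : integer array of shape (K, H, W) with labels in [0, num_labels).
    Returns an (H, W) frame holding the most frequent label at each pixel, so no
    label that was absent from every reference frame is ever introduced.
    """
    k, h, w = labelled_frames.shape
    counts = np.zeros((num_labels, h, w), dtype=np.int32)
    rows, cols = np.indices((h, w))
    for frame in labelled_frames:
        # Count how often each label occurs at each pixel across the K frames.
        np.add.at(counts, (frame, rows, cols), 1)
    return counts.argmax(axis=0)
```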
[00098] As described above, the likelihood score (as determined by any of the above arrangements) determines whether a pixel position in the labelled current frame A corresponds to foreground (e.g., an object of interest) based on the turbulence colour change model (i.e., the joint statistics 142), colour at the pixel position in the labelled current frame A, and a colour change associated with the pixel position.
[00099] The sub-process 314 then proceeds from the calculate likelihood score sub-step 604 to a compare with threshold sub-step 606.
[000100] At the compare with threshold sub-step 606, the sub-process 314 compares the likelihood score S against a predetermined threshold Θ. The predetermined threshold Θ is set based on the measure used for calculating the likelihood score S, and a desired trade-off between false positives and false negatives. For example, if the conditional probability or the covariance score is used to calculate S, then the predetermined threshold may be set to Θ = 0.1. If the pointwise mutual information is used to calculate S, then the predetermined threshold may be set to Θ = −10. If the likelihood score is greater than the predetermined threshold (S > Θ), then the current label a is considered to be well predicted by the joint statistics 142. Then, the sub-process 314 proceeds from the compare with threshold sub-step 606 to a mark pixel as background sub-step 608. Otherwise, if (S ≤ Θ), the current label a is not well predicted by the joint statistics 142. Therefore, the current pixel location is assumed to be a moving object pixel. Then, the sub-process 314 proceeds from the compare with threshold sub-step 606 to a mark pixel as foreground sub-step 610.
[000101] At the mark pixel as background sub-step 608, the sub-process 314 marks the current pixel location as a background pixel in the foreground map 162. For convenience of notation, the foreground map 162 will be referred to as M. The foreground map 162 may be a binary-valued image, in which case the current pixel location may be marked as a background pixel by setting M[x, y] = 0. The sub-process 314 then proceeds from the mark pixel as background sub-step 608 to an end of image check sub-step 612.
[000102] At the mark pixel as foreground sub-step 610, the sub-process 314 marks the current pixel location as a foreground pixel in the foreground map 162, by setting M[x, y] = 1. The sub-process 314 then proceeds from the mark pixel as foreground sub-step 610 to the end of image check sub-step 612.
[000103] At the end of image check sub-step 612, the sub-process 314 checks whether the pixel location (x, y) has reached the end of the labelled current frame A. For example, if the labelled current frame A is being traversed in raster scan order, the pixel location (x, y) has reached the end of the labelled current frame A if (x, y) is pointing at the bottom-right pixel of A. If the pixel location (x, y) has reached the end of the labelled current frame A, the sub-process 314 (and ultimately the method 300) outputs the foreground map 162, and then terminates. If the pixel location (x, y) has not reached the end of A, then the sub-process 314 proceeds from the end of image check sub-step 612 to an increment pixel location sub-step 614.
[000104] At the increment pixel location sub-step 614, the pixel location (x, y) is incremented in a predetermined scan order of the labelled current frame A. For example, if the labelled current frame A is being traversed in a raster scan order, then x is incremented by one, unless x has already reached the end of the image row; in such case, x is reset to zero, and y is incremented by one. The sub-process 314 then proceeds from the increment pixel location
sub-step 614 to the calculate likelihood score sub-step 604. Therefore, the sub-process 314 repeats sub-steps 604 to 610 for each pixel of the labelled current frame A.
[000105] When all of the pixels in the labelled current frame A have been processed, the sub-process 314 (and the method 300) concludes.
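Pulling sub-steps 602 to 614 together, the generate foreground map sub-process can be pictured as the raster-scan loop below; this sketch reuses the illustrative likelihood_score helper assumed earlier and is not lifted from the patent description.

```python
import numpy as np

def generate_foreground_map(labelled_current, labelled_reference,
                            p_joint, p_r0, p_r1, theta, measure="covariance"):
    """Produce a binary foreground map M (1 = foreground, 0 = background).

    A pixel is marked background when its (reference label, current label)
    pairing is well predicted by the joint statistics, i.e. its score exceeds theta.
    """
    h, w = labelled_current.shape
    foreground_map = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):                    # raster scan order
        for x in range(w):
            r = labelled_reference[y, x]  # label in R0 (or the average frame)
            a = labelled_current[y, x]    # label in the current frame
            s = likelihood_score(p_joint, p_r0, p_r1, a, r, theta, measure)
            foreground_map[y, x] = 0 if s > theta else 1
    return foreground_map
```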
[000106] In the above arrangements, the method 300 has been described with the reference frames R0 and R1, and the current frame A, all being RGB frames from a video sequence. However, the video sequence may consist of hyperspectral data. For example, the video sequence may, in addition to the red, green and blue colour components, include an infra-red component. Then, in one arrangement of the generate colour to label map step 306, the colour to label map generator module 120 may take hyperspectral data, and assign the hyperspectral colours to labels. There is no change in the method 300, apart from a requirement to process higher-dimensional data.
[000107] Assigning hyperspectral colours to labels implies the existence of a distance measure in the hyperspectral colour space. For example, if a clustering operation is performed on the hyperspectral data, the clustering depends on evaluating distance between data points and cluster centres. Then, imposing a distance measure in a colour space implies a relative weighting of importance between the colour components. In another arrangement of the method 300, the colour components may be separately assigned to labels. For example, let R0,n be the nth colour component of the first reference frame, let R1,n be the nth colour component of the second reference frame, and let An be the nth colour component of the current frame. Then, the generate colour to label map step 306 assigns "colours" corresponding to the nth colour component to labels. The method 300 may be applied separately to each of the colour components, resulting in a number of foreground maps corresponding to the number of colour components. The foreground maps may be combined by a subsequent fusion post-processing step. For example, the fusion post-processing step may be implemented by a voting policy. For each pixel position of a fused foreground map, the pixel may be marked as a foreground pixel if a majority of the corresponding pixels in the foreground maps corresponding to the colour components are marked as foreground pixels. One advantage of the present arrangement is that if the relative importance of the colour components is not known a priori, the relative importance of the colour components can be determined at the fusion post-processing step. For example, important colour components may be assigned more votes.
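A majority-vote fusion of per-component foreground maps might look like the following sketch; the optional per-component weights and the name fuse_foreground_maps are illustrative assumptions.

```python
import numpy as np

def fuse_foreground_maps(maps, weights=None):
    """Fuse per-component binary foreground maps by (weighted) majority vote.

    maps    : list of (H, W) uint8 arrays with 1 = foreground, 0 = background.
    weights : optional per-component vote counts; defaults to one vote each.
    """
    maps = np.stack(maps).astype(np.float64)
    if weights is None:
        weights = np.ones(maps.shape[0])
    weights = np.asarray(weights, dtype=np.float64)
    votes = np.tensordot(weights, maps, axes=1)      # weighted foreground votes
    return (votes > weights.sum() / 2.0).astype(np.uint8)
```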
[000108] The video sequence may be divided into arbitrary combinations of colour components. Let us define a "modality" as any combination of colour components, such as RGB, or infra-red. Then, in another arrangement of the method 300, let R0,n be the nth modality of the first reference frame, let R1,n be the nth modality of the second reference frame, and let An be the nth modality of the current frame. Each modality may be processed separately by the method 300, and the resulting foreground maps may be combined by a subsequent fusion post-processing step, as in the above arrangement.
[000109] Another arrangement of the method 300 may operate on reference frames with different modality. For example, let R0,RGB represent a first reference frame in RGB modality, while R1,IR represents a second reference frame in infra-red modality. Then, the first reference frame R0,RGB may be processed by the colour to label map generator module 120, producing a first colour to label map which maps RGB colours to labels, and the second reference frame R1,IR may be processed by the colour to label map generator module 120, producing a second colour to label map which maps infra-red colours to labels. At the map reference frames to labels step 308, the first reference frame is mapped to the first labelled reference frame R0 using the first colour to label map, while the second reference frame is mapped to the second labelled reference frame R1 using the second colour to label map.
[000110] After mapping the reference frames to labelled reference frames, the estimate joint statistics step 310 is unchanged. However, the dimensions of the joint histogram H_{R0,R1} correspond to different modalities. Therefore, in the example of Fig. 4, the diagonal 420 corresponding to co-occurrences of the "same colour" does not have meaning. However, the joint histogram still models the statistics of the background scene. For example, if a large object in the background scene has a colour label of l0 in the first labelled reference frame R0 and a colour label of l1 in the second labelled reference frame R1, we expect the corresponding histogram bin H_{R0,R1}[l0, l1] to have a large value.
[000111] Then, in the present arrangement, the current frame A may be processed by the method 300 as long as the current frame A is in the same modality as the second reference frame R1. If the current frame A is in the same modality as the second reference frame, then in the map current frame to labels step 312 the current frame A may be mapped to the labelled current frame A using the second colour to label map, which corresponds to the second reference frame R1. Therefore, the reference label r = R0[x, y] and the current label a = A[x, y] match the respective dimensions of the joint histogram, and the generate foreground map step 314 may proceed unchanged.
[000112] In yet another arrangement, let us define a "modality" as a particular lighting condition of the scene. Then the method 300 may operate on reference frames with different modality, where different modalities correspond to different lighting conditions. For example, the first reference frame R0 may be captured at noon, while the second reference frame R1 may be captured at sunset. Then, in the present arrangement, the current frame A may be processed as long as the current frame A is in the same modality as the second reference frame R1, for example if the current frame A is also captured at sunset.
[000113] Turning back to the select reference frames step 304 of the method 300: in the arrangements of the select reference frames step 304 described above, methods are described for manually or automatically selecting reference frames from the registered video frames 108 or the original video frames 104. However, in some surveillance situations the scene may always contain moving objects. Then, rather than selecting complete reference frames, the select reference frames step 304 may select partial reference frames, where the pixel locations corresponding to moving objects are ignored.
[000114] In another arrangement of the select reference frames step 304 and the estimate joint statistics step 310, the candidate reference frames R0 and R1 may have already been processed by the method 300, resulting respectively in foreground maps M0 and M1. Then, the foreground maps may be used as "feedback" to mask out any pixels that contain moving objects in either of the reference frames R0 or R1. A set of background pixel locations X may be determined as:

$$(x, y) \in X \quad \text{if } M_0[x, y] = 0 \text{ and } M_1[x, y] = 0 \qquad (1)$$

[000115] That is, a pixel location is only included in the set X if the pixel location is labelled as a background pixel in both foreground maps M0 and M1. Then, the joint histogram H_{R0,R1} may be computed over the set of background pixel locations X only:
$$H_{R_0,R_1}[l_0, l_1] = \sum_{(x,y) \in X} \delta_{l_0,l_1}\bigl[R_0[x,y],\, R_1[x,y]\bigr]$$

[000116] An advantage of the present arrangement is that it allows the computation of the joint statistics 142 even when the candidate reference frames R0 and R1 contain moving objects. However, the present arrangement cannot operate unless the method 300 has already processed the candidate reference frames. Therefore, if the scene always contains moving objects, there still remains a need for a "bootstrapping" method to obtain the set of background pixel locations X.
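Restricting the accumulation to the background set X of Equation (1) can be sketched with a boolean mask; the helper name masked_joint_histogram and the mask formulation are our own.

```python
import numpy as np

def masked_joint_histogram(r0, r1, m0, m1, num_labels):
    """Joint histogram accumulated only over background pixel locations X.

    r0, r1 : labelled reference frames (2-D integer arrays).
    m0, m1 : binary foreground maps for the two frames (1 = foreground).
    """
    background = (m0 == 0) & (m1 == 0)    # the set X of Equation (1)
    a = r0[background]
    b = r1[background]
    hist = np.zeros((num_labels, num_labels), dtype=np.int64)
    np.add.at(hist, (a, b), 1)
    return hist
```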
[000117] An issue with using methods such as image differencing, or Gaussian mixture models, for obtaining the set of background pixel locations X is that these methods may produce false positive moving object results in the presence of atmospheric turbulence. If the pixel locations of false positives due to turbulence are excluded from the set X, then the joint statistics 142 will not model the statistics of turbulence in the background scene. However, the estimate joint statistics step 310 may be resilient to false positives due to poor spatial resolution. For example, if a moving object detection method has poor accuracy around the border of moving objects, such a method can be modified to overestimate the size of moving objects. Then, the set of background pixel locations X will be smaller, but will still contain edge features in the background scene, which are important for the joint statistics 142 to model the statistics of turbulence in the background scene.
[000118] In another arrangement of the select reference frames step 304, the registered video frames 108 or the original video frames 104 may be processed by an optical flow method, producing optical flow fields between consecutive pairs of the registered video frames 108 or the original video frames 104. The optical flow fields may contain flows due to both motion and atmospheric turbulence. However, over a sufficiently long period of time, optical flow due to atmospheric turbulence may appear as a zero mean random process. In contrast, optical flow due to motion may be consistent in the direction of motion. Then, moving objects in the candidate reference frames R0 and R1 may be identified by an optical flow-based moving object tracking method. As described above with reference to Equation (1), the set of background pixels X may then be determined from foreground maps M0 and M1 produced by the optical flow-based moving object tracking method. Note that although the optical flow-based moving object tracking method may be patch based, or inaccurate at the borders of moving objects, the background pixels X produced are still sufficient to bootstrap the method 300. Additionally, while the tracking of optical flow vectors across successive frames may be computationally expensive, the cost of the optical flow-based method is only borne during the bootstrapping phase.
[000119] In another arrangement of the select reference frames step 304, the candidate reference frames R0 and R1 may be divided into corresponding regions. The candidate reference frames may be selected automatically, or manually by a human operator. Fig. 5 shows an example pair of candidate reference frames 500 and 510, with dotted lines indicating division into regions. The candidate reference frame 500 depicts a background scene with no moving objects, while the candidate reference frame 510 depicts the same background scene with a moving object 520. Then, in the example of Fig. 5, the regions 530 and 540 should be excluded from the set of background pixel locations X.
[000120] Let the ith corresponding regions of the candidate reference frames R0 and R1 be represented by a set of pixel positions Xi. Then for each Xi, the method 300 calculates a similarity score between the two corresponding regions of R0[x, y] and R1[x, y] for (x, y) ∈ Xi, which for convenience will be referred to as R0^i and R1^i. In other arrangements of the select reference frames step 304, the calculation of similarity scores between the two candidate reference frames R0 and R1 has been described. However, regions are smaller and may contain too few pixels to perform low-pass filtering and still preserve discriminative features of the regions. Then, mutual information or correlation based measures sensitive to pixel-level variations may be ineffective. In the present arrangement, the method 300 may calculate a similarity score based on colour histogram distance. A colour histogram is computed over each of the regions:

$$H_{R_0^i}[l] = \sum_{(x,y) \in X_i} \delta_l\bigl[R_0[x,y]\bigr], \qquad H_{R_1^i}[l] = \sum_{(x,y) \in X_i} \delta_l\bigl[R_1[x,y]\bigr]$$

where

$$\delta_l[a] = \begin{cases} 1 & \text{if } a = l \\ 0 & \text{otherwise} \end{cases}$$

[000121] Then, the similarity score may be a distance measure d(H_{R0^i}, H_{R1^i}) calculated between the two colour histograms H_{R0^i} and H_{R1^i}. For example, the similarity score may be the L1 distance, or the cosine distance, or the Kullback-Leibler divergence. Then, the set of pixel positions Xi are considered background pixels if the distance measure between the two colour histograms H_{R0^i} and H_{R1^i} is below a predetermined threshold s. The set of background pixel locations X is obtained by combining the regions identified as background pixels:

$$X = \bigcup_i X_i \quad \text{for all } i \text{ such that } d\bigl(H_{R_0^i}, H_{R_1^i}\bigr) < s$$
[000122] Colour histograms are only discriminative of the frequency of colour labels occurring within an analysed region, and not discriminative of the spatial position of colours. Thus, one advantage of the present arrangement is that the similarity score is resilient to the geometric distortion caused by atmospheric turbulence. However, if a moving object shifts position, but is still contained within the analysed region, the colour histogram would not be discriminative of the differences caused by the moving object. Then, to ensure the colour histogram is discriminative of moving objects, the regions may be set to a small size, and the time period between the candidate reference frames long enough for moving objects to traverse the entirety of a region.
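The region-based background selection could be prototyped roughly as below; the square region grid, the normalised L1 histogram distance and the name select_background_regions are illustrative assumptions, and any of the distances mentioned above could be substituted.

```python
import numpy as np

def select_background_regions(r0, r1, num_labels, region_size, s):
    """Return a boolean mask of background pixel locations X.

    A region is kept as background when the L1 distance between the normalised
    colour-label histograms of the two labelled reference frames, computed over
    that region, is below the threshold s.
    """
    h, w = r0.shape
    background = np.zeros((h, w), dtype=bool)
    for y0 in range(0, h, region_size):
        for x0 in range(0, w, region_size):
            block0 = r0[y0:y0 + region_size, x0:x0 + region_size].ravel()
            block1 = r1[y0:y0 + region_size, x0:x0 + region_size].ravel()
            h0 = np.bincount(block0, minlength=num_labels) / block0.size
            h1 = np.bincount(block1, minlength=num_labels) / block1.size
            if np.abs(h0 - h1).sum() < s:
                background[y0:y0 + region_size, x0:x0 + region_size] = True
    return background
```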
[000123] In another arrangement of the estimate joint statistics step 310, the number of observations may be increased by partial volume interpolation. Rather than accumulating over each co-occurring pair of pixels in R0 and R1, the joint histogram H_{R0,R1} is calculated by accumulating over each pair of pixels in R0 and R1 co-occurring within a neighbourhood. The size of the neighbourhood is controlled by an interpolation kernel w:

$$H_{R_0,R_1}[l_0, l_1] = \sum_{x,y} \sum_{i,j} w[i,j] \times \delta_{l_0,l_1}\bigl[R_0[x,y],\, R_1[x+i, y+j]\bigr]$$

[000124] For example, the interpolation kernel may be a 5×5 kernel of unit weights defined over the support i ∈ [−2, 2], j ∈ [−2, 2]:

$$w = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}$$

[000125] Alternatively, the interpolation kernel may be a Gaussian, where the parameter σ is set equal to an expected strength of atmospheric turbulence in the scene:

$$w[i,j] = \exp\left(-\frac{i^2 + j^2}{2\sigma^2}\right)$$

[000126] The effect of the interpolation kernel w is that the joint histogram H_{R0,R1} is also populated by pairs of pixels in R0 and R1 that are slightly perturbed spatially, rather than co-occurring. Then, the interpolation kernel w may mimic the geometric distortion introduced by atmospheric turbulence, where the support size of w is determined using the expected strength of atmospheric turbulence in the scene. The present arrangement increases the sampling of different-label pairings that would be caused by the modelled atmospheric turbulence, without requiring additional reference frames.
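The neighbourhood accumulation of the partial volume interpolation might be realised as follows; the Gaussian weights used as fractional histogram increments and the clamping of the kernel support at the image border are our own simplifications.

```python
import numpy as np

def interpolated_joint_histogram(r0, r1, num_labels, sigma, radius=2):
    """Joint histogram populated by pairs of pixels co-occurring within a neighbourhood.

    Each pixel of R0 is paired with every pixel of R1 inside a (2*radius+1)^2
    window, weighted by a Gaussian kernel whose sigma reflects the expected
    turbulence strength.
    """
    h, w = r0.shape
    hist = np.zeros((num_labels, num_labels), dtype=np.float64)
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            weight = np.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            # Overlap of R0 with R1 shifted by (i, j), ignoring out-of-frame pixels.
            y0, y1 = max(0, -j), min(h, h - j)
            x0, x1 = max(0, -i), min(w, w - i)
            a = r0[y0:y1, x0:x1].ravel()
            b = r1[y0 + j:y1 + j, x0 + i:x1 + i].ravel()
            np.add.at(hist, (a, b), weight)
    return hist
```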
[000127] In another arrangement of the select reference frames step 304, the estimate joint statistics step 310, and the calculate likelihood score sub-step 604, the joint statistics 142 may be estimated by accumulating over more than one pair of reference frames. The method 300 may select K > 2 reference frames, resulting in a set of labelled reference frames R0, R1, ..., R_{K-1}. Then, a set of pairs of labelled reference frames are chosen from the set of labelled reference frames. To limit complexity, the number of pairs of labelled reference frames chosen may be less than the total number of pairs possible. For example, if K = 4, then 6 unordered pairs of labelled reference frames, or 12 ordered pairs of labelled reference frames, are possible. To limit complexity, the chosen pairs of labelled reference frames may be {(R0, R1), (R1, R2), ..., (R_{K-2}, R_{K-1})}. Then, in the present arrangement of the estimate joint statistics step 310, a joint histogram H may be accumulated over the chosen pairs of labelled reference frames. For the example policy of chosen pairs of labelled reference frames described above, the joint histogram H is calculated as:

$$H[l_0, l_1] = \sum_{k=0}^{K-2} H_{R_k, R_{k+1}}[l_0, l_1]$$
[000128] The joint histograms H_{Rk,Rk+1} calculated from each pair of labelled reference frames may be computed to guarantee symmetry, as described in a previous arrangement of the estimate joint statistics step 310. Additionally, the joint histograms H_{Rk,Rk+1} may be restricted to accumulation over corresponding sets of background pixel locations X. The corresponding sets of background pixel locations may be identified by feedback from prior foreground maps, or identified by region-based similarity matching, as described in previous arrangements of the select reference frames step 304. One advantage of this is that pixel locations that are ignored in one pair of reference frames due to the presence of moving objects may be eligible as background pixel locations in another pair of reference frames. Therefore, over a set of pairs of labelled reference frames, there is a greater likelihood that all pixel locations will be observed and modelled by the joint statistics 142.
[000129] Then, rather than H_{R0,R1}, in the present arrangement the joint histogram H is used to compute the joint and marginal probabilities. The likelihood score S is calculated from the joint and marginal probabilities computed from H.
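Accumulating over the chain of consecutive pairs can be sketched by summing per-pair histograms; this reuses the illustrative build_symmetric_joint_histogram helper assumed earlier.

```python
import numpy as np

def accumulate_over_pairs(labelled_frames, num_labels):
    """Sum symmetric joint histograms over consecutive pairs (R_k, R_{k+1})."""
    total = np.zeros((num_labels, num_labels), dtype=np.int64)
    for r_a, r_b in zip(labelled_frames[:-1], labelled_frames[1:]):
        total += build_symmetric_joint_histogram(r_a, r_b, num_labels)
    return total
```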
[000130] In the arrangements described above, it is assumed that the set of reference frames are all observations from the same underlying statistics. One situation where this assumption may not hold is if the scene lighting conditions change. Then, the set of reference frames should be chosen so that they share the same scene lighting conditions, and so that those lighting conditions match the conditions observed in the current frame 108.
[000131] In one arrangement of the select reference frames step 304 and the estimate joint statistics step 310, a new set of reference frames is selected, and a new joint histogram H is calculated, for each new current frame. In another arrangement of the select reference frames step 304 and the estimate joint statistics step 310, the joint histogram H may be reused over multiple current frames. The new joint histogram H may be calculated only when the method 300 detects a change in scene lighting conditions.
[000132] In another arrangement of the select reference frames step 304 and the estimate joint statistics step 310, the joint histogram H may be periodically updated by a new set of selected reference frames, and a new joint histogram H_new. The update may incorporate a decay factor α so that past statistics of the background are gradually replaced by present statistics of the background:

$$H = \alpha H + H_{\mathrm{new}}$$
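As a small illustration of the running update, where the default decay factor value of 0.9 is our own assumption:

```python
def update_joint_histogram(h_old, h_new, alpha=0.9):
    """Blend the previous joint histogram with one built from new reference frames."""
    return alpha * h_old + h_new
```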
INDUSTRIAL APPLICABILITY

[000133] The arrangements described are applicable to the computer and data processing industries and particularly for image processing.
[000134] The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
[000135] In the context of this specification, the word "comprising" means "including principally but not necessarily solely" or "having" or "including", and not "consisting only of". Variations of the word "comprising", such as "comprise" and "comprises", have correspondingly varied meanings.

Claims (21)

    CLAIMS:
    1. A method of determining an object of interest in a dynamic scene in presence of turbulence, the method comprising:
    receiving a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest;
    determining a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; and determining whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model, colour at said pixel position in the target image, and a colour change associated with said pixel.
2. The method of claim 1, wherein the reference images and the target image are from a sequence of video frames, and wherein the reference images are selected so that the reference images do not contain a moving object.
3. The method of claim 1, wherein the reference images are separated in time by a time period associated with a de-correlation time of the turbulence.
4. The method according to claim 3, wherein the de-correlation time of the turbulence is pre-determined for at least one observed atmospheric condition and an imaging distance.
5. The method according to claim 1, wherein the reference images are separated in time based on at least one observed atmospheric condition and an imaging distance.
6. The method of claim 2, wherein the reference images are selected if a similarity score calculated between the two reference images exceeds a similarity threshold.
7. The method of claim 2, wherein the reference images are selected based on a similarity score, wherein the similarity score is determined by applying a low-pass filter to the video frames based on a known strength of the turbulence.
8. The method of claim 1, wherein the turbulence colour change model further comprises a first value that indicates a joint probability of colours co-occurring in the reference images, and a second value indicating a marginal probability of the colours occurring in each of the reference images.
9. The method of claim 1, wherein the determining of whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model comprises:
    determining a target pair of corresponding colours for a pixel position in the target image, the target pair comprising a colour at the pixel position in the target image and a colour at a corresponding pixel position in at least one of the reference images;
    determining a likelihood score of the target pair of corresponding colours using the turbulence colour change model;
    comparing the likelihood score with a predetermined threshold; and assigning the pixel position in the target image as part of the object of interest if the likelihood score is below the predetermined threshold.
10. The method of claim 1, wherein the turbulence colour change model is determined by using an interpolation kernel with a support size being based on an expected strength of the turbulence.
11. A system for determining an object of interest in a dynamic scene in presence of turbulence, the system comprising:
    a display configured to display images of the dynamic scene;
    memory coupled with the display configured to store the images; and a processor coupled with the memory configured to:
    receive a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest;
determine a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence;
    determine the object of interest in the target image based on the turbulence colour change model and colour changes associated with pixel positions in the target image;
    process the target image based on the determined object of interest to form a processed target image with reduced effects of turbulence; and display the processed target image on the display.
12. The system of claim 11, wherein the determination of the object of interest in the target image comprises:
    determining whether each of the pixels in the target image corresponds to the object of interest based on the turbulence colour change model and the colour change associated with said pixel.
13. The system of claim 11, wherein the images are from a sequence of video frames, and wherein the reference images are selected so that the reference images do not contain a moving object.
14. The system of claim 11, wherein the reference images are separated in time by a time period based on at least one observed atmospheric condition and an imaging distance.
15. The system of claim 13, wherein the processor is further configured to: determine a similarity score between two of the reference images, wherein the reference images are selected if the similarity score calculated between the two reference images exceeds a similarity threshold.
16. The system of claim 13, wherein the processor is further configured to:
select the reference images based on a similarity score, wherein the similarity score is determined by applying a low-pass filter to the images based on a known strength of the turbulence.
17. The system of claim 11, wherein the turbulence colour change model further comprises a first value for the reference images that indicates a joint probability of colours co-occurring in said reference images, and a second value for the reference images indicating a marginal probability of the colours occurring in each of the reference images.
18. The system of claim 11, wherein the processor is further configured to:
    determine a target pair of corresponding colours for a pixel position in the target image, the target pair comprising a colour at the pixel position in the target image and a colour at a corresponding pixel position in at least one of the reference images;
    determine a likelihood score of the target pair of corresponding colours using the turbulence colour change model;
    compare the likelihood score with a predetermined threshold; and assign the pixel position in the target image as part of the object of interest if the likelihood score is below the predetermined threshold.
19. The system of claim 10, wherein the processor is further configured to determine the turbulence colour change model by using an interpolation kernel having a support size determined using an expected strength of the turbulence.
20. A non-transitory computer readable medium comprising one or more software application programs that are executable by a processor, the one or more software application programs comprising a method of determining an object of interest in a dynamic scene in presence of turbulence, the method comprising:
    receiving a target image of the dynamic scene and at least two reference images capturing a static scene associated with the dynamic scene, wherein the target image comprises the object of interest;
    determining a turbulence colour change model using the reference images, the turbulence colour change model defining colour change for pixels co-occurring in the reference images caused by the turbulence; and determining whether a pixel in the target image corresponds to the object of interest based on the turbulence colour change model, colour at said pixel position in the target image, and a colour change associated with said pixel.
21. A method of determining an object of interest in a target image of a scene, the method comprising:
    receiving a target image of the scene and at least two reference images capturing a background of the scene, wherein the target image comprises the object of interest;
determining a turbulence colour change model using the reference images of the scene, the turbulence colour change model comprising a plurality of reference pairs of colours co-occurring in the reference images, wherein each reference pair is associated with a likelihood score;
    determining a colour at a pixel position in the target image and a colour at a corresponding pixel position in at least one of the reference images to form a target pair of corresponding colours; and determining whether the pixel position in the target image corresponds to the object of interest using a likelihood score associated with the target pair determined based on the turbulence colour change model.
AU2018202801A 2018-04-23 2018-04-23 Method, apparatus and system for producing a foreground map Abandoned AU2018202801A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2018202801A AU2018202801A1 (en) 2018-04-23 2018-04-23 Method, apparatus and system for producing a foreground map

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AU2018202801A AU2018202801A1 (en) 2018-04-23 2018-04-23 Method, apparatus and system for producing a foreground map

Publications (1)

Publication Number Publication Date
AU2018202801A1 true AU2018202801A1 (en) 2019-11-07

Family

ID=68383371

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2018202801A Abandoned AU2018202801A1 (en) 2018-04-23 2018-04-23 Method, apparatus and system for producing a foreground map

Country Status (1)

Country Link
AU (1) AU2018202801A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113225461A (en) * 2021-02-04 2021-08-06 江西方兴科技有限公司 System and method for detecting video monitoring scene switching
CN114419327A (en) * 2022-01-18 2022-04-29 北京百度网讯科技有限公司 Image detection method and training method and device of image detection model
CN114419327B (en) * 2022-01-18 2023-07-28 北京百度网讯科技有限公司 Image detection method and training method and device of image detection model
CN115546073A (en) * 2022-11-29 2022-12-30 昆明理工大学 Method and device for removing shadow of floor tile image, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109035304B (en) Target tracking method, medium, computing device and apparatus
Zheng et al. Single-image vignetting correction
US9214030B2 (en) Method and apparatus for processing video sequences
US9947077B2 (en) Video object tracking in traffic monitoring
Hayman et al. Statistical background subtraction for a mobile observer
US9317772B2 (en) Method for improving tracking using dynamic background compensation with centroid compensation
US9042662B2 (en) Method and system for segmenting an image
US11443454B2 (en) Method for estimating the pose of a camera in the frame of reference of a three-dimensional scene, device, augmented reality system and computer program therefor
US10614736B2 (en) Foreground and background detection method
Krig et al. Image pre-processing
US9349194B2 (en) Method for superpixel life cycle management
Yildirim et al. FASA: fast, accurate, and size-aware salient object detection
CN109859236B (en) Moving object detection method, system, computing device and storage medium
Setitra et al. Background subtraction algorithms with post-processing: A review
AU2018202801A1 (en) Method, apparatus and system for producing a foreground map
Lecca et al. Comprehensive evaluation of image enhancement for unsupervised image description and matching
Ianasi et al. A fast algorithm for background tracking in video surveillance, using nonparametric kernel density estimation
CN113436251A (en) Pose estimation system and method based on improved YOLO6D algorithm
Wang Image matting with transductive inference
Shao et al. Hyper RPCA: joint maximum correntropy criterion and Laplacian scale mixture modeling on-the-fly for moving object detection
Gevrekci et al. On geometric and photometric registration of images
US20190251695A1 (en) Foreground and background detection method
Son et al. Fast illumination-robust foreground detection using hierarchical distribution map for real-time video surveillance system
Sun et al. Background modeling and its evaluation for complex scenes
Deng et al. Texture edge-guided depth recovery for structured light-based depth sensor

Legal Events

Date Code Title Description
MK1 Application lapsed section 142(2)(a) - no request for examination in relevant period