WO2016106595A1

WO2016106595A1 - Moving object detection in videos

Info

Publication number: WO2016106595A1
Application number: PCT/CN2014/095643
Authority: WO
Inventors: Xiaoli Li
Original assignee: Nokia Technologies Oy; Navteq (Shanghai) Trading Co., Ltd.
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2016-07-07
Also published as: EP3241185A4; CN107209941A; EP3241185A1

Abstract

The present disclosure relates to moving object detection in videos. In one embodiment, a plurality of frames in a video are transformed to a high dimensional image space in a non-linear way. Then the background of the plurality of frames can be modeled in the high dimensional image space. The foreground or moving object can be detected in the plurality of frames based on the modeling of the background in the high dimensional image space. By use of the non-linear model which is more powerful for describing complex factors such as changing background, changing background, illumination variation, camera motion, noise and the like, embodiments of the present invention is more robust and accurate to detect moving objects under the complex situations.

Description

MOVING OBJECT DETECTION IN VIDEOS

FIELD

The present disclosure generally relates to video processing， and more specifically， to moving object detection in videos.

BACKGROUND

Detecting moving objects such as persons， automobiles and the like in the video plays an important role in video analysis such as intelligent video surveillance， traffic monitoring， vehicle navigation， and human-machine interaction. In the process of video analysis， the outcome of moving object detection can be input into the modules like object recognition， object tracking， behavior analysis or the like for further processing. Therefore， high performance of moving object detection is a key for successful video analysis.

In moving object detection， the detection of the background is a fundamental problem. In lot of conventional approaches for moving object detection in the videos， the detection accuracy is limited due to the changing background. More specifically， if the background of the video scene includes water ripples or waving trees， the detection of moving objects is prone to error. In addition， the illumination variation， camera motion， and/or other kinds of noises in the background may also put negative effects on the moving object detection. Due to the changes of the background， in the conventional solutions， parts of the background might be classified as moving objects， while parts of foreground might be classified as background.

SUMMARY

In general， embodiments of the present invention provide a solution for moving object detection in the videos.

In one aspect， one embodiment of the present invention provides a computer-implemented method. The method comprises： transforming a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way； modeling background of the plurality of frames in the high dimensional image space； and detecting a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.

In another aspect， one embodiment of the present invention provides a computer-implemented apparatus. The apparatus comprises： an image transformer configured to transform a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way； a modeler configured to model background of the plurality of frames in the high dimensional image space； and a moving object detector configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.

Through the following description， it would be appreciated that in accordance with example embodiments of the present invention， the frames in the videos may be transformed into a very high dimensional image space. By use of the non-linear model which is more powerful for describing complex factors such as changing background， changing background， illumination variation， camera motion， noise and the like， embodiments of the present invention is more robust and accurate to detect moving objects under the complex situations. Additionally， embodiments of the present invention achieve less false alarms and high detection rate.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a flowchart of a method of detecting moving objects in a video according to one embodiment of the present invention；

FIGs. 2A-2C show the results of moving object detection obtained by a conventional approach and one embodiment of the present invention；

FIG. 3 shows a block diagram of an apparatus of detecting moving objects in a video according to one embodiment of the present invention； and

FIG. 4 shows a block diagram of an example computer system suitable for implementing example embodiments of the present invention.

Throughout the drawings， the same or corresponding reference symbols refer to the same or corresponding parts.

DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS

Example embodiments of the present invention will now be discussed with reference to several example implementations. It should be understood these implementations are discussed only for the purpose of enabling those skilled persons in the art to better understand and thus implement embodiments of the invention， rather than suggesting any limitations on the scope of the invention.

As used herein， the term “includes” and its variants are to be read as open terms that mean “includes， but is not limited to. ” The term “or” is to be read as “and/or” unless the context clearly indicates otherwise. The term “based on” is to be read as “based at least in part on. ” The term “one implementation” and “an implementation” are to be read as“at least one implementation. ” The term “another implementation” is to be read as “at least one other implementation. ” The terms “first， ” “second， ” “third” and the like may be used to refer to different or same objects. Other definitions， explicit and implicit， may be included below.

Traditionally， the background modeling in the moving object detection is done using a linear model. The underlying assumption of the linear model is that the background follows a Gaussian distribution. However， the inventors have found that this is usually not the case in practice. Therefore， the linear model is unable to fully describe the complex factors of changing background， illumination， camera motion， noise， and the like. Example embodiments of the present invention model the background of the frames in the videos using a non-linear model. By using a nonlinear model which is better than the linear one in the sense of describing the complex factors， the accuracy and performance of moving object detection in the videos can be improved.

In general， the non-linear modeling of the background is achieved by transforming or mapping the original frames or images of the video being processed into a higher dimensional space. By modeling the background of the transformed frame in that high dimensional image space， the non-linear modeling of the initial background can be done effectively and efficiently.

For the sake of discussion， a number of notations are defined as follows. The input of the moving object detection is a sequence of frames or images in the video， denoted as

where

represents vectorized image， n represents the number of pixels in a frame， T represents the number of frames being taken into consideration. In the following， the terms “image” and “frame” can be used interchangeably.

The goal is to find the positions of a moving object (s) or foreground in the frame x_t. In the context of the present disclosure， the terms “foreground” and “moving object” can be used interchangeably. In one embodiment， the position of foreground location is represented by a foreground-indicator vector s ∈ {0，1} ⁿ. The i-element of s is s_i which equals to either zero or one， where s_i＝1 means that the i pixel in the frame x_t is foreground， while s_i＝0 means that the i pixel in the frame x_t is background. That is，

The pixel value of the foreground can be determined according to the foreground-indicator vector：

where P_s represents a foreground-extract operator. For the sake of discussion， the foreground-extract operator can be expressed as

The pixel value of the background can also be determined according to the foreground-indicator vector：

where

represents a background-extract operator. For the sake of discussion， the background-extract operator can be expressed as

Now some example embodiments of the present invention will be discussed. Reference is first made to FIG. 1 which shows the flowchart of a method 100 of detecting moving object in a video. According to embodiments of the present invention， the video may be of any suitable format. The video may be compressed or encoded by any suitable technologies， either currently known or to be developed in the future.

As shown， the method 100 is entered at step 110， where a plurality of frames [x_t-T， x_t-T-1， ... ， x_t-2， x_t-1， x_t] in the video are transformed into a high dimensional image space in a non-linear way. According to embodiments of the present invention， the dimension m of the high dimensional image space can be very high. Theoretically， the dimension can be even infinite. For example， in one embodiment， the value of m can be selected such that m is much greater than the number of pixels in each frame. In this way， the non-linear correlations among the frames in the low dimensional image space can be better characterized and modeled.

In one embodiment， it is possible to use a non-linear transformation or mapping function， denoted as φ， in order to transform the frames. Any suitable mapping functions can be used in connection with embodiments of the present invention. Specifically， in one embodiment， the mapping function satisfying the Mercer’s theorem can be used to guarantee the compactness and convergence of the transform.

By applying the mapping function， the frames in the initial image space is transformed into the high dimensional image space， thereby obtaining a plurality of transformed frames [φ (x_t-T) ， … ， φ (x_t-1) ， φ (x_t) ] . Specifically， in one embodiment， by selecting the proper parameters for the mapping function φ (x) ， the transformed frames [φ (x_t-T) ， … ， φ (x_t-1) ， φ (x_t) ] may be linear and can thus be more easily described， which will be discussed below. However， it is to be understood that the transformed frames [φ(x_t-T) ， … ， φ (x_t-1) ， φ (x_t) ] are not necessarily linear in the high dimensional image space. The scope of the invention is not limited in this regard.

Specifically， in one embodiment， the frames can be transformed into the high dimensional image space without explicitly defining the mapping function. For example， in one embodiment， the transformed frames and the modeling thereof can be described by use of proper kernel functions. Example embodiment in this regard will be discussed below.

The method 100 then proceeds to step 120， where the background of the plurality of frames [x_t-T， x_t-T-1， ... ， x_t-2， x_t-1， x_t] is modeled in the high dimensional image space.

Traditionally， the background of the frames is assumed to follow the Gaussian distribution and therefore is modeled by a linear transformation matrix

where d represents the number of bases and u_i is the i-th base vector. In such convention approaches， the representation

of the background is given by：

where

represents the background， U′ represents the transposed matrix of U. As a result， background

is approximated as follows：

It can be seen from equations (4) and (5) that the relationship between background and the base vectors is always linear. However， it is possible that the frames [x_t-T， x_t-T-1， ... ， x_t-2， x_t-1， x_t] are not linear， for example， if there is changing background， illumination， camera motion， noise， or the like. Experiments of the inventors have found that the conventional linear model is not robust to describe such complex factors. Inaccurate background modeling in turn degrades the detection rate of the moving objects in the foreground.

On the contrary， according to embodiments of the present invention， the initial frames are transformed into the high dimensional image space at step 110 and modeled in the image space with a very high dimension at step 120. As such， the non-linear modeling of the background of the initial frames is achieved. Along this line， the correlations of the frames can be better characterized to thereby identify the background and foreground (moving objects) more accurately.

Specifically， the transformed frames may be linear in the high dimensional image space in one embodiment， as described above. In this embodiment， the base vector u_j may be calculated as a linear sum of the background of the transformed frames as follows：

where

represents the background part of the transfonned frames in the high dimensional image space，and where

represents a coefficient. For the sake of discussion，

is defined. That is， in this embodiment， the non-linear modeling of background of the flames is achieved by modeling or approximating the background of the transformed flames using a linear model in the high dimensional image space.

These base vectors u_j together form a linear transformation matrix

Thus， the background

of each transformed flame in the high dimensional image space can be represented as follows：

where

represents the coefficient vector.

It is to be understood that modeling the background of the transformed frame using a linear model in me high dimensional image space would be beneficial in terms of operation efriciency and computation complexity. However， this is not necessarily required. In an alternative embodiment， the background of the transformed frames can be approximated using any non-linear model in the high dimensional image space.

Still wim reference to FIG. 1， the method 100 proceeds to step 130 where one or more moving objects (foreground) are detected based on the modeling of the background of the frames in the high dimensional image space.

In one embodiment， at step 130， an objective function can be defined based on the modeling at step 120. More specifically， the objective function at least characterizes the error in the modeling or approximation of background of the frames. By way of example， in the embodiment where the background of me transformed frames are modeled in a linear way， the objective function may be defined as follows：

Substituting equation (6) into equation (8) yields：

In some embodiments， one or more other relevant factors can be used in the objective function. For example， in one embodiment， the area of the foreground (moving object) may be taken into consideration. In general， it is desired that the area of the moving object in each frame is below a predefined threshold because a too large moving objection would probably means inaccurate detection. In one embodiment， the area term can be given by：

where ||·||₁ represents the one-norm operator.

Additionally or alternatively， in one embodiment， the connectivity of the moving object across the plurality of frames can be considered. It would be appreciated that the trajectory of a moving object is usually continuous between two consecutive frames. In order to measure the object connectivity， in one embodiment， the connectivity may be defined as follows：

where N (i) is the set of neighbors of the pixel i.

In one embodiment， the modeling error， foreground area and the connectivity can be combined together to define the objective function as follows：

L＝L_background+βL_area+γL_connectivity， (12)

where β and γ represent weights and can be set depending on specific requirements and use cases. By substituting equations (9) ， (10) and (11) into equation (12) ， the objective function is expressed as：

It is to be understood that the objective function shown in equation (13) is discussed merely for the purpose of illustration， without suggesting any limitations to the scope of the invention. In other embodiment， any additional or alternative factors may be used in the objective function. Moreover， as described above， it is possible to simply use the approximation error L_background as the objective function.

In one embodiment， the background of the frames can be detected by minimizing the objective function. To this end， in one embodiment， it is possible to directly solve the foreground-indicator vector s， coefficient

and the low-dimensional representation y_i that can minimize the objective function L. In practice， however， it is sometimes difficult to directly find the optimal solutions. In order to improve the efficiency and reduce computation complexity， in one embodiment， the kernel functions associated with the high dimensional image space can be used to solve this optimization problem.

More specifically， given the objective function such as the one shown in equation (13) ， the goal is to solve

when s and y_i are fixed. In one embodiment， the Kernel Principal Component Analysis (KPCA) can be used to accomplish this task. A T×T kernel matrix K is defined， in which the ij-th element is denoted by a kernel function

k_ij＝k (x_i， x_j) . (14)

The kernel function k (x_i， x_j) can be in any form as long as the resulting kernel matrix K is semi-definite. In one embodiment， an example of the kernel function is shown as follows：

where σ is parameter which can be selected empirically. It is to be understood that the kernel functions shown in equation (15) is given merely for the purpose of illustration， without suggesting any limitations as to the scope of the invention. In other embodiments， any suitable kernel functions such as gaussian kernel function， radial basis function， and the like can be used as well.

In one embodiment， the optimization of the objective function can be achieved by solving the following eigen-decomposition problem：

Kα＝λα， (16)

where λ and α represent the eigenvalue and eigenvector， respectively. It would be appreciated that there are totally d eigenvalues λ₁， ... ， λ_d. In one embodiment， the eigenvalues may be sorted in ascending order， such that λ₁＞λ₂＞…λ_d. The eigenvector α_i corresponds to the eigenvalues λ_i. The j entry of α_i is

Given the s and

the low-dimensional version of

denoted as y_i， can be solved. That is， y_i is the background of the initial frames in the low dimensional image space. In one embodiment， y_i is expressed as follows：

Substituting equation (6) into equation (17) yields：

It can be seen from equation (18) that y_i can be determined by using the kernel function， without explicitly defining or applying the mapping function.

Next， in the case that y_i and

are fixed， s can be solved. More specifically， it can be determined from equation (13) that the j-th element of y_i is

Therefore， in one embodiment， the background approximation error L_background is written as：

where

Equation (19) can be calculated by the kernel function because

Therefore， equation (19) can be formulated as terms in the form of kernel function

as follows：

where c_ij represents a coefficient and C represents a constant independent to the data. As such， at least a part of the objective function (that is， the approximation error L_background) and the background of the frames are associated using the kernel functions.

In one embodiment， as described above， the foreground and background parts in the frames can be identified or indicated by the foreground indicator s which is defined in equation (1) ， for example. Based on equation (21) ， in one embodiment， it is possible to express the objective function at least in part by the foreground indicator. That is， by means of the kernel functions， the objective function can be associated with the foreground indicator related to each pixel in each of the plurality of frames， where the foreground indicator indicates whether the related pixel belongs to the moving object (foreground) .

For the sake of discussion， suppose that the kernel function is in the form of equation (15) . By use of the Taylor expansion， the kernel function can be approximated by：

where x_iz represents the value of pixel z in frame i. By substituting equation (22) into equation (23) ， L_background as defined in equation (19) can be expressed as follows：

where

It would be appreciated that C′ is a constant and thus can be removed from equation (24) . As a result， L_background is expressed as：

For the sake of discussion， suppose that the objective function is in the form of equation (13) . That is， in addition to L_background， the objective function also includes the terms related to the area and connectivity of the moving object (s) . Based on equations (25) and (13) ， the objective function L can be written as：

where s_z is the z entry of s. It can be seen that equation (26) is in a standard form of graph cuts. In one embodiment， by use of the well-known algorithm of graph cuts， the optimal solution s can be efficiently obtained.

In one embodiment where the kernel functions are used， the method 100 can be implemented by the pseudo code shown in the following table.

Table 1

It is to be understood that the pseudo code in Table 1 is given only for the purpose of illustration， without suggesting any limitations as to the scope of the invention. Various modifications or variations are possible in practice.

By use of the non-linear model which is more powerful for describing complex factors such as changing background (e.g.， water ripples and waving trees) ， illumination variation， camera motion， noise and the like， embodiments of the present invention is more robust and accurate to detect moving objects under the complex situations. The proposed approach achieves less false alarms and high detection rate.

FIGs. 2A-2C show an example of moving object detection. FIG. 2A shows frame in a video which has dynamic rain. FIG. 2B is the result of a conventional approach of moving object detection. It can be seen that in FIG. 2B， the spring is incorrectly classified as moving objects. On the contrary， in the result obtained by one embodiment of the present invention as shown in FIG. 2C， the spring was removed from the foreground and the moving person is correctly detected.

FIG. 3 shows a block diagram of a computer-implemented apparatus for moving object detection according to one embodiment of the present invention. As shown， the apparatus 300 comprises an image transformer 310 configured to transform a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way； a modeler 320 configured to model background of the plurality of frames in the high dimensional image space； and a moving object detector 330 configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.

In one embodiment， the dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.

In one embodiment， the modeler 320 may comprise a non-linear modeler 325 configured to model background of a plurality of transformed frames using a linear model in the high dimensional image space， the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.

In one embodiment， the apparatus 300 may further comprise an objective function controller 340 configured to determine an objective function characterizing an error of the modeling of the background of the plurality of frames. In this embodiment， the moving object detector 330 is configured to detect the moving object based on the objective function.

In one embodiment， the objective function may further characterize at least one of： areas of the moving object in the plurality of frames， and connectivity of the moving object across the plurality of frames.

In one embodiment， the apparatus 300 may further comprise a kernel function controller 350 configured to determine a set of kernel functions associated with the high dimensional image space. In this embodiment， the objective function controller 340 is configured to associate at least a part of the objective function and the background of the plurality of frames using the set of kernel functions， and the moving object detector 330 is configured to detect the moving object by minimizing the objective function.

In one embodiment， the objective function controller 340 is configured to associate the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions， where the foreground indicator indicates whether the related pixel belongs to the moving object.

FIG. 4 shows a block diagram of an example computer system 400 suitable for implementing example embodiments of the present invention. The computer system 400 can be a fixed type machine such as a desktop personal computer (PC) ， a server， a mainframe， or the like. Alternatively， the computer system 400 can be a mobile type machine such as a mobile phone， tablet PC， laptop， intelligent phone， personal digital assistance (PDA) ， or the like.

As shown， the computer system 400 comprises a processor such as a central processing unit (CPU) 401 which is capable of performing various processes in accordance with a program stored in a read only memory (ROM) 402 or a program loaded from a storage unit 408 to a random access memory (RAM) 403. In the RAM 403， data required when the CPU 401 performs the various processes or the like is also stored as required. The CPU 401， the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.

The following components are connected to the I/O interface 405： an input unit 406 including a keyboard， a mouse， or the like； an output unit 407 including a display such as a cathode ray tube (CRT) ， aliquid crystal display (LCD) ， or the like， and a loudspeaker or the like； the storage unit 408 including a hard disk or the like； and a communication unit 409 including a network interface card such as a LAN card， a modem， or the like. The communication unit 409 performs a communication process via the network such as the internet. A drive 410 is also connected to the I/O interface 405 as required. A removable medium 411， such as a magnetic disk， an optical disk， a magneto-optical disk， a semiconductor memory， or the like， is mounted on the drive 410 as required， so that a computer program read therefrom is installed into the storage unit 408 as required.

Specifically， in accordance with example embodiments of the present invention， the processes described above with reference to FIG. 1 and Table 1 may be implemented by computer program. For example， embodiments of the present invention comprise a computer program product including a computer program tangibly embodied on a machine readable medium， the computer program including program code for performing the method 100 and/or the pseudo code shown in Table 1. In such embodiments， the computer program may be downloaded and mounted from the network via the communication unit 409， and/or installed from the removable medium 411.

The functionally described herein can be performed， at least in part， by one or more hardware logic components. For example， and without limitation， illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs) ， Application-specific Integrated Circuits (ASICs) ， Application-specific Standard Products (ASSPs) ， System-on-a-chip systems (SOCs) ， Complex Programmable Logic Devices (CPLDs) ， and the like.

Various embodiments of the invention may be implemented in hardware or special purpose circuits， software， logic or any combination thereof. Some aspects may be implemented in hardware， while other aspects may be implemented in firmware or software which may be executed by a controller， microprocessor or other computing device. While various aspects of embodiments of the present invention are illustrated and described as block diagrams， flowcharts， or using some other pictorial representation， it will be appreciated that the blocks， apparatus， systems， techniques or methods described herein may be implemented in， as non-limiting examples， hardware， software， firmware， special purpose circuits or logic， general purpose hardware or controller or other computing devices， or some combination thereof.

By way of example， embodiments of the present invention can be described in the general context of machine-executable instructions， such as those included in program modules， being executed in a device on a target real or virtual processor. Generally， program modules include routines， programs， libraries， objects， classes， components， data structures， or the like that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various implementations. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device， program modules may be located in both local and remote storage media.

Program code for carrying out methods of the invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer， special purpose computer， or other programmable data processing apparatus， such that the program codes， when executed by the processor or controller， cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine， partly on the machine， as a stand-alone software package， partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure， a machine readable medium may be any tangible medium that may contain， or store a program for use by or in connection with an instruction execution system， apparatus， or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include but not limited to an electronic， magnetic， optical， electromagnetic， infrared， or semiconductor system， apparatus， or device， or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires， a portable computer diskette， a hard disk， a random access memory (RAM) ， a read-only memory (ROM) ， an erasable programmable read-only memory (EPROM or Flash memory) ， an optical fiber， a portable compact disc read-only memory (CD-ROM) ， an optical storage device， a magnetic storage device， or any suitable combination of the foregoing.

Further， while operations are depicted in a particular order， this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order， or that all illustrated operations be performed， to achieve desirable results. In certain circumstances， multitasking and parallel processing may be advantageous. Likewise， while several specific implementation details are contained in the above discussions， these should not be construed as limitations on the scope of the the present invention， but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely， various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination

Although the invention has been described in language specific to structural features and/or methodological acts， it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather， the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

A computer-implemented method comprising：

transforming a plurality of frames in a video to a high dimensional image space in a non-linear way；

modeling background of the plurality of frames in the high dimensional image space； and

detecting a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
The method of claim 1， wherein a dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.
The method of claim 1， wherein the modeling background of the plurality of frames in the high dimensional image space comprises：

modeling background of a plurality of transformed frames using a linear model in the high dimensional image space， the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.
The method of claim 1， wherein the detecting a moving object in the plurality of frames comprises：

determining an objective function characterizing an error of the modeling of the background of the plurality of frames； and

detecting the moving object based on the objective function.
The method of claim 4， wherein the objective function further characterizes at least one of：

areas of the moving object in the plurality of frames， and

connectivity o f the moving object across the plurality o f frames.
The method of claim 4， wherein the detecting the moving object based on the objective function comprises：

determining a set of kernel functions associated with the high dimensional image space；

associating at least a part of the objective function and the background of the plurality of frames using the set of kernel functions； and

detecting the moving object by minimizing the objective function.
The method of claim 6， wherein the associating at least a part of the objective function and the background of the plurality of frames using the set of kernel functions comprises：

associating the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions， the foreground indicator indicating whether the related pixel belongs to the moving object.
A computer-implemented apparatus comprising：

an image transformer configured to transform a plurality of frames in a video to a high dimensional image space in a non-linear way；

a modeler configured to model background of the plurality of frames in the high dimensional image space； and

a moving object detector configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
The apparatus of claim 8， wherein a dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.
The apparatus of claim 8， wherein the modeler comprises：

a non-linear modeler configured to model background of a plurality of transformed frames using a linear model in the high dimensional image space， the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.
The apparatus of claim 8， further comprising：

an objective function controller configured to determine an objective function characterizing an error of the modeling of the background of the plurality of frames，

wherein the moving object detector is configured to detect the moving object based on the objective function.
The apparatus of claim 11， wherein the objective function further characterizes at least one of：

areas of the moving object in the plurality of frames， and

connectivity of the moving object across the plurality of frames.
The apparatus of claim 11， further comprising：

a kernel function controller configured to determine a set of kernel functions associated with the high dimensional image space，

wherein the objective function controller is configured to associate at least a part of the objective function and the background of the plurality of frames using the set of kernel functions，

and wherein the moving object detector is configured to detect the moving object by minimizing the objective function.
The apparatus of claim 13， wherein the objective function controller is configured to associate the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions， the foreground indicator indicating whether the related pixel belongs to the moving object.
A device comprising：

a processor； and

a memory including computer-executable instructions which， when executed by the processor， cause the device to carry out the method of any one of claims 1 to 7.
A computer program product being tangibly stored on a non-transient computer-readable medium and comprising machine executable instructions which， when executed， cause the machine to perform steps of the method according to any one of claims 1 to 7.