WO2016106595A1 - Moving object detection in videos - Google Patents


Info

Publication number
WO2016106595A1
Authority
WO
WIPO (PCT)
Prior art keywords
frames
moving object
background
objective function
dimensional image
Application number
PCT/CN2014/095643
Other languages
French (fr)
Inventor
Xiaoli Li
Original Assignee
Nokia Technologies Oy
Navteq (Shanghai) Trading Co., Ltd.
Application filed by Nokia Technologies Oy and Navteq (Shanghai) Trading Co., Ltd.
Priority to CN201480084456.9A (published as CN107209941A)
Priority to EP14909406.2A (published as EP3241185A4)
Priority to PCT/CN2014/095643 (published as WO2016106595A1)
Publication of WO2016106595A1

Classifications

All classifications fall under G (Physics) > G06 (Computing; calculating or counting) > G06T (Image data processing or generation, in general):

    • G06T 7/262: Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T 7/194: Segmentation; edge detection involving foreground-background segmentation
    • G06T 7/215: Motion-based segmentation
    • G06T 2207/20076: Probabilistic image processing
    • G06T 2207/30232: Surveillance
    • G06T 2207/30241: Trajectory



Abstract

The present disclosure relates to moving object detection in videos. In one embodiment, a plurality of frames in a video are transformed to a high dimensional image space in a non-linear way. The background of the plurality of frames can then be modeled in the high dimensional image space, and the foreground or moving object can be detected in the plurality of frames based on that modeling. By use of a non-linear model, which is more powerful for describing complex factors such as changing background, illumination variation, camera motion, noise and the like, embodiments of the present invention are more robust and accurate in detecting moving objects under complex situations.

Description

MOVING OBJECT DETECTION IN VIDEOS
FIELD
The present disclosure generally relates to video processing, and more specifically, to moving object detection in videos.
BACKGROUND
Detecting moving objects such as persons, automobiles and the like in videos plays an important role in video analysis tasks such as intelligent video surveillance, traffic monitoring, vehicle navigation, and human-machine interaction. In the process of video analysis, the outcome of moving object detection can be fed into modules such as object recognition, object tracking, behavior analysis and the like for further processing. Therefore, high-performance moving object detection is key to successful video analysis.
In moving object detection, detecting the background is a fundamental problem. In many conventional approaches to moving object detection in videos, the detection accuracy is limited by the changing background. More specifically, if the background of the video scene includes water ripples or waving trees, the detection of moving objects is prone to error. In addition, illumination variation, camera motion, and/or other kinds of noise in the background may also negatively affect moving object detection. Due to these background changes, conventional solutions may classify parts of the background as moving objects, while parts of the foreground might be classified as background.
SUMMARY
In general, embodiments of the present invention provide a solution for moving object detection in videos.
In one aspect, one embodiment of the present invention provides a computer-implemented method. The method comprises: transforming a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way; modeling background of the plurality of frames in the high dimensional image space; and detecting a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
In another aspect, one embodiment of the present invention provides a computer-implemented apparatus. The apparatus comprises: an image transformer configured to transform a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way; a modeler configured to model background of the plurality of frames in the high dimensional image space; and a moving object detector configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
Through the following description, it would be appreciated that in accordance with example embodiments of the present invention, the frames in a video may be transformed into a very high dimensional image space. By use of a non-linear model, which is more powerful for describing complex factors such as changing background, illumination variation, camera motion, noise and the like, embodiments of the present invention are more robust and accurate in detecting moving objects under complex situations. Additionally, embodiments of the present invention achieve fewer false alarms and a higher detection rate.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a flowchart of a method of detecting moving objects in a video according to one embodiment of the present invention;
FIGs. 2A-2C show the results of moving object detection obtained by a conventional approach and one embodiment of the present invention;
FIG. 3 shows a block diagram of an apparatus of detecting moving objects in a video according to one embodiment of the present invention; and
FIG. 4 shows a block diagram of an example computer system suitable for implementing example embodiments of the present invention.
Throughout the drawings, the same or corresponding reference symbols refer to the same or corresponding parts.
DETAILED DESCRIPTION OF EXAMPLE EMBODIMENTS
Example embodiments of the present invention will now be discussed with reference to several example implementations. It should be understood that these implementations are discussed only for the purpose of enabling persons skilled in the art to better understand and thus implement embodiments of the invention, rather than suggesting any limitations on the scope of the invention.
As used herein, the term “includes” and its variants are to be read as open terms that mean “includes, but is not limited to.” The term “or” is to be read as “and/or” unless the context clearly indicates otherwise. The term “based on” is to be read as “based at least in part on.” The terms “one implementation” and “an implementation” are to be read as “at least one implementation.” The term “another implementation” is to be read as “at least one other implementation.” The terms “first,” “second,” “third” and the like may be used to refer to different or same objects. Other definitions, explicit and implicit, may be included below.
Traditionally, the background modeling in moving object detection is done using a linear model. The underlying assumption of the linear model is that the background follows a Gaussian distribution. However, the inventors have found that this is usually not the case in practice. Therefore, the linear model is unable to fully describe the complex factors of changing background, illumination, camera motion, noise, and the like. Example embodiments of the present invention model the background of the frames in the videos using a non-linear model. By using a non-linear model, which describes these complex factors better than a linear one, the accuracy and performance of moving object detection in videos can be improved.
In general, the non-linear modeling of the background is achieved by transforming or mapping the original frames or images of the video being processed into a higher dimensional space. By modeling the background of the transformed frames in that high dimensional image space, the non-linear modeling of the initial background can be done effectively and efficiently.
For the sake of discussion, a number of notations are defined as follows. The input of the moving object detection is a sequence of frames or images in the video, denoted as $[x_{t-T}, x_{t-T+1}, \ldots, x_{t-1}, x_t]$, where $x_i \in \mathbb{R}^n$ represents a vectorized image, $n$ represents the number of pixels in a frame, and $T$ represents the number of frames being taken into consideration. In the following, the terms “image” and “frame” are used interchangeably.
The goal is to find the positions of a moving object (or objects), i.e., the foreground, in the frame $x_t$. In the context of the present disclosure, the terms “foreground” and “moving object” are used interchangeably. In one embodiment, the position of the foreground is represented by a foreground-indicator vector $s \in \{0,1\}^n$. The $i$-th element of $s$ is $s_i$, which equals either zero or one: $s_i = 1$ means that the $i$-th pixel in the frame $x_t$ is foreground, while $s_i = 0$ means that the $i$-th pixel is background. That is,

$$s_i = \begin{cases} 1, & \text{if the } i\text{-th pixel of } x_t \text{ is foreground,} \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

The pixel values of the foreground can be determined according to the foreground-indicator vector:

$$x_t^f = P_s\, x_t, \tag{2}$$

where $P_s$ represents a foreground-extract operator. For the sake of discussion, the foreground-extract operator can be expressed as $P_s = \mathrm{diag}(s)$. The pixel values of the background can also be determined according to the foreground-indicator vector:

$$x_t^b = P_{\bar{s}}\, x_t, \tag{3}$$

where $P_{\bar{s}}$ represents a background-extract operator. For the sake of discussion, the background-extract operator can be expressed as $P_{\bar{s}} = \mathrm{diag}(\mathbf{1} - s)$.
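As a toy illustration (not part of the patent), the following NumPy sketch applies the two extract operators; the diagonal-mask forms of $P_s$ and $P_{\bar{s}}$ are the reconstructions assumed above, since the published operator definitions are images, and the function names are illustrative:

```python
import numpy as np

def extract_foreground(x, s):
    """Apply the assumed foreground-extract operator P_s = diag(s): keep
    foreground pixels of the vectorized frame x and zero the rest."""
    return s * x

def extract_background(x, s):
    """Apply the assumed background-extract operator diag(1 - s)."""
    return (1 - s) * x

# Toy frame of n = 6 pixels and its foreground indicator
x = np.array([10., 20., 30., 40., 50., 60.])
s = np.array([0, 1, 1, 0, 0, 0])
print(extract_foreground(x, s))  # [ 0. 20. 30.  0.  0.  0.]
print(extract_background(x, s))  # [10.  0.  0. 40. 50. 60.]
```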
Now some example embodiments of the present invention will be discussed. Reference is first made to FIG. 1, which shows the flowchart of a method 100 of detecting a moving object in a video. According to embodiments of the present invention, the video may be of any suitable format. The video may be compressed or encoded by any suitable technologies, either currently known or to be developed in the future.
As shown, the method 100 is entered at step 110, where a plurality of frames $[x_{t-T}, x_{t-T+1}, \ldots, x_{t-2}, x_{t-1}, x_t]$ in the video are transformed into a high dimensional image space in a non-linear way. According to embodiments of the present invention, the dimension $m$ of the high dimensional image space can be very high; theoretically, it can even be infinite. For example, in one embodiment, the value of $m$ can be selected such that $m$ is much greater than the number of pixels in each frame. In this way, the non-linear correlations among the frames in the initial, low dimensional image space can be better characterized and modeled.
In one embodiment, it is possible to use a non-linear transformation or mapping function, denoted as φ, to transform the frames. Any suitable mapping function can be used in connection with embodiments of the present invention. Specifically, in one embodiment, a mapping function satisfying Mercer’s theorem can be used to guarantee the compactness and convergence of the transform.
By applying the mapping function, the frames in the initial image space are transformed into the high dimensional image space, thereby obtaining a plurality of transformed frames $[\phi(x_{t-T}), \ldots, \phi(x_{t-1}), \phi(x_t)]$. Specifically, in one embodiment, by selecting proper parameters for the mapping function $\phi(x)$, the transformed frames may be linear and can thus be more easily described, as will be discussed below. However, it is to be understood that the transformed frames are not necessarily linear in the high dimensional image space. The scope of the invention is not limited in this regard.
Specifically, in one embodiment, the frames can be transformed into the high dimensional image space without explicitly defining the mapping function. For example, in one embodiment, the transformed frames and the modeling thereof can be described by use of proper kernel functions. An example embodiment in this regard will be discussed below.
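For a concrete sense of how a kernel function stands in for an explicit mapping, consider this small sketch (an illustration, not from the patent) with the degree-2 polynomial kernel, whose mapping φ happens to be known in closed form:

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2-D vector:
    phi([a, b]) = [a^2, b^2, sqrt(2)*a*b]."""
    a, b = x
    return np.array([a * a, b * b, np.sqrt(2) * a * b])

def k(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
print(np.dot(phi(x), phi(y)))  # 121.0 -- inner product in the mapped space
print(k(x, y))                 # 121.0 -- same value, via the kernel alone
```

The kernel value equals the inner product in the higher dimensional space without ever constructing φ(x) explicitly, which is what makes very high (or infinite) dimensional spaces tractable.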
The method 100 then proceeds to step 120, where the background of the plurality of frames $[x_{t-T}, x_{t-T+1}, \ldots, x_{t-2}, x_{t-1}, x_t]$ is modeled in the high dimensional image space.
Traditionally, the background of the frames is assumed to follow a Gaussian distribution and is therefore modeled by a linear transformation matrix $U = [u_1, \ldots, u_d] \in \mathbb{R}^{n \times d}$, where $d$ represents the number of bases and $u_i$ is the $i$-th base vector. In such conventional approaches, the representation $y_t$ of the background is given by:

$$y_t = U'\, x_t^b, \tag{4}$$

where $x_t^b$ represents the background and $U'$ represents the transposed matrix of $U$. As a result, the background $x_t^b$ is approximated as follows:

$$x_t^b \approx U\, y_t = U\, U'\, x_t^b. \tag{5}$$
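A minimal sketch of this conventional linear model, assuming $U$ is obtained from an SVD of stacked background frames (the function and variable names are illustrative, not from the patent):

```python
import numpy as np

def fit_linear_background(B, d):
    """Fit the conventional d-dimensional linear model: U spans the
    top-d left singular vectors of the stacked background frames.
    B: n x T matrix whose columns are vectorized background frames."""
    U, _, _ = np.linalg.svd(B, full_matrices=False)
    return U[:, :d]

def reconstruct_background(U, xb):
    """Equations (4)-(5): y = U' xb, then xb is approximated by U y."""
    y = U.T @ xb
    return U @ y

rng = np.random.default_rng(0)
B = rng.normal(size=(10, 20))        # toy data: 10-pixel frames, 20 samples
U = fit_linear_background(B, d=3)
err = np.linalg.norm(B[:, 0] - reconstruct_background(U, B[:, 0]))
print(err)                           # residual of the linear approximation
```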
It can be seen from equations (4) and (5) that the relationship between the background and the base vectors is always linear. However, it is possible that the frames $[x_{t-T}, x_{t-T+1}, \ldots, x_{t-2}, x_{t-1}, x_t]$ are not linear, for example, when there is changing background, illumination variation, camera motion, noise, or the like. Experiments by the inventors have shown that the conventional linear model is not robust enough to describe such complex factors. Inaccurate background modeling in turn degrades the detection rate of the moving objects in the foreground.
On the contrary, according to embodiments of the present invention, the initial frames are transformed into the high dimensional image space at step 110 and modeled in the image space with a very high dimension at step 120. As such, the non-linear modeling of the background of the initial frames is achieved. Along this line, the correlations of the frames can be better characterized to thereby identify the background and foreground (moving objects) more accurately.
Specifically, the transformed frames may be linear in the high dimensional image space in one embodiment, as described above. In this embodiment, the base vector $u_j$ may be calculated as a linear sum of the backgrounds of the transformed frames as follows:

$$u_j = \sum_{i=t-T}^{t} w_{ij}\, \phi(x_i^b), \tag{6}$$

where $\phi(x_i^b)$ represents the background part of the $i$-th transformed frame in the high dimensional image space, and where $w_{ij}$ represents a coefficient. For the sake of discussion, the coefficient matrix $W = [w_{ij}]$ and the matrix $\Phi^b = [\phi(x_{t-T}^b), \ldots, \phi(x_t^b)]$ are defined, so that $U = \Phi^b W$. That is, in this embodiment, the non-linear modeling of the background of the frames is achieved by modeling or approximating the background of the transformed frames using a linear model in the high dimensional image space.

These base vectors $u_j$ together form a linear transformation matrix $U = [u_1, \ldots, u_d]$. Thus, the background $\phi(x_i^b)$ of each transformed frame in the high dimensional image space can be represented as follows:

$$\phi(x_i^b) \approx U\, y_i, \tag{7}$$

where $y_i \in \mathbb{R}^d$ represents the coefficient vector.
It is to be understood that modeling the background of the transformed frames using a linear model in the high dimensional image space would be beneficial in terms of operational efficiency and computational complexity. However, this is not necessarily required. In an alternative embodiment, the background of the transformed frames can be approximated using a non-linear model in the high dimensional image space.
Still with reference to FIG. 1, the method 100 proceeds to step 130, where one or more moving objects (foreground) are detected based on the modeling of the background of the frames in the high dimensional image space.
In one embodiment, at step 130, an objective function can be defined based on the modeling at step 120. More specifically, the objective function at least characterizes the error in the modeling or approximation of the background of the frames. By way of example, in the embodiment where the backgrounds of the transformed frames are modeled in a linear way, the objective function may be defined as follows:

$$L_{background} = \sum_{i=t-T}^{t} \left\| \phi(x_i^b) - U\, y_i \right\|^2. \tag{8}$$

Substituting equation (6) into equation (8) yields:

$$L_{background} = \sum_{i=t-T}^{t} \left\| \phi(x_i^b) - \Phi^b\, W\, y_i \right\|^2. \tag{9}$$
In some embodiments, one or more other relevant factors can be used in the objective function. For example, in one embodiment, the area of the foreground (moving object) may be taken into consideration. In general, it is desired that the area of the moving object in each frame stays below a predefined threshold, because an overly large moving object probably indicates an inaccurate detection. In one embodiment, the area term can be given by:

$$L_{area} = \| s \|_1, \tag{10}$$

where $\|\cdot\|_1$ represents the one-norm operator.
Additionally or alternatively, in one embodiment, the connectivity of the moving object across the plurality of frames can be considered. It would be appreciated that the trajectory of a moving object is usually continuous between two consecutive frames. In order to measure the object connectivity, in one embodiment, the connectivity term may be defined as follows:

$$L_{connectivity} = \sum_{i=1}^{n} \sum_{j \in N(i)} | s_i - s_j |, \tag{11}$$

where $N(i)$ is the set of neighbors of the pixel $i$.
In one embodiment, the modeling error, the foreground area and the connectivity can be combined to define the objective function as follows:

$$L = L_{background} + \beta\, L_{area} + \gamma\, L_{connectivity}, \tag{12}$$

where $\beta$ and $\gamma$ represent weights and can be set depending on specific requirements and use cases. By substituting equations (9), (10) and (11) into equation (12), the objective function is expressed as:

$$L = \sum_{i=t-T}^{t} \left\| \phi(x_i^b) - \Phi^b\, W\, y_i \right\|^2 + \beta\, \| s \|_1 + \gamma \sum_{i=1}^{n} \sum_{j \in N(i)} | s_i - s_j |. \tag{13}$$
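The area and connectivity terms are straightforward to compute. A small sketch follows, assuming a 4-neighborhood on a 2-D grid; note that equation (11) as reconstructed above counts each neighbor pair twice, while this sketch counts each pair once, which differs only by a constant factor (all names are illustrative):

```python
import numpy as np

def area_term(s):
    """Equation (10): L_area = ||s||_1 for a binary indicator vector."""
    return float(np.sum(s))

def connectivity_term(s2d):
    """Equation (11) on a 2-D grid with a 4-neighborhood: counts label
    disagreements between horizontally and vertically adjacent pixels."""
    horiz = np.abs(np.diff(s2d, axis=1)).sum()
    vert = np.abs(np.diff(s2d, axis=0)).sum()
    return float(horiz + vert)

s2d = np.array([[0, 1, 1],
                [0, 1, 0],
                [0, 0, 0]])
beta, gamma = 0.1, 0.5
L_partial = beta * area_term(s2d.ravel()) + gamma * connectivity_term(s2d)
print(L_partial)  # weighted area + connectivity part of equation (13)
```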
It is to be understood that the objective function shown in equation (13) is discussed merely for the purpose of illustration, without suggesting any limitations to the scope of the invention. In other embodiments, any additional or alternative factors may be used in the objective function. Moreover, as described above, it is possible to simply use the approximation error $L_{background}$ as the objective function.
In one embodiment, the background of the frames can be detected by minimizing the objective function. To this end, in one embodiment, it is possible to directly solve for the foreground-indicator vector $s$, the coefficient matrix $W$ and the low-dimensional representations $y_i$ that minimize the objective function $L$. In practice, however, it is sometimes difficult to directly find the optimal solutions. In order to improve efficiency and reduce computational complexity, in one embodiment, the kernel functions associated with the high dimensional image space can be used to solve this optimization problem.
More specifically, given an objective function such as the one shown in equation (13), the goal is first to solve for the coefficient matrix $W$ when $s$ and $y_i$ are fixed. In one embodiment, Kernel Principal Component Analysis (KPCA) can be used to accomplish this task. A $T \times T$ kernel matrix $K$ is defined, in which the $ij$-th element is given by a kernel function:

$$k_{ij} = k(x_i, x_j). \tag{14}$$
The kernel function $k(x_i, x_j)$ can be of any form as long as the resulting kernel matrix $K$ is positive semi-definite. In one embodiment, an example of the kernel function is the exponential inner-product kernel:

$$k(x_i, x_j) = \exp\!\left(\frac{x_i'\, x_j}{\sigma}\right), \tag{15}$$

where $\sigma$ is a parameter which can be selected empirically. It is to be understood that the kernel function shown in equation (15) is given merely for the purpose of illustration, without suggesting any limitations as to the scope of the invention. In other embodiments, any suitable kernel function, such as a Gaussian kernel function, a radial basis function, and the like, can be used as well.
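A short sketch of building the kernel matrix $K$ over a window of frames, assuming the exponential inner-product form reconstructed in equation (15) (the published equation is an image, so that form is an assumption):

```python
import numpy as np

def kernel_matrix(X, sigma):
    """Kernel matrix K with k(x_i, x_j) = exp(x_i' x_j / sigma), eqs. (14)-(15).
    X: n x T matrix whose columns are vectorized frames."""
    return np.exp((X.T @ X) / sigma)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))        # 8 frames of 100 pixels each
K = kernel_matrix(X, sigma=100.0)
print(K.shape, np.allclose(K, K.T))  # (8, 8) True -- K is symmetric
```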
In one embodiment, the optimization of the objective function can be achieved by solving the following eigen-decomposition problem:

$$K\alpha = \lambda\alpha, \tag{16}$$

where $\lambda$ and $\alpha$ represent an eigenvalue and an eigenvector, respectively. It would be appreciated that there are in total $d$ eigenvalues of interest, $\lambda_1, \ldots, \lambda_d$. In one embodiment, the eigenvalues may be sorted in descending order, such that $\lambda_1 > \lambda_2 > \ldots > \lambda_d$. The eigenvector $\alpha_i$ corresponds to the eigenvalue $\lambda_i$, and the $j$-th entry of $\alpha_i$ is $\alpha_{ij}$.
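Continuing the sketch, the eigen-decomposition of equation (16) can be carried out with NumPy; `eigh` returns eigenvalues in ascending order, so the slice below keeps the $d$ largest (function name illustrative):

```python
import numpy as np

def kpca_components(K, d):
    """Solve K alpha = lambda alpha and keep the d leading eigenpairs,
    sorted in descending order of eigenvalue (equation (16))."""
    evals, evecs = np.linalg.eigh(K)       # eigh suits the symmetric matrix K
    idx = np.argsort(evals)[::-1][:d]      # indices of the d largest eigenvalues
    return evals[idx], evecs[:, idx]       # column i of the result is alpha_i

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 8))
K = np.exp((X.T @ X) / 100.0)              # kernel matrix from the previous sketch
lams, alphas = kpca_components(K, d=3)
print(lams)                                # lambda_1 > lambda_2 > lambda_3
```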
Given $s$ and $W$, the low-dimensional version of $\phi(x_i^b)$, denoted as $y_i$, can be solved. That is, $y_i$ is the background of the initial frames in the low dimensional image space. In one embodiment, $y_i$ is expressed as follows:

$$y_i = U'\, \phi(x_i^b). \tag{17}$$

Substituting equation (6) into equation (17) yields:

$$y_i = W'\, \kappa_i, \qquad \kappa_i = \big[ k(x_{t-T}^b, x_i^b), \ldots, k(x_t^b, x_i^b) \big]'. \tag{18}$$

It can be seen from equation (18) that $y_i$ can be determined by using the kernel function, without explicitly defining or applying the mapping function.
Next, in the case where $y_i$ and $W$ are fixed, $s$ can be solved. More specifically, it can be determined from equation (13) that the $j$-th element of $y_i$ is $y_{ij} = u_j'\, \phi(x_i^b) = \sum_{l=t-T}^{t} w_{lj}\, k(x_l^b, x_i^b)$. Therefore, in one embodiment, the background approximation error $L_{background}$ is written as:

$$L_{background} = \sum_{i=t-T}^{t} \left( k(x_i^b, x_i^b) - 2\, y_i'\, W'\, \kappa_i + y_i'\, W'\, K\, W\, y_i \right), \tag{19}$$

where $\kappa_i$ is as in equation (18) and $K$ is the kernel matrix evaluated on the background parts of the frames. Equation (19) can be calculated by the kernel function because

$$\phi(x_i^b)'\, \phi(x_j^b) = k(x_i^b, x_j^b). \tag{20}$$

Therefore, equation (19) can be formulated as a sum of terms in the form of the kernel function $k(x_i^b, x_j^b)$ as follows:

$$L_{background} = \sum_{i=t-T}^{t} \sum_{j=t-T}^{t} c_{ij}\, k(x_i^b, x_j^b) + C, \tag{21}$$
where $c_{ij}$ represents a coefficient and $C$ represents a constant independent of the data. As such, at least a part of the objective function (that is, the approximation error $L_{background}$) and the background of the frames are associated using the kernel functions.
In one embodiment, as described above, the foreground and background parts in the frames can be identified or indicated by the foreground indicator $s$, which is defined in equation (1), for example. Based on equation (21), in one embodiment, it is possible to express the objective function at least in part by the foreground indicator. That is, by means of the kernel functions, the objective function can be associated with the foreground indicator related to each pixel in each of the plurality of frames, where the foreground indicator indicates whether the related pixel belongs to the moving object (foreground).
For the sake of discussion, suppose that the kernel function is in the form of equation (15). By use of the Taylor expansion, the kernel function can be approximated by:

$$k(x_i^b, x_j^b) = \exp\!\left(\frac{x_i^{b\prime}\, x_j^b}{\sigma}\right) \approx 1 + \frac{1}{\sigma} \sum_{z=1}^{n} x_{iz}^b\, x_{jz}^b, \tag{22}$$

where $x_{iz}$ represents the value of pixel $z$ in frame $i$ and, by equation (3), $x_{iz}^b = (1 - s_z)\, x_{iz}$, so that $x_{iz}^b\, x_{jz}^b = (1 - s_z)\, x_{iz}\, x_{jz}$ for the binary indicator $s$. By substituting equation (22) into equation (21), $L_{background}$ as defined in equation (19) can be expressed as follows:

$$L_{background} \approx \sum_{i}\sum_{j} c_{ij} + \frac{1}{\sigma} \sum_{z=1}^{n} (1 - s_z) \sum_{i}\sum_{j} c_{ij}\, x_{iz}\, x_{jz} + C, \tag{23}$$

which can be grouped as

$$L_{background} \approx \sum_{z=1}^{n} (1 - s_z)\, d_z + C', \qquad d_z = \frac{1}{\sigma} \sum_{i}\sum_{j} c_{ij}\, x_{iz}\, x_{jz}, \tag{24}$$

where $C'$ collects the terms that do not depend on $s$. It would be appreciated that $C'$ is a constant and thus can be removed from equation (24). As a result, $L_{background}$ is expressed as:

$$L_{background} = \sum_{z=1}^{n} (1 - s_z)\, d_z. \tag{25}$$
For the sake of discussion, suppose that the objective function is in the form of equation (13). That is, in addition to $L_{background}$, the objective function also includes the terms related to the area and connectivity of the moving object(s). Based on equations (25) and (13), the objective function $L$ can be written as:

$$L = \sum_{z=1}^{n} \left[ (1 - s_z)\, d_z + \beta\, s_z \right] + \gamma \sum_{z=1}^{n} \sum_{v \in N(z)} | s_z - s_v |, \tag{26}$$

where $s_z$ is the $z$-th entry of $s$. It can be seen that equation (26) is in a standard form of graph cuts: the first sum provides the unary terms and the second sum provides the pairwise terms. In one embodiment, by use of the well-known graph cuts algorithm, the optimal solution $s$ can be obtained efficiently.
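As an illustration of this step, the following hedged sketch minimizes the unary-plus-pairwise energy of equation (26) with the PyMaxflow library, one common graph-cuts implementation; the per-pixel costs $d_z$ are assumed given, for instance from equation (25), and the source/sink labeling convention used here is one reasonable mapping, not the patent's:

```python
import numpy as np
import maxflow  # PyMaxflow: pip install PyMaxflow

def solve_indicator(d, beta, gamma, shape):
    """Minimize the energy of equation (26) over binary s with one graph cut.
    d: per-pixel background costs d_z; shape: (rows, cols) of the frame."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(shape)
    g.add_grid_edges(nodes, gamma)          # pairwise |s_z - s_v| terms, 4-neighborhood
    d2d = np.asarray(d, dtype=float).reshape(shape)
    # Convention: s_z = 1 (foreground) pays beta; s_z = 0 (background) pays d_z
    g.add_grid_tedges(nodes, beta * np.ones(shape), d2d)
    g.maxflow()
    return g.get_grid_segments(nodes).astype(int)   # 1 where pixel is foreground

s = solve_indicator(np.array([5.0, 0.1, 4.0, 0.2]), beta=1.0, gamma=0.5, shape=(2, 2))
print(s)  # pixels with a high background cost are labeled foreground
```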
In one embodiment where the kernel functions are used, the method 100 can be implemented by the pseudo code shown in Table 1 (published as an image and not reproduced here); a sketch of the overall flow is given below.

Table 1
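Since the published pseudo code is available only as an image, the following is a hedged end-to-end sketch of how the text describes method 100: build the kernel matrix, run KPCA, convert the background error into per-pixel costs, and hand those costs to the graph cut of the previous sketch. The coefficient matrix `C` follows from expanding equation (19) under the reconstructions assumed above; all names are illustrative:

```python
import numpy as np

def background_costs(X, d, sigma):
    """Per-pixel cost d_z of labeling pixel z as background (equation (25)).
    X: n x T matrix whose columns are the vectorized frames in the window."""
    T = X.shape[1]
    K = np.exp((X.T @ X) / sigma)              # kernel matrix, eqs. (14)-(15)
    evals, evecs = np.linalg.eigh(K)
    idx = np.argsort(evals)[::-1][:d]
    W = evecs[:, idx] / np.sqrt(evals[idx])    # normalized KPCA coefficients
    A = W @ (W.T @ K)                          # column i of A is W y_i, eq. (18)
    C = np.eye(T) - 2 * A + A @ A.T            # c_ij of equation (21)
    # d_z = (1/sigma) * x_z' C x_z, where x_z collects pixel z across frames
    return np.einsum('zi,ij,zj->z', X, C, X) / sigma

def method_100(X, shape, d=3, sigma=1e4, beta=1.0, gamma=0.5):
    """One pass of the detection: costs from KPCA modeling, then a graph cut.
    In practice one may iterate, re-masking the frames with the current s."""
    dz = background_costs(X, d, sigma)
    return solve_indicator(dz, beta, gamma, shape)  # graph cut from the sketch above
```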
It is to be understood that the pseudo code in Table 1 is given only for the purpose of illustration, without suggesting any limitations as to the scope of the invention. Various modifications or variations are possible in practice.
By use of the non-linear model, which is more powerful for describing complex factors such as changing background (e.g., water ripples and waving trees), illumination variation, camera motion, noise and the like, embodiments of the present invention are more robust and accurate in detecting moving objects under complex situations. The proposed approach achieves fewer false alarms and a higher detection rate.
FIGs. 2A-2C show an example of moving object detection. FIG. 2A shows a frame of a video containing dynamic rain. FIG. 2B shows the result of a conventional approach to moving object detection. It can be seen that in FIG. 2B, the spring is incorrectly classified as a moving object. On the contrary, in the result obtained by one embodiment of the present invention, shown in FIG. 2C, the spring is removed from the foreground and the moving person is correctly detected.
FIG. 3 shows a block diagram of a computer-implemented apparatus for moving object detection according to one embodiment of the present invention. As shown, the apparatus 300 comprises an image transformer 310 configured to transform a plurality of frames in a video from an initial image space to a high dimensional image space in a non-linear way; a modeler 320 configured to model background of the plurality of frames in the high dimensional image space; and a moving object detector 330 configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
In one embodiment, the dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.
In one embodiment, the modeler 320 may comprise a non-linear modeler 325 configured to model background of a plurality of transformed frames using a linear model in the high dimensional image space, the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.
In one embodiment, the apparatus 300 may further comprise an objective function controller 340 configured to determine an objective function characterizing an error of the modeling of the background of the plurality of frames. In this embodiment, the moving object detector 330 is configured to detect the moving object based on the objective function.
In one embodiment, the objective function may further characterize at least one of: areas of the moving object in the plurality of frames, and connectivity of the moving object across the plurality of frames.
In one embodiment, the apparatus 300 may further comprise a kernel function controller 350 configured to determine a set of kernel functions associated with the high dimensional image space. In this embodiment, the objective function controller 340 is configured to associate at least a part of the objective function and the background of the plurality of frames using the set of kernel functions, and the moving object detector 330 is configured to detect the moving object by minimizing the objective function.
In one embodiment, the objective function controller 340 is configured to associate the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions, where the foreground indicator  indicates whether the related pixel belongs to the moving object.
FIG. 4 shows a block diagram of an example computer system 400 suitable for implementing example embodiments of the present invention. The computer system 400 can be a fixed machine such as a desktop personal computer (PC), a server, a mainframe, or the like. Alternatively, the computer system 400 can be a mobile machine such as a mobile phone, tablet PC, laptop, smartphone, personal digital assistant (PDA), or the like.
As shown, the computer system 400 comprises a processor such as a central processing unit (CPU) 401 which is capable of performing various processes in accordance with a program stored in a read only memory (ROM) 402 or a program loaded from a storage unit 408 to a random access memory (RAM) 403. In the RAM 403, data required when the CPU 401 performs the various processes or the like is also stored as required. The CPU 401, the ROM 402 and the RAM 403 are connected to one another via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input unit 406 including a keyboard, a mouse, or the like; an output unit 407 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a loudspeaker or the like; the storage unit 408 including a hard disk or the like; and a communication unit 409 including a network interface card such as a LAN card, a modem, or the like. The communication unit 409 performs a communication process via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as required. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 410 as required, so that a computer program read therefrom is installed into the storage unit 408 as required.
Specifically, in accordance with example embodiments of the present invention, the processes described above with reference to FIG. 1 and Table 1 may be implemented by a computer program. For example, embodiments of the present invention comprise a computer program product including a computer program tangibly embodied on a machine readable medium, the computer program including program code for performing the method 100 and/or the pseudo code shown in Table 1. In such embodiments, the computer program may be downloaded and mounted from the network via the communication unit 409, and/or installed from the removable medium 411.
The functionality described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), System-on-a-Chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
Various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of embodiments of the present invention are illustrated and described as block diagrams, flowcharts, or using some other pictorial representation, it will be appreciated that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
By way of example, embodiments of the present invention can be described in the general context of machine-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, or the like that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various implementations. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Program code for carrying out methods of the invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes,  when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the present invention, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination.
Although the invention has been described in language specific to structural features and/or methodological acts, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (16)

  1. A computer-implemented method comprising:
    transforming a plurality of frames in a video to a high dimensional image space in a non-linear way;
    modeling background of the plurality of frames in the high dimensional image space; and
    detecting a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
  2. The method of claim 1, wherein a dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.
  3. The method of claim 1, wherein the modeling background of the plurality of frames in the high dimensional image space comprises:
    modeling background of a plurality of transformed frames using a linear model in the high dimensional image space, the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.
  4. The method of claim 1, wherein the detecting a moving object in the plurality of frames comprises:
    determining an objective function characterizing an error of the modeling of the background of the plurality of frames; and
    detecting the moving object based on the objective function.
  5. The method of claim 4, wherein the objective function further characterizes at least one of:
    areas of the moving object in the plurality of frames, and
    connectivity of the moving object across the plurality of frames.
  6. The method of claim 4, wherein the detecting the moving object based on the objective function comprises:
    determining a set of kernel functions associated with the high dimensional image space;
    associating at least a part of the objective function and the background of the plurality of frames using the set of kernel functions; and
    detecting the moving object by minimizing the objective function.
  7. The method of claim 6, wherein the associating at least a part of the objective function and the background of the plurality of frames using the set of kernel functions comprises:
    associating the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions, the foreground indicator indicating whether the related pixel belongs to the moving object.
  8. A computer-implemented apparatus comprising:
    an image transformer configured to transform a plurality of frames in a video to a high dimensional image space in a non-linear way;
    a modeler configured to model background of the plurality of frames in the high dimensional image space; and
    a moving object detector configured to detect a moving object in the plurality of frames based on the modeling of the background of the plurality of frames in the high dimensional image space.
  9. The apparatus of claim 8, wherein a dimension of the high dimensional image space is greater than the number of pixels in each of the plurality of frames.
  10. The apparatus of claim 8, wherein the modeler comprises:
    a non-linear modeler configured to model background of a plurality of transformed frames using a linear model in the high dimensional image space, the plurality of transformed frames obtained by transforming the plurality of frames in the non-linear way.
  11. The apparatus of claim 8, further comprising:
    an objective function controller configured to determine an objective function characterizing an error of the modeling of the background of the plurality of frames,
    wherein the moving object detector is configured to detect the moving object based on the objective function.
  12. The apparatus of claim 11, wherein the objective function further characterizes at least one of:
    areas of the moving object in the plurality of frames, and
    connectivity of the moving object across the plurality of frames.
  13. The apparatus of claim 11, further comprising:
    a kernel function controller configured to determine a set of kernel functions associated with the high dimensional image space,
    wherein the objective function controller is configured to associate at least a part of the objective function and the background of the plurality of frames using the set of kernel functions,
    and wherein the moving object detector is configured to detect the moving object by minimizing the objective function.
  14. The apparatus of claim 13, wherein the objective function controller is configured to associate the objective function with a foreground indicator related to each pixel in each of the plurality of frames using the set of kernel functions, the foreground indicator indicating whether the related pixel belongs to the moving object.
  15. A device comprising:
    a processor; and
    a memory including computer-executable instructions which, when executed by the processor, cause the device to carry out the method of any one of claims 1 to 7.
  16. A computer program product being tangibly stored on a non-transient computer-readable medium and comprising machine executable instructions which, when executed, cause the machine to perform steps of the method according to any one of claims 1 to 7.
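For illustration, the following minimal Python sketch shows one way the method of claims 1 to 7 could be realized. It is a sketch under stated assumptions, not the patented implementation: the RBF kernel (and its gamma value), the leave-one-out kernel-weighted average used as the linear background model in the induced feature space, and the fixed residual threshold are all illustrative choices. A full realization of claims 4 to 6 would instead minimize an objective of the general form (modeling error) + α · (foreground area) + β · (connectivity penalty) rather than applying a simple per-pixel threshold.

```python
# Illustrative sketch only -- not the patented implementation.
import numpy as np

def rbf_kernel(X, gamma=1e-6):
    """K[i, j] = exp(-gamma * ||X[i] - X[j]||^2) for row-stacked frame vectors X."""
    sq = (X ** 2).sum(axis=1)[:, None] + (X ** 2).sum(axis=1)[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def detect_moving_object(frames, gamma=1e-6, thresh=25.0):
    """frames: (T, H, W) grayscale array -> (T, H, W) boolean foreground masks."""
    T, H, W = frames.shape
    X = frames.reshape(T, -1).astype(np.float64)  # one frame per row
    K = rbf_kernel(X, gamma)                      # implicit non-linear transform (claim 1)
    np.fill_diagonal(K, 0.0)                      # leave each frame out of its own model
    Wgt = K / K.sum(axis=1, keepdims=True)        # linear-combination weights (claim 3)
    background = Wgt @ X                          # per-frame background estimate
    residual = np.abs(X - background)             # error of the background modeling (claim 4)
    return (residual > thresh).reshape(T, H, W)   # per-pixel foreground indicator (claim 7)

if __name__ == "__main__":
    # Toy video: a static horizontal gradient with a bright square moving right.
    T, H, W = 20, 64, 64
    video = np.tile(np.linspace(0.0, 100.0, W), (T, H, 1))
    for t in range(T):
        video[t, 20:28, 3 * t : 3 * t + 8] += 120.0
    masks = detect_moving_object(video)
    print(masks.sum(axis=(1, 2)))  # approximate object area per frame
```

Because the kernel is evaluated only between pairs of frames, the high dimensional image space of claims 1 and 2 is never materialized; its dimension may exceed the number of pixels per frame (it is effectively infinite for the RBF kernel) at no additional cost, which is what makes the non-linear transform tractable.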
PCT/CN2014/095643 2014-12-30 2014-12-30 Moving object detection in videos WO2016106595A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201480084456.9A CN107209941A (en) 2014-12-30 2014-12-30 Moving object detection in videos
EP14909406.2A EP3241185A4 (en) 2014-12-30 2014-12-30 Moving object detection in videos
PCT/CN2014/095643 WO2016106595A1 (en) 2014-12-30 2014-12-30 Moving object detection in videos

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/095643 WO2016106595A1 (en) 2014-12-30 2014-12-30 Moving object detection in videos

Publications (1)

Publication Number Publication Date
WO2016106595A1 (en) 2016-07-07

Family

ID=56283871

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/095643 WO2016106595A1 (en) 2014-12-30 2014-12-30 Moving object detection in videos

Country Status (3)

Country Link
EP (1) EP3241185A4 (en)
CN (1) CN107209941A (en)
WO (1) WO2016106595A1 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489199B * 2012-06-13 2016-08-24 CRSC Communication & Information Group Co., Ltd. Video image target tracking processing method and system
CN104113789B * 2014-07-10 2017-04-12 Hangzhou Dianzi University On-line video abstraction generation method based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101303763A * 2007-12-26 2008-11-12 Shanghai Fire Research Institute of the Ministry of Public Security Image magnification method based on sparse representation
US20140232862A1 (en) * 2012-11-29 2014-08-21 Xerox Corporation Anomaly detection using a kernel-based sparse reconstruction model
CN103324955A * 2013-06-14 2013-09-25 Zhejiang Zhier Information Technology Co., Ltd. Pedestrian detection method based on video processing
CN103500454A * 2013-08-27 2014-01-08 Dongguan Cloud Computing Industry Technology Innovation and Incubation Center, Chinese Academy of Sciences Method for extracting a moving target from shaky video
CN104200485A * 2014-07-10 2014-12-10 Zhejiang University of Technology Video-monitoring-oriented human body tracking method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3241185A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859302A * 2017-11-29 2019-06-07 Siemens Healthcare GmbH Compressed sensing of the optical transmission matrix
CN113591840A * 2021-06-30 2021-11-02 Beijing Megvii Technology Co., Ltd. Target detection method, apparatus, device and storage medium

Also Published As

Publication number Publication date
EP3241185A4 (en) 2018-07-25
CN107209941A (en) 2017-09-26
EP3241185A1 (en) 2017-11-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
  Ref document number: 14909406
  Country of ref document: EP
  Kind code of ref document: A1
REEP Request for entry into the european phase
  Ref document number: 2014909406
  Country of ref document: EP
NENP Non-entry into the national phase
  Ref country code: DE